
Data Viz with Python and R

Learn to Make Plots in Python and R


How To Make Scatter Plot with Regression Line with ggplot2 in R?

datavizpyr · May 4, 2020

Adding a regression line to a scatter plot can help reveal the relationship, or association, between the two numerical variables in the plot. With ggplot2, we can add a regression line with the geom_smooth() function, as one more layer on top of the scatter plot. In this post, we will see examples of adding regression lines to a scatter plot using ggplot2 in R.

Let us load the tidyverse suite of packages, which includes ggplot2, dplyr and readr.

library(tidyverse)

We will use the Broadway grosses data set from the TidyTuesday project.

grosses <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-04-28/grosses.csv')

The data contains the weekly gross for each Broadway show since 1985. Here are its first few rows:

head(grosses)
## # A tibble: 6 x 14
##   week_ending week_number weekly_gross_ov… show  theatre weekly_gross
##   <date>            <dbl>            <dbl> <chr> <chr>          <dbl>
## 1 1985-06-09            1          3915937 42nd… St. Ja…       282368
## 2 1985-06-09            1          3915937 A Ch… Sam S.…       222584
...
## # … with 8 more variables: potential_gross <lgl>, avg_ticket_price <dbl>,
## #   top_ticket_price <lgl>, seats_sold <dbl>, seats_in_theatre <dbl>,
## #   pct_capacity <dbl>, performances <dbl>, previews <dbl>

Let us compute, for each week, the total gross across all shows and the total number of seats sold. We first group the data by week with group_by() and then use summarize() to compute the two totals.

df <- grosses %>%
  # one group per week
  group_by(week_ending) %>%
  # total gross and total seats sold across all shows in that week
  summarize(gross = sum(weekly_gross), seats = sum(seats_sold))

We will use this summarized data frame to make a scatter plot with a regression line. Its first rows look like this:

head(df)

## # A tibble: 6 x 3
##   week_ending   gross  seats
##   <date>        <dbl>  <dbl>
## 1 1985-06-09  3915937 132214
## 2 1985-06-16  3685742 127655
## 3 1985-06-23  3690242 124925
## 4 1985-06-30  3986642 131832
## 5 1985-07-07  2929052 103784
## 6 1985-07-14  3072770 108076
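If you want to check what group_by() plus summarize() computes here, base R's aggregate() produces the same per-week totals. The sketch below runs it on a tiny made-up table (hypothetical values, not the Broadway data) that reuses the column names week_ending, weekly_gross and seats_sold; the object names toy and toy_df are only for illustration.

```r
# Hypothetical toy data: two shows in each of two weeks (values made up for illustration)
toy <- data.frame(
  week_ending  = as.Date(c("2020-01-05", "2020-01-05", "2020-01-12", "2020-01-12")),
  weekly_gross = c(100, 200, 150, 250),
  seats_sold   = c(10, 20, 15, 25)
)

# Sum gross and seats within each week, like group_by(week_ending) %>% summarize(...)
toy_df <- aggregate(toy[, c("weekly_gross", "seats_sold")],
                    by = list(week_ending = toy$week_ending), FUN = sum)
names(toy_df) <- c("week_ending", "gross", "seats")
print(toy_df)
```

Each week ends up as a single row: gross totals of 300 and 400, and seat totals of 30 and 40.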

Let us first make a simple scatter plot of the total weekly gross against the total number of seats sold.

df %>%
  ggplot(aes(x=seats,y=gross)) +
  geom_point(alpha=0.5) +
  labs(x= "Seats Sold", y="Weekly Gross")

We can see a clear association between the two variables: the total weekly gross increases with the total number of seats sold.

Scatter Plot with ggplot2 in R
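To put a number on the association seen in the scatter plot, you could compute Pearson's correlation with cor(df$seats, df$gross) on the summarized data. Because that needs the downloaded data, the self-contained sketch below uses simulated values instead (an assumption: gross roughly proportional to seats plus noise, not the real Broadway numbers).

```r
set.seed(1)
# Simulated stand-in for the weekly totals (hypothetical, not the Broadway data)
seats <- runif(200, min = 80000, max = 300000)
gross <- 60 * seats + rnorm(200, sd = 1e6)

# Pearson correlation: values near 1 indicate a strong increasing linear association
r <- cor(seats, gross)
print(round(r, 3))
```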

Let us add a regression line to the scatter plot with geom_smooth(), adding it as one more layer to the ggplot2 plot. The only change from the previous code is the extra geom_smooth() call.

df %>%
  ggplot(aes(x=seats,y=gross)) +
  geom_point(alpha=0.5) +
  labs(x= "Seats Sold", y="Weekly Gross")+
  geom_smooth()

By default (method = NULL), geom_smooth() chooses the smoothing method based on the size of the data, more precisely the size of the largest group: it uses stats::loess() for fewer than 1,000 observations and mgcv::gam() with the formula y ~ s(x, bs = "cs") otherwise. Other methods, such as "lm" or "glm", can be requested explicitly with the method argument. Our data contains more than 1,000 observations, so geom_smooth() uses gam() by default. It also draws a shaded band around the smoothed line showing a confidence interval (95% by default) based on the standard error of the fit.

Scatter Plot with gam Regression line with ggplot2 in R
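The automatic choice described above boils down to a single threshold on the number of observations in the largest group. The helper below, choose_smooth_method(), is a hypothetical sketch that restates that documented rule; it is not a ggplot2 function and does not fit anything.

```r
# Hypothetical helper restating ggplot2's rule for geom_smooth(method = NULL):
# stats::loess() if the largest group has fewer than 1,000 observations, else mgcv::gam()
choose_smooth_method <- function(group_sizes) {
  if (max(group_sizes) < 1000) "loess" else "gam"
}

choose_smooth_method(500)          # small data set: "loess"
choose_smooth_method(1800)         # more than 1,000 rows, like the weekly totals: "gam"
choose_smooth_method(c(600, 700))  # two groups, each under 1,000 rows: "loess"
```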

To draw the regression line without the confidence band, pass se=FALSE to geom_smooth().

df %>%
  ggplot(aes(x=seats,y=gross)) +
  geom_point(alpha=0.5) +
  labs(x= "Seats Sold", y="Weekly Gross")+
  geom_smooth(se=FALSE)

Scatter Plot with geom_smooth ggplot2 in R
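The band that se=FALSE turns off is a pointwise confidence interval for the fitted line; its width is set by geom_smooth()'s level argument (0.95 by default). For a linear fit, this band is what predict() returns with interval = "confidence". The sketch below shows that on simulated data (hypothetical values, not the Broadway totals), evaluating the line on 80 points, which matches geom_smooth()'s default n.

```r
set.seed(42)
# Simulated data (hypothetical, not the Broadway totals)
x <- runif(100, 0, 10)
y <- 3 + 2 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

# Evaluate the line on 80 grid points across the range of x, with a 95% confidence band
grid <- data.frame(x = seq(min(x), max(x), length.out = 80))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
head(band)  # fit = the line; lwr and upr = edges of the shaded band

# The same lower edge by hand: fitted value minus t quantile times standard error of the fit
p <- predict(fit, newdata = grid, se.fit = TRUE)
t_crit <- qt(0.975, df = fit$df.residual)
manual_lwr <- p$fit - t_crit * p$se.fit
```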

In the scatter plots above, the line comes from a GAM fit. We can choose the method used for the regression line with the method argument of geom_smooth(). For example, method=lm adds the line from a simple linear regression model.

df %>%
  ggplot(aes(x=seats,y=gross)) +
  geom_point(alpha=0.5) +
  labs(x= "Seats Sold", y="Weekly Gross")+
  geom_smooth(method=lm)

For this data, the straight line from method=lm is clearly not the best-fitting line.

Scatter Plot with Regression Line ggplot2 in R
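The straight line drawn by method=lm is the fitted line of lm(y ~ x), since geom_smooth()'s default formula is y ~ x. One way to back up a visual judgement that it fits worse than a flexible smooth is to compare residual sums of squares. The sketch below does this with base R on simulated, curved data (hypothetical values). It uses loess as the flexible smoother so that mgcv is not required, which makes it a stand-in for, not a copy of, the GAM that geom_smooth() chose above; the helper rss() is also only for illustration.

```r
set.seed(7)
# Simulated curved relationship (hypothetical, not the Broadway data)
x <- runif(300, 0, 10)
y <- (x - 5)^2 + rnorm(300, sd = 1.5)

fit_lm    <- lm(y ~ x)                  # the line geom_smooth(method = lm) draws
fit_loess <- loess(y ~ x, span = 0.75)  # the curve geom_smooth(method = "loess") draws by default

# Residual sum of squares: smaller means the fit follows the points more closely
rss <- function(fit) sum(residuals(fit)^2)
c(lm = rss(fit_lm), loess = rss(fit_loess))
```

On curved data like this, the straight line leaves much larger residuals than the smooth; on roughly linear data the two would be close.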
