Scatter Plot with Regression Line using Altair in Python


Adding a regression line to a scatter plot is a great way to understand the relationship between two numeric variables.

In this post, we will see an example of using Altair to make a scatter plot with a regression line using a real-world dataset.

Let us load the packages we need. We will load the Altair package and the example data sets from vega_datasets.

import altair as alt
from vega_datasets import data

We will use the Seattle weather data set to make a scatter plot with a linear regression line.

seattle_weather = data.seattle_weather()

Here is what the data looks like.

print(seattle_weather.head(n=3))

        date  precipitation  temp_max  temp_min  wind  weather
0 2012-01-01            0.0      12.8       5.0   4.7  drizzle
1 2012-01-02           10.9      10.6       2.8   4.5     rain
2 2012-01-03            0.8      11.7       7.2   2.3     rain

The basic idea behind making a scatter plot with a regression line in Altair is to create the scatter plot first as a base Altair chart object and then add the regression line on top of it as a second layer.

Let us first make a scatter plot with Altair and save it as an object.

sc_plot = alt.Chart(seattle_weather).mark_point().encode(
    x='temp_max',
    y='temp_min'
)

We can make the regression line using the transform_regression() function and add it as another layer on top of the scatter plot. We need to specify which variables in our dataframe to use for the regression: the first argument is the predictor (here temp_max, on the x-axis) and the second is the variable being fitted (here temp_min, on the y-axis).

sc_plot + sc_plot.transform_regression('temp_max', 'temp_min').mark_line()
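To make concrete what a straight-line fit estimates, here is a minimal pure-Python sketch of an ordinary least-squares fit. It assumes that this is the kind of fit a linear regression line represents. It uses only the three rows printed by head(n=3) above, so the numbers are illustrative and are not the fit for the full data set. The helper fit_line is a hypothetical name and is not part of Altair.

```python
# Minimal sketch of an ordinary least-squares line fit: y = intercept + slope * x.
# fit_line is a hypothetical helper for illustration, not an Altair function.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# The three (temp_max, temp_min) pairs shown in the data preview.
temp_max = [12.8, 10.6, 11.7]
temp_min = [5.0, 2.8, 7.2]

slope, intercept = fit_line(temp_max, temp_min)
print(f"temp_min = {intercept:.2f} + {slope:.2f} * temp_max")
```

With the full Seattle weather data, Altair does the fitting for us inside transform_regression(), so a helper like this is only useful for understanding or cross-checking the line.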

By default, Altair draws the regression line in blue on top of the scatter plot, as shown below.

[Figure: scatter plot with regression line made with Altair]