How to Make a Time Series Plot with Rolling Average in Python?

Time Series Plot with 7-day rolling average: Pandas and Seaborn

Time series plots are a great way to see a trend over a period of time. However, if the numerical variable we are plotting fluctuates from day to day, it is often better to add a moving-average layer to the time series plot.

In this post, we will see examples of making a time series plot first and then adding a 7-day rolling average to it. We will use the COVID-19 dataset from covidtracking.com, Seaborn’s lineplot() to make the time series plot, and Pandas’ rolling() function to compute the 7-day rolling average of new cases per day.

Let us load Pandas, along with Seaborn and Matplotlib for the plots later on, and then load the COVID-19 daily cases in the US. Here we specify which column should be parsed as a date using the “parse_dates” argument.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# daily state-level COVID-19 data
data_url = "http://covidtracking.com/api/states/daily.csv"
corona = pd.read_csv(data_url, parse_dates=['date'])
corona.head()
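
To see what “parse_dates” changes, here is a minimal sketch that reads a tiny made-up CSV from memory instead of downloading the real file. The values and the names csv_text, raw and parsed are illustrative assumptions, not part of the dataset.

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for the real download (illustrative values only)
csv_text = """date,state,positiveIncrease
2020-05-15,NY,10
2020-05-14,NY,20
"""

# Without parse_dates, the date column is read as plain strings
raw = pd.read_csv(io.StringIO(csv_text))
# With parse_dates, the date column is converted to datetime64
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(raw["date"].dtype)
print(parsed["date"].dtype)
```

Having a real datetime column is what lets the plotting code treat the x-axis as dates rather than as arbitrary labels.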

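Let us make a time series plot of daily cases for the state of New York. First we filter the COVID data to keep only the rows for NY, taking a copy so that we can safely add a column to it later.

# .copy() avoids a SettingWithCopyWarning when a new column is added later
corona_ny = corona.query("state=='NY'").copy()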
corona_ny.head()

We are mainly interested in just two of the variables in the data: the date and the daily new cases in NY (positiveIncrease).

corona_ny[['date', 'positiveIncrease']].head()
	date	positiveIncrease
37	2020-05-15	2762.0
93	2020-05-14	2390.0
149	2020-05-13	2176.0
205	2020-05-12	1430.0
261	2020-05-11	1660.0

Simple Time Series Plot with Seaborn’s lineplot()

Let us make a simple time series plot of daily new cases against date. We can use Seaborn’s lineplot() function to make the time series plot. In addition to making a simple line plot, we also customize the axis labels and figure size and save the plot as a PNG file.

# bigger plot elements suitable for giving talks
sns.set_context("talk")
# set figure size
plt.figure(figsize=(9,6))
# Time series plot with Seaborn lineplot()
# ci=None turns off the confidence band; seaborn 0.12+ prefers errorbar=None
sns.lineplot(x="date", y="positiveIncrease",
             data=corona_ny, ci=None)
# axis labels
plt.xlabel("Date", size=14)
plt.ylabel("Daily New Cases", size=14)
# save image as PNG file
plt.savefig("Time_Series_Plot_with_Seaborn.png",
                    format='png',
                    dpi=150)

We get a time series plot from lineplot(). It is easy to see that the number of new cases per day fluctuates a lot; it is typically higher during weekdays and lower during weekends.

Time Series Plot with Seaborn Lineplot
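
One rough way to check a weekday/weekend pattern like the one described above is to average the counts by type of day. The sketch below uses made-up numbers rather than the real data; toy, day_type and the 100/50 values are hypothetical and chosen only to make the pattern obvious.

```python
import numpy as np
import pandas as pd

# Synthetic daily counts (not real COVID data): four full weeks,
# higher on weekdays and lower on weekends
dates = pd.date_range("2020-04-06", periods=28, freq="D")  # 2020-04-06 is a Monday
toy = pd.DataFrame({"date": dates})

# dayofweek: Monday=0 ... Saturday=5, Sunday=6
is_weekend = toy["date"].dt.dayofweek >= 5
toy["day_type"] = np.where(is_weekend, "weekend", "weekday")
toy["positiveIncrease"] = np.where(is_weekend, 50.0, 100.0)

# Mean count per type of day
day_type_means = toy.groupby("day_type")["positiveIncrease"].mean()
print(day_type_means)
```

On the real data, the same groupby could be applied to corona_ny after deriving a day-type column from its date column.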

A better way to visualize the trend is to make a time series plot with a rolling average (or moving average) of a certain window size. In the example below we make a time series plot with a 7-day rolling average of new cases per day.

For that we first need to compute the rolling average of new cases per day. Depending on the window size we pick, some values at the ends of the series will be NA, because a full window is not available there.
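
As a small illustration of how the window size determines the number of missing values, the sketch below applies rolling() with its default settings to a made-up series of 10 numbers. The series s and the dictionary na_counts are hypothetical names used only here.

```python
import pandas as pd

# Ten made-up values, just to count the missing entries
s = pd.Series(range(1, 11), dtype="float64")

na_counts = {}
for window in (3, 7):
    # With default settings, a value is produced only once a full window is available
    rolled = s.rolling(window).mean()
    na_counts[window] = int(rolled.isna().sum())

print(na_counts)
```

With the defaults, a window of size w leaves w - 1 missing values, all at the start of the series.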

Computing 7-day rolling average with Pandas rolling()

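In Pandas, we can compute a rolling average with a specific window size using the rolling() function followed by mean(). On its own, rolling(7) leaves the first 6 rows as NA; here we also shift the result by 3 rows so that each average is centered on its own day and the NA values are split between both ends.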

corona_ny['cases_7day_ave'] = corona_ny.positiveIncrease.rolling(7).mean().shift(-3)

Now we have created a new variable for the 7-day average. Note that because of the shift() function, the first 3 and last 3 values of the 7-day average are NaN.

corona_ny[['date', 'positiveIncrease','cases_7day_ave']].head()
	date	positiveIncrease	cases_7day_ave
37	2020-05-15	2762.0	NaN
93	2020-05-14	2390.0	NaN
149	2020-05-13	2176.0	NaN
205	2020-05-12	1430.0	2200.857143
261	2020-05-11	1660.0	2200.285714
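
As a cross-check on the rolling-plus-shift step, the sketch below compares it with the center=True option of rolling() on a small made-up series. This is an alternative spelling under the assumption of an odd window size, not a change to the approach used in this post; s, shifted and centered are hypothetical names.

```python
import numpy as np
import pandas as pd

# Ten made-up values
s = pd.Series(np.arange(1, 11), dtype="float64")

# Approach used in this post: trailing 7-value mean, then shift back by 3 rows
shifted = s.rolling(7).mean().shift(-3)
# Alternative: ask rolling() to center the window directly
centered = s.rolling(7, center=True).mean()

# Positions of the missing values and whether the two results agree
print(shifted.isna().tolist())
print(np.allclose(shifted, centered, equal_nan=True))
```

Both versions leave three missing values at each end and agree everywhere else, so which one to use is mostly a matter of readability.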

We are now ready to make a time series plot with the actual new cases per day and their 7-day average. To do that, we first make the time series plot as before and then call lineplot() again to add the 7-day average.

# bigger plot elements suitable for giving talks
sns.set_context("talk")
# set figure size
plt.figure(figsize=(9,6))
# Time series plot with Seaborn lineplot() with label
sns.lineplot(x="date",y="positiveIncrease",
             label="Daily", data=corona_ny,
             ci=None)
# 7-day rolling average Time series plot with Seaborn lineplot() with label
sns.lineplot(x="date",y="cases_7day_ave",
             label="7-day Ave",
             data=corona_ny,
             ci=None)
# set axis labels
plt.xlabel("Date", size=14)
plt.ylabel("Daily New Cases", size=14)
# save image as PNG file
plt.savefig("Time_Series_Plot_with_7day_average_Seaborn.png",
                    format='png',
                    dpi=150)

We can clearly see how the 7-day average has smoothed out the day-to-day variation in case counts.

Time Series Plot with 7-day rolling average: Pandas and Seaborn
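
To illustrate why a 7-day window in particular smooths a weekly pattern, here is a minimal sketch on made-up data with a purely weekly cycle (100 on weekdays, 50 on weekends). The numbers and the names daily and smooth are assumptions for illustration; real case counts also carry a trend and noise, so they would not flatten completely.

```python
import numpy as np
import pandas as pd

# Four weeks of synthetic counts with a purely weekly cycle (not real data)
dates = pd.date_range("2020-04-06", periods=28, freq="D")
daily = pd.Series(np.where(dates.dayofweek >= 5, 50.0, 100.0), index=dates)

# Same recipe as in the post: 7-value rolling mean, shifted to center the window
smooth = daily.rolling(7).mean().shift(-3).dropna()

# Spread of the raw series versus the smoothed one
print(daily.std(), smooth.std())
```

Every 7-day window contains exactly one of each day of the week, so the weekly cycle cancels out and only slower changes remain.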