When you have multiple groups, and subgroups within each group, with associated numerical values, you can use grouped boxplots to visualize them. With Seaborn we can make grouped boxplots using the boxplot() function or the much newer catplot() function. Seaborn's Catplot is a function that unifies multiple data visualization techniques, including boxplots, for data with a numerical variable and one or more categorical variables.
Let us import Seaborn, Pandas and Matplotlib to make a grouped boxplot using Catplot.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
We will make a grouped boxplot using the stocks dataset available from vega_datasets. The stocks data contains stock prices for five top tech companies, IBM, Apple, Microsoft, Google, and Amazon, for the years 2000 to 2010.
from vega_datasets import data
stocks = data.stocks()
stocks.head()

  symbol        date  price
0   MSFT  2000-01-01  39.81
1   MSFT  2000-02-01  36.35
2   MSFT  2000-03-01  43.22
3   MSFT  2000-04-01  28.37
4   MSFT  2000-05-01  25.45
Let us create a year variable from the date column. In Pandas, we can first convert the date column to a DatetimeIndex and then use its year attribute to get the year from each date. We will use the year variable in our grouped boxplots with Catplot.
stocks['year'] = pd.DatetimeIndex(stocks['date']).year
stocks.head()

  symbol        date  price  year
0   MSFT  2000-01-01  39.81  2000
1   MSFT  2000-02-01  36.35  2000
2   MSFT  2000-03-01  43.22  2000
3   MSFT  2000-04-01  28.37  2000
4   MSFT  2000-05-01  25.45  2000
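As an alternative sketch, the year can also be pulled out with pd.to_datetime() and the .dt accessor, which some readers find easier to chain. The frame named sample is an assumption for illustration: it simply retypes the five rows printed by stocks.head() so the snippet runs without vega_datasets.

```python
import pandas as pd

# The five rows shown by stocks.head(), typed in by hand so this
# sketch does not depend on vega_datasets being installed.
sample = pd.DataFrame({
    "symbol": ["MSFT"] * 5,
    "date": ["2000-01-01", "2000-02-01", "2000-03-01",
             "2000-04-01", "2000-05-01"],
    "price": [39.81, 36.35, 43.22, 28.37, 25.45],
})

# Parse the strings as datetimes, then read the year via the .dt accessor.
sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year
```

Both approaches should give the same integer year column; which one to use is mostly a matter of taste.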
Just for the sake of simplicity, we will filter the stocks data to contain stock prices for the years 2007, 2008 and 2009.
stocks_df = stocks.query('year>=2007 & year<=2009')
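If you prefer a boolean mask to a query string, the same range filter can be sketched with Series.between(), which includes both endpoints by default. The frame named toy and its prices are made-up placeholders, not real stock data.

```python
import pandas as pd

# Hypothetical placeholder rows (not real stock prices), one per year.
toy = pd.DataFrame({
    "year": [2006, 2007, 2008, 2009, 2010],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# between() keeps both endpoints by default,
# matching the condition year>=2007 & year<=2009.
by_mask = toy[toy["year"].between(2007, 2009)]

# The query-string form used in this post, for comparison.
by_query = toy.query("year>=2007 & year<=2009")
```

Here by_mask and by_query hold the same three rows, for 2007, 2008 and 2009.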
Let us start by making a simple boxplot using Catplot in Seaborn. To make a boxplot with Seaborn's catplot(), we need to pass the argument kind="box".
sns.catplot(x='symbol', y='price', data=stocks_df, kind="box", height=6, aspect=1.3);
Grouped Boxplot with Seaborn Catplot
We have a simple data set with one numerical variable, stock price, and two categorical variables, tech company and year, which we can use to make a grouped boxplot with Seaborn's Catplot.
To make a grouped boxplot using Catplot, we first need to specify which variables go on the x and y axes. The variable on the x-axis is a categorical variable and the variable on the y-axis is a numerical variable. We need to specify kind="box" to tell Catplot that we want boxplots. In addition to the x and y variables, we need to set the hue parameter to the second categorical variable. Seaborn's catplot() uses the hue variable to split the boxplot for each group on the x-axis, which gives us grouped boxplots.
sns.set(font_scale=1.5)
sns.set_style("white")
sns.catplot(x='symbol', y='price', hue="year", data=stocks_df, kind="box", height=6, aspect=1.3);
plt.savefig("grouped_boxplot_Seaborn_Catplot_Python.png")
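To see what the grouping means in numbers, here is a minimal Pandas sketch that computes the quartiles a box summarizes, with one row per combination of x and hue value. The frame named toy, the symbols AAA and BBB, and all prices are hypothetical placeholders, not real stock data.

```python
import pandas as pd

# Hypothetical toy data (not real stock prices): two symbols, two years,
# five prices per (symbol, year) combination.
toy = pd.DataFrame({
    "symbol": ["AAA"] * 10 + ["BBB"] * 10,
    "year": ([2007] * 5 + [2008] * 5) * 2,
    "price": [1, 2, 3, 4, 5, 11, 12, 13, 14, 15,
              21, 22, 23, 24, 25, 31, 32, 33, 34, 35],
})

# One row per (symbol, year) pair, i.e. one row per box in a grouped boxplot
# drawn with x="symbol" and hue="year".
box_stats = (
    toy.groupby(["symbol", "year"])["price"]
       .quantile([0.25, 0.5, 0.75])
       .unstack()
)
box_stats.columns = ["q1", "median", "q3"]
```

With two symbols and two years, box_stats has four rows, which matches the four boxes a grouped boxplot of this toy data would contain.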
Customizing Grouped Boxplot with Seaborn Catplot
In this example, we have customized the grouped boxplot in a few ways. We first set the font size with Seaborn's set() function and the plot style with Seaborn's set_style() function. In addition to saving the grouped boxplot as a PNG file, we have also specified the size of the plot using the height and aspect arguments of catplot().
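The same customizations can also be applied through the FacetGrid object that catplot() returns. The sketch below is one possible way to do it, not the only one. The toy data, the axis label text, and the output file name are all assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Hypothetical toy data (not real stock prices).
toy = pd.DataFrame({
    "symbol": ["AAA"] * 5 + ["BBB"] * 5,
    "price": [1, 2, 3, 4, 5, 21, 22, 23, 24, 25],
})

# Font scale and style set in a single call.
sns.set(style="white", font_scale=1.5)

# height is the height of each facet in inches;
# aspect * height is the width of each facet.
g = sns.catplot(x="symbol", y="price", data=toy, kind="box",
                height=6, aspect=1.3)

# catplot() returns a FacetGrid, which has its own helpers for labels and saving.
g.set_axis_labels("Company", "Stock price")
out_path = os.path.join(tempfile.gettempdir(), "toy_boxplot.png")  # placeholder name
g.savefig(out_path)

# Read the figure height back to confirm the height argument took effect.
fig_height = g.figure.get_size_inches()[1]
```

Saving through g.savefig() rather than plt.savefig() ties the output to this particular figure, which can help when several plots are open at once.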