Boxplots show five summary statistics, including the median, that describe the distribution of a numerical variable within each level of a categorical variable. Sometimes you may also want to highlight the mean, in addition to the five statistics the boxplot already shows.
In this post we will see how to show a mean mark on a boxplot using Seaborn in Python. We will first make a simple boxplot with Seaborn's boxplot function and then add the mean values to each box. Finally, we will use Matplotlib options to customize how the mean mark looks on the boxplot.
Let us load Pandas, Seaborn and Matplotlib.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
We will use the Stack Overflow 2019 survey data to visualize salary distributions across different educational qualifications. Let us load the processed data from datavizpyr.com's GitHub page.
data_url = "https://raw.githubusercontent.com/datavizpyr/data/master/SO_data_2019/StackOverflow_survey_filtered_subsampled_2019.csv"
data = pd.read_csv(data_url)
print(data.head(3))
Let us preprocess the data: keep only the rows where Manager is "IC" and filter out outliers by keeping salaries between 30,000 and 300,000.
data = data.query('Manager=="IC"')
data = data.query('CompTotal<300000 & CompTotal>30000')
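To make the filtering step concrete, here is a minimal sketch on a hypothetical four-row table (the rows are made up for illustration and are not taken from the survey; only the column names Manager and CompTotal follow the post). It applies the same two conditions as a single boolean mask, which should give the same result as the two query() calls.

```python
import pandas as pd

# Hypothetical toy rows, not the survey data; column names follow the post.
toy = pd.DataFrame({
    "Manager": ["IC", "IC", "Manager", "IC"],
    "CompTotal": [25000, 90000, 120000, 450000],
})

# Same conditions as the two query() calls, written as one boolean mask:
# keep "IC" rows whose salary is strictly between 30,000 and 300,000.
mask = (
    (toy["Manager"] == "IC")
    & (toy["CompTotal"] > 30000)
    & (toy["CompTotal"] < 300000)
)
filtered = toy[mask]

# The chained query() version from the post, for comparison.
queried = toy.query('Manager=="IC"').query('CompTotal<300000 & CompTotal>30000')

print(filtered)
print(filtered.equals(queried))
```

Both bounds are strict, so a salary of exactly 30,000 or 300,000 would be dropped.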
Now we are ready to make boxplots and highlight the mean values on them. We will start by making a simple boxplot using Seaborn's boxplot function.
plt.figure(figsize=(10, 8))
sns.boxplot(x="Education", y="CompTotal", data=data)
plt.ylabel("Salary in US Dollars", size=14)
plt.xlabel("Education", size=14)
plt.title("StackOverflow Survey Data: Effect of Education on Salary", size=18)
plt.savefig("simple_boxplot_with_Seaborn_boxplot_Python.png")
We get a nice boxplot that Seaborn automatically fills with colors. The median of each group appears as a line inside its box.
How to Show mean marks on Boxplot with Seaborn?
With Seaborn’s boxplot() function, we can add a mark for mean values on the boxplot, using the argument “showmeans=True”.
# figure size
plt.figure(figsize=(10, 8))
# make boxplot with Seaborn with means
# using showmeans=True
sns.boxplot(x="Education", y="CompTotal", data=data, showmeans=True)
plt.ylabel("Salary in US Dollars", size=14)
plt.xlabel("Education", size=14)
plt.title("Boxplot with Seaborn Showing mean marks", size=18)
plt.savefig("show_means_in_boxplot_Seaborn_boxplot_Python.png")
The showmeans=True argument adds a mark for the mean value in each box. By default, the means are marked with green triangles.
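If you want to check where the mean marks should land, you can compute the per-group means directly with pandas. The sketch below uses a small hypothetical table (the education labels and salaries are invented for illustration) and compares each group's mean with its median, which is the line the boxplot already draws.

```python
import pandas as pd

# Hypothetical toy data; labels and salaries are made up for illustration.
toy = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "Masters", "Masters", "Masters"],
    "CompTotal": [60000, 100000, 70000, 90000, 200000],
})

# Per-group mean: the value a mean mark would be drawn at.
group_means = toy.groupby("Education")["CompTotal"].mean()

# Per-group median: the line already drawn inside each box.
group_medians = toy.groupby("Education")["CompTotal"].median()

summary = pd.DataFrame({"mean": group_means, "median": group_medians})
print(summary)
```

In the toy "Masters" group a single high salary pulls the mean well above the median, which is the kind of skew a mean mark makes visible on a boxplot.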
How to Customize mean marks on Boxplot with meanprops in Matplotlib?
Although we have highlighted the mean values on the boxplot, the default color of the mean mark does not go well with the box colors. It would be nice to customize the symbol and color of the mean mark.
We can use Matplotlib's meanprops argument, a dictionary of marker properties, to customize the mean mark that we added. For example, we can change the shape with "marker". With "markerfacecolor" and "markeredgecolor", we can change the fill color and edge color of the marker. Finally, we can change the size of the mean marker with "markersize".
plt.figure(figsize=(10, 8))
sns.boxplot(x="Education", y="CompTotal", data=data,
            showmeans=True,
            # white circles with a black edge; markersize is given as a number
            meanprops={"marker": "o",
                       "markerfacecolor": "white",
                       "markeredgecolor": "black",
                       "markersize": 10})
plt.ylabel("Salary in US Dollars", size=14)
plt.xlabel("Education", size=14)
plt.title("Customizing Mean Marks in Boxplot with Seaborn", size=18)
plt.savefig("customize_mean_mark_in_boxplot_Seaborn_boxplot_Python.png")
Now the mean of each group is shown as a white circle with a black edge, which stands out clearly against the box colors.
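Since showmeans and meanprops are Matplotlib boxplot options that Seaborn passes along, the same customization can be tried with Matplotlib alone. The following is a minimal sketch on made-up numbers (the two lists are hypothetical, not survey data). It draws the boxes with Matplotlib's Axes.boxplot and then inspects the returned mean markers to confirm the style and position that were applied.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical values for two groups; not taken from the survey.
values = [[1, 2, 3, 4, 10], [2, 4, 6, 8, 30]]

fig, ax = plt.subplots(figsize=(6, 4))
artists = ax.boxplot(
    values,
    showmeans=True,
    meanprops={
        "marker": "o",
        "markerfacecolor": "white",
        "markeredgecolor": "black",
        "markersize": 10,
    },
)

# boxplot() returns a dict of artists; "means" holds one marker per box.
mean_marks = artists["means"]
first = mean_marks[0]

# Inspect the marker style and the y position (the group mean).
print(first.get_marker(), first.get_markerfacecolor(), first.get_markersize())
print(first.get_ydata())

plt.close(fig)
```

Inspecting the returned artists is a handy way to check that a meanprops setting took effect without having to open the saved image.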