Sort Boxplot by Mean with Seaborn in Python


When a boxplot has many groups, sorting the boxes by each group's mean or median makes the plot much easier to read.

In this post, we will start with an unordered boxplot and use Pandas and Seaborn to sort it by mean (and by median).

First, we will sort the boxes in ascending order, and then we will sort them in descending order using Pandas, NumPy and Seaborn.

Let us first import all the libraries we need to use.

 
import seaborn as sns 
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Generate data by simulation to make boxplots

We will generate data by simulation and make a dataframe with one column per group for the boxplot. We use NumPy’s random module to draw 100 normally distributed random numbers for each group, each with a different mean value.

 
np.random.seed(42)
# Generating Data
df = pd.DataFrame({
    'Morocco': np.random.normal(57, 5, 100),
    'USA': np.random.normal(73, 5, 100),
    'Jamaica': np.random.normal(68, 8, 100),
    'Sierra Leone': np.random.normal(37, 10, 100),
    'Iceland': np.random.normal(76, 5, 100)
})

Our data in the Pandas data frame look like this. You can see that the data is in wide form.

 
print(df.head())

   Morocco        USA    Jamaica  Sierra Leone    Iceland
0  59.483571  65.923146  70.862299     28.710050  68.027862
1  56.308678  70.896773  72.486276     31.398190  73.003125
2  60.238443  71.286427  76.664410     44.472936  76.026218
3  64.615149  68.988614  76.430416     43.103703  76.234903
4  55.829233  72.193571  56.978645     36.790984  73.749673
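
For comparison, here is a minimal sketch of what the same kind of data looks like in long form, using Pandas' melt(). The tiny dataframe and the column names "Country" and "LifeExp" are chosen here for illustration and are not part of the original example.

```python
import pandas as pd

# Small wide-form frame: one column per group (made-up values)
wide = pd.DataFrame({
    "Morocco": [57.0, 59.0, 55.0],
    "USA": [73.0, 71.0, 75.0],
})

# melt() stacks the columns into (group, value) rows.
# "Country" and "LifeExp" are illustrative names, not from the post's data.
long = wide.melt(var_name="Country", value_name="LifeExp")
print(long)
```

In long form every row holds one observation, with the group name in one column and the value in another.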

Simple Boxplot with Python

Seaborn can take the Pandas dataframe with data in wide form and make a boxplot. We just need to provide the data frame as input to Seaborn’s boxplot function.

 
# make boxplot with Seaborn's boxplot function
# with data in wide form 
sns.boxplot(data=df)
# set x-axis label
plt.xlabel("Countries", size=18)
# set y-axis label
plt.ylabel("LifeExp", size=18)

Our simple boxplot that is unordered looks like this.

How to Sort a Boxplot by Mean with Seaborn?

How To Sort Boxplots in Ascending Order with Python

Let us first compute the mean value for each group using Pandas. We then sort the mean values and take the index of the result, which gives us the group names in sorted order.

 
# compute mean per group and find index after sorting
sorted_index = df.mean().sort_values().index

Our sorted index looks like this, in ascending order.

 
sorted_index

Index(['Sierra Leone', 'Morocco', 'Jamaica', 'USA', 'Iceland'], dtype='object')
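
To see why this works, it can help to run the same steps on a tiny dataframe whose means are easy to check by hand. The following is a small sketch with made-up values; the names toy, means and order are illustrative.

```python
import pandas as pd

# Columns are deliberately listed out of order
toy = pd.DataFrame({
    "B": [10, 20, 30],  # mean 20
    "A": [1, 2, 3],     # mean 2
    "C": [5, 5, 5],     # mean 5
})

# mean() returns one value per column, indexed by column name
means = toy.mean()

# sort_values() sorts the means; .index keeps the column names in that order
order = means.sort_values().index
print(list(order))  # ['A', 'C', 'B']
```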

We can also sort the boxplot by median value instead of mean. In this example, the sorted order is the same in both cases.

 
# compute median per group and find index after sorting
sorted_index = df.median().sort_values().index
sorted_index

Index(['Sierra Leone', 'Morocco', 'Jamaica', 'USA', 'Iceland'], dtype='object')
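
The mean and the median do not always agree on the order, especially when a group has outliers or a skewed distribution. Here is a small sketch with made-up values, where one large value pulls the mean of group "X" above group "Y" while its median stays lower.

```python
import pandas as pd

skewed = pd.DataFrame({
    "X": [1, 2, 3, 4, 100],     # mean 22, median 3 (one large outlier)
    "Y": [9, 10, 11, 12, 13],   # mean 11, median 11
})

by_mean = skewed.mean().sort_values().index
by_median = skewed.median().sort_values().index

print(list(by_mean))    # ['Y', 'X']
print(list(by_median))  # ['X', 'Y']
```

Since the line inside each box of a boxplot marks the median, sorting by median usually matches what the eye sees in the plot.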

We can use the sorted index to reorder the columns of the Pandas dataframe. Since Pandas’s sort_values() function sorts in ascending order by default, the columns end up ordered from the smallest mean to the largest.

 
# reorder the columns using the sorted index
df_sorted = df[sorted_index]
df_sorted.head()

   Sierra Leone    Morocco    Jamaica        USA    Iceland
0     28.710050  59.483571  70.862299  65.923146  68.027862
1     31.398190  56.308678  72.486276  70.896773  73.003125
2     44.472936  60.238443  76.664410  71.286427  76.026218
3     43.103703  64.615149  76.430416  68.988614  76.234903
4     36.790984  55.829233  56.978645  72.193571  73.749673

And now we are ready to make a sorted boxplot with Seaborn.

 
# sorted boxplot with Seaborn's boxplot
sns.boxplot(data=df_sorted)
# set x and y axis labels
plt.xlabel("Countries", size=18)
plt.ylabel("LifeExp", size=18)

Now our boxplot is ordered by each group’s mean/median value. Note that the boxplots are sorted in ascending order.

Boxplot Sorted by Mean/Median with Seaborn Python
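
If the data is in long form rather than wide form, an alternative is to leave the dataframe as it is and pass the sorted group names to the order argument of Seaborn's boxplot(). The sketch below assumes simulated data with illustrative group names ("High", "Low", "Mid") and column names ("Country", "LifeExp") that are not part of the example above.

```python
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated wide-form data with clearly separated means (illustrative only)
rng = np.random.default_rng(0)
wide = pd.DataFrame({
    "High": rng.normal(70, 5, 50),
    "Low": rng.normal(40, 5, 50),
    "Mid": rng.normal(55, 5, 50),
})

# Convert to long form: one row per observation
long = wide.melt(var_name="Country", value_name="LifeExp")

# Group means, sorted ascending; the index holds the group names in that order
order = long.groupby("Country")["LifeExp"].mean().sort_values().index

# Let Seaborn place the boxes in the requested order
ax = sns.boxplot(data=long, x="Country", y="LifeExp", order=list(order))
ax.set_xlabel("Countries", size=18)
ax.set_ylabel("LifeExp", size=18)
```

This keeps the plotting order separate from the data, which can be convenient when the same dataframe feeds several plots.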

How to Sort Boxplot in Descending Order with Pandas, NumPy and Seaborn?

Sometimes you might want to sort a boxplot by its mean or median, but in descending order. The key step to get boxplots in descending order is to get the sorted index in descending order.

We can do that by changing the default ascending=True to ascending=False while using Pandas sort_values() to sort the mean or median values.

 
# compute mean per group and find index after sorting in descending order
sorted_index_desc = df.mean().sort_values(ascending=False).index
# We can also use existing index and 
# flip the order with NumPy
#sorted_index_desc = np.flip(sorted_index)
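
Besides flipping with NumPy, a Pandas Index can also be reversed with ordinary slicing. The following sketch uses a small made-up dataframe to check that reversing the ascending index gives the same order as sorting with ascending=False. Note that this holds here because all the means are distinct; with tied means the two approaches could order the tied groups differently.

```python
import pandas as pd

toy = pd.DataFrame({
    "B": [10, 20, 30],  # mean 20
    "A": [1, 2, 3],     # mean 2
    "C": [5, 5, 5],     # mean 5
})

asc = toy.mean().sort_values().index

# Option 1: sort in descending order directly
desc_sorted = toy.mean().sort_values(ascending=False).index

# Option 2: reverse the ascending index with slicing
desc_sliced = asc[::-1]

print(list(desc_sorted))                # ['B', 'C', 'A']
print(desc_sorted.equals(desc_sliced))  # True
```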

Now that we have the group names sorted in descending order, let us use them to reorder the columns of the Pandas dataframe.

 
df_sorted_desc=df[sorted_index_desc]
df_sorted_desc.head()

We now have the dataframe with its columns in the descending order we wanted. We can use Seaborn’s boxplot() function as before to make the boxplot.

 
# make boxplots sorted in descending order with Seaborn's boxplot()
sns.boxplot(data=df_sorted_desc)
plt.xlabel("Countries", size=18)
plt.ylabel("LifeExp", size=18)

Our boxplot is sorted by mean in descending order, as we wanted.

Sort Boxplot in Descending Order Python
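
The ascending and descending cases differ only in the statistic and the sort direction, so the steps can be wrapped in a small helper. This is just one possible sketch; the function name order_columns and its parameters are hypothetical and not part of Pandas or Seaborn.

```python
import pandas as pd

def order_columns(df, stat="mean", ascending=True):
    """Return df with its columns reordered by a per-column statistic.

    stat is passed to DataFrame.agg(), e.g. "mean" or "median".
    """
    values = df.agg(stat)  # one value per column
    return df[values.sort_values(ascending=ascending).index]

# Made-up data to try the helper on
toy = pd.DataFrame({
    "B": [10, 20, 30],  # mean 20, median 20
    "A": [1, 2, 3],     # mean 2, median 2
    "C": [5, 5, 5],     # mean 5, median 5
})

print(list(order_columns(toy).columns))                              # ['A', 'C', 'B']
print(list(order_columns(toy, "median", ascending=False).columns))   # ['B', 'C', 'A']
```

The reordered dataframe can then be passed to sns.boxplot(data=...) exactly as in the examples above.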

To summarize, in this beginner-level tutorial we saw examples of how to sort a boxplot made with Seaborn’s boxplot() function by either its mean or median. We used NumPy, Pandas, and Seaborn to sort the boxplot in both ascending and descending order.
