How to Make Violin Plots with Matplotlib

In this tutorial, we will learn how to make violin plots using Python’s Matplotlib library. Matplotlib provides a function called violinplot(); we will first use it to make a basic violin plot and then learn how to customize it.

Unlike many tutorials on violin plots with Matplotlib, here we start with data stored in a Pandas dataframe and show the distributions of multiple groups as violin plots.

Let us get started by loading Matplotlib and other needed packages.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

We use the Palmer penguins dataset to make the violin plots; the data is available from datavizpyr.com’s GitHub page.

penguins_data="https://raw.githubusercontent.com/datavizpyr/data/master/palmer_penguin_species.tsv"
# load penguins data with Pandas read_csv
df = pd.read_csv(penguins_data, sep="\t")
# remove rows with missing values
df = df.dropna()
df.head()

   species     island  culmen_length_mm  culmen_depth_mm  flipper_length_mm  body_mass_g     sex
0   Adelie  Torgersen              39.1             18.7              181.0       3750.0    MALE
1   Adelie  Torgersen              39.5             17.4              186.0       3800.0  FEMALE
2   Adelie  Torgersen              40.3             18.0              195.0       3250.0  FEMALE
4   Adelie  Torgersen              36.7             19.3              193.0       3450.0  FEMALE
5   Adelie  Torgersen              39.3             20.6              190.0       3650.0    MALE

We will make violin plots of body mass for the different penguin species. To get the body mass values for the three species as lists, we group by species and apply an aggregate function to the body mass variable.

data = (df.
        groupby('species')["body_mass_g"].
        agg(lambda x: list(x)))

Our data for the violin plot looks like this:

data

species
Adelie       [3750.0, 3800.0, 3250.0, 3450.0, 3650.0, 3625....
Chinstrap    [3500.0, 3900.0, 3650.0, 3525.0, 3725.0, 3950....
Gentoo       [4500.0, 5700.0, 4450.0, 5700.0, 5400.0, 4550....
Name: body_mass_g, dtype: object

We can get the species names from the index.

data.index

Index(['Adelie', 'Chinstrap', 'Gentoo'], dtype='object', name='species')
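
To see what the group-by step produces without downloading anything, here is a minimal sketch on a toy dataframe. The values and the names toy, toy_data, toy_labels and toy_values are made up for illustration; only the groupby-then-aggregate pattern mirrors the code above.

```python
import pandas as pd

# Toy stand-in for the penguin data; the values are made up for illustration.
toy = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo"],
    "body_mass_g": [3750.0, 4500.0, 3800.0, 3500.0, 5700.0],
})

# One list of body masses per species; groups come back sorted by species name.
toy_data = toy.groupby("species")["body_mass_g"].agg(list)

# Tick labels and the per-group value lists, in matching order.
toy_labels = list(toy_data.index)
toy_values = list(toy_data)
print(toy_data)
```

Because the labels and the lists come from the same Series, they stay in the same order, which is what lets us reuse the index as tick labels later.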

Default Violinplot with Matplotlib

Let us make a violin plot using Matplotlib’s violinplot() function. By default, Matplotlib labels the x-axis ticks with numbers (1, 2, 3, ...). Here we set the x-axis ticks with the set_xticks() function, using the species names as labels; passing labels to set_xticks() requires Matplotlib 3.5 or later. We also add axis labels and a title to the violin plot.

labels = data.index
fig, ax = plt.subplots()
# make violinplot
ax.violinplot(data)
# set x-axis tick labels
ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
plt.xlabel("Species",
            size=14)
plt.ylabel("Body Mass (g)", 
            size=14)
plt.title("Violinplot with Palmer Penguin Data", 
            size=16)
plt.savefig("Violinplot_matplotlib_python.png",
                    format='png',dpi=150)

With the default settings, the violin plot has a blue fill and lines marking the extreme values.

Default Violin Plot with Matplotlib
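
The default fill and line colors can be changed through the dictionary that violinplot() returns. The sketch below uses synthetic data in place of the penguin measurements, and the colors and the names groups and parts are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data (not the penguin measurements): three groups of values.
rng = np.random.default_rng(0)
groups = [rng.normal(loc, 300, size=50) for loc in (3700, 3730, 5000)]

fig, ax = plt.subplots()
parts = ax.violinplot(groups)

# violinplot() returns a dict of the artists it drew.
# "bodies" holds one filled polygon per group; "cmins", "cmaxes" and "cbars"
# are the line collections for the extrema and the vertical bar between them.
for body in parts["bodies"]:
    body.set_facecolor("tab:green")  # illustrative color choice
    body.set_alpha(0.5)
for key in ("cmins", "cmaxes", "cbars"):
    parts[key].set_color("black")
```

The "cmins", "cmaxes" and "cbars" entries exist only when showextrema is left at its default of True.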

Add Median value to Violinplot with Matplotlib

To customize the violin plot further, let us add the median of each group as a point. To do that, we first compute the median value for each group.

medians = (df.
           groupby('species')["body_mass_g"].
           median())

First, we make the violin plot as before. Then we add the median values to it using Matplotlib’s scatter() function.

inds = np.arange(1, len(medians) + 1)
labels = data.index

fig, ax = plt.subplots()
ax.violinplot(data, 
             showextrema=False)
ax.scatter(inds, medians, marker='o', color='red', s=30, zorder=3)
ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
plt.xlabel("Species",
            size=14)
plt.ylabel("Body Mass (g)", 
            size=14)
plt.title("Violinplot with Palmer Penguin Data", 
            size=16)
plt.savefig("Customizing_Violinplot_matplotlib_python.png",
                    format='png',dpi=150)

The median values now appear as red points on the violin plot. We have also set showextrema=False so that the extreme values are no longer drawn as lines. Everything else is the same as before.

Customizing Violin Plot : Annotating Median – Example 1
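
As an alternative to computing medians by hand, violinplot() also accepts showmedians=True, which draws a short horizontal line at each group's median instead of a point. A minimal sketch on synthetic data (the data and the names groups and parts are illustrative, not from the penguin dataset):

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data: three groups with different centers.
rng = np.random.default_rng(1)
groups = [rng.normal(loc, 300, size=50) for loc in (3700, 3730, 5000)]

fig, ax = plt.subplots()
# Let Matplotlib compute and draw the medians; hide the extrema lines.
parts = ax.violinplot(groups, showmedians=True, showextrema=False)

# The median lines are returned under the "cmedians" key and can be styled.
parts["cmedians"].set_color("red")
parts["cmedians"].set_linewidth(2)
```

The scatter() approach used above gives more control over the marker shape, while showmedians=True saves the separate group-by step.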

Add lower and upper quartile ranges to Violinplot with Matplotlib

We can customize the violin plot further by marking the lower and upper quartiles of each group. To do that, we first compute them using the quantile() function in Pandas.

quartile1 = (df.
             groupby('species')["body_mass_g"].
             quantile(0.25))
quartile1

species
Adelie       3362.5
Chinstrap    3487.5
Gentoo       4700.0
Name: body_mass_g, dtype: float64

quartile3 = (df.
             groupby('species')["body_mass_g"].
             quantile(0.75))
quartile3

species
Adelie       4000.0
Chinstrap    3950.0
Gentoo       5500.0
Name: body_mass_g, dtype: float64
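
The quartiles and the median can also be computed in a single call by passing a list of quantiles and unstacking the result into columns. This is a sketch on made-up numbers; the names toy and quartiles and the column labels q1, median and q3 are chosen here for illustration.

```python
import pandas as pd

# Made-up values, chosen so the quartiles are easy to check by eye.
toy = pd.DataFrame({
    "species": ["Adelie"] * 5 + ["Gentoo"] * 5,
    "body_mass_g": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

# quantile() with a list returns a Series indexed by (species, quantile);
# unstack() turns the quantile level into columns, one row per species.
quartiles = (toy.groupby("species")["body_mass_g"]
             .quantile([0.25, 0.5, 0.75])
             .unstack())
quartiles.columns = ["q1", "median", "q3"]
print(quartiles)
```

Each column can then be used where quartile1, medians and quartile3 are used in this tutorial.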

Now, in addition to the violin plot and the median values, we add a thick vertical line for each group that spans from the first to the third quartile.

inds = np.arange(1, len(medians) + 1)
fig, ax = plt.subplots()
ax.violinplot(data,
              # showmeans=True,
              # showmedians=True,
              showextrema=True)
# species names for the x-axis tick labels
labels = data.index
# add median value as a point
ax.scatter(inds, medians, marker='o', color='red', s=40, zorder=3)
# Add boxplot-like vertical lines to show the first and third quartile
ax.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=6)
ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
plt.xlabel("Species",
            fontweight ='bold', 
            size=14)
plt.ylabel("Body Mass (g)", 
            fontweight ='bold',
            size=14)
plt.title("Violinplot with Palmer Penguin Data", 
            fontweight ='bold',
            size=16)
#plt.show()
plt.savefig("Customizing_Violinplot_matplotlib_python_2.png",
                    format='png',dpi=150)

And this gives a nice boxplot-like look showing the quartile values.

Customizing Violinplot with Matplotlib
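
The steps in this tutorial can be collected into one reusable function. The sketch below is one possible arrangement, not a Matplotlib feature: the helper name violin_with_quartiles and the synthetic demo dataframe are hypothetical, and it assumes Matplotlib 3.5 or later for the labels argument of set_xticks().

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def violin_with_quartiles(df, group_col, value_col):
    """Hypothetical helper: one violin per group, a red median dot and a Q1-Q3 bar."""
    grouped = df.groupby(group_col)[value_col]
    data = grouped.agg(list)
    # Per-group summary statistics as plain NumPy arrays.
    q1 = grouped.quantile(0.25).to_numpy()
    med = grouped.median().to_numpy()
    q3 = grouped.quantile(0.75).to_numpy()
    inds = np.arange(1, len(data) + 1)  # violins are drawn at x = 1, 2, ...

    fig, ax = plt.subplots()
    ax.violinplot(list(data), showextrema=True)
    ax.vlines(inds, q1, q3, color="k", linestyle="-", lw=6)
    ax.scatter(inds, med, marker="o", color="red", s=40, zorder=3)
    ax.set_xticks(inds, labels=list(data.index))
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    return fig, ax


# Synthetic demo data (not the penguin measurements).
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "species": np.repeat(["Adelie", "Chinstrap", "Gentoo"], 40),
    "body_mass_g": np.concatenate(
        [rng.normal(m, 300, 40) for m in (3700, 3730, 5000)]
    ),
})
fig, ax = violin_with_quartiles(demo, "species", "body_mass_g")
```

With the penguin dataframe loaded earlier, the call would be violin_with_quartiles(df, "species", "body_mass_g"), followed by savefig() or plt.show() as before.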
