Multiple Density Plots with Pandas in Python

Often you may have data belonging to multiple groups. Visualizing them as multiple density plots is a great way to understand the similarities and differences between the groups.

In this tutorial, we will learn how to make multiple density plots using Pandas in Python. We will use US developer salaries from the Stack Overflow survey, grouped by educational qualification, to make the plots.

Let us first load the processed data from the Stack Overflow survey. The processed data is available on datavizpyr.com’s GitHub.

import pandas as pd
import matplotlib.pyplot as plt

# salary data derived from https://datavizpyr.com/density-plots-with-pandas-in-python/
stackoverflow_salary_file = "https://raw.githubusercontent.com/datavizpyr/data/master/SO_data_2019/2019_Stack_Overflow_Survey_Education_Salary_US.tsv"
# load the tab-separated salary data
salary = pd.read_csv(stackoverflow_salary_file, sep="\t")
salary.head()

	CompTotal	Education
0	180000.0	Master's
1	55000.0	Bachelor's
2	77000.0	Bachelor's
3	67017.0	Bachelor's
4	90000.0	Less than bachelor's

By visualizing the distribution of developer salaries for different levels of education as multiple density plots, we can understand the effect of degrees on developer salaries in the US.

We can make multiple density plots with Pandas’ plot.density() function. See the earlier post for making a simple density plot using Pandas.

However, the density() function in Pandas needs the data in wide form, i.e. with each group’s values in its own column.

We can reshape the dataframe from long form to wide form using the pivot() function.

salary_wide=salary.pivot(columns='Education',values='CompTotal')

Now we have our data in the right form to make multiple density plots using Pandas.

salary_wide.head()

Education	Bachelor's	Less than bachelor's	Master's	PhD	Professional
0	NaN	NaN	180000.0	NaN	NaN
1	55000.0	NaN	NaN	NaN	NaN
2	77000.0	NaN	NaN	NaN	NaN
3	67017.0	NaN	NaN	NaN	NaN
4	NaN	90000.0	NaN	NaN	NaN
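To make the reshaping step concrete, here is a minimal sketch on a tiny, made-up dataframe (the values and the names toy_long and toy_wide are hypothetical, not taken from the survey). It shows that pivot() without an index argument keeps the original row index, so every row has a value in exactly one column and NaN in the rest.

```python
import pandas as pd

# Hypothetical toy data in long form: one row per respondent.
toy_long = pd.DataFrame({
    "CompTotal": [100000.0, 60000.0, 80000.0, 120000.0],
    "Education": ["Master's", "Bachelor's", "Bachelor's", "PhD"],
})

# With no index argument, pivot() keeps the existing row index,
# so each row ends up with one value and NaN in the other columns.
toy_wide = toy_long.pivot(columns="Education", values="CompTotal")
print(toy_wide)

# count() ignores NaN, so this gives the number of salaries per group.
print(toy_wide.count())
```

The many NaN values are expected in this layout. As far as we know, they are dropped column by column when Pandas estimates each density, so they should not distort the curves.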

How To Make Multiple Density Plots with Pandas?

We can call the plot.density() function on the salary data in wide form to make multiple density plots. Pandas’ plot.density() function draws a density plot for every column in the wide dataframe. In this case we have five groups, so we get five density curves on the same plot.

salary_wide.plot.density(figsize=(8,6),xlim=(5000,1e6),linewidth=4)
plt.savefig("multiple_density_plots_with_Pandas_Python.jpg")

In this density plot, we specify x-axis limits to focus on a reasonable range of salaries. Note that Pandas automatically colors each density curve differently and labels each one in the legend with its column name.
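The one-curve-per-column behaviour can be checked on synthetic data without downloading the survey file. The sketch below is only an illustration: the column names group_a and group_b and the lognormal parameters are assumptions, and plot.density() needs SciPy to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Synthetic, salary-like values for two hypothetical groups.
rng = np.random.default_rng(0)
demo_wide = pd.DataFrame({
    "group_a": rng.lognormal(mean=11.0, sigma=0.4, size=200),
    "group_b": rng.lognormal(mean=11.4, sigma=0.4, size=200),
})

# One density curve is drawn per column of the wide dataframe.
ax = demo_wide.plot.density(figsize=(8, 6), linewidth=4)

# The legend entries are taken from the column names.
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(len(ax.get_lines()), labels)
```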

As we saw before, the salary distribution has a long tail, and we can use a log scale on the x-axis to make the multiple density plot easier to read. We can switch the x-axis to a log scale with the logx=True argument to the density() function.
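As a quick sanity check of what logx=True does, the sketch below plots synthetic data (the column names and lognormal parameters are assumptions for illustration) and inspects the scale of the returned Matplotlib axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Synthetic, long-tailed values for two hypothetical groups.
rng = np.random.default_rng(1)
demo_wide = pd.DataFrame({
    "group_a": rng.lognormal(mean=11.0, sigma=0.5, size=300),
    "group_b": rng.lognormal(mean=11.5, sigma=0.5, size=300),
})

# logx=True asks Pandas to put the x-axis on a log scale.
ax = demo_wide.plot.density(logx=True, xlim=(5000, 1e6))

# The returned object is a Matplotlib Axes, so its scale can be inspected.
print(ax.get_xscale())
```

To our understanding, logx=True only changes how the x-axis is drawn; the density itself is still estimated on the raw values rather than on their logarithms.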

salary_wide.plot.density(figsize=(8,6),
                         logx=True,
                         xlim=(5000,1e6),
                         linewidth=4, 
                         fontsize=14)
plt.xlabel("Salary in US", size=14)
plt.savefig("Multiple_density_plots_with_log_scale_Pandas_Python.jpg")

With a log scale on the multiple density plot, we can clearly see the effect of education on developers’ salaries. We can see that, on average, developers with a PhD make more money than the others, closely followed by developers with a Master’s degree.
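A reading taken from a plot can be cross-checked with a numeric summary such as the median salary per group. The sketch below shows the idea on a small made-up dataframe; the numbers are hypothetical and do not come from the survey, so only the pattern of the code carries over, not the result.

```python
import pandas as pd

# Hypothetical long-form data with the same column names as the salary file.
toy = pd.DataFrame({
    "CompTotal": [100000.0, 140000.0, 60000.0, 80000.0, 150000.0, 170000.0],
    "Education": ["Master's", "Master's", "Bachelor's", "Bachelor's", "PhD", "PhD"],
})

# Median salary per education group, largest first.
medians = (
    toy.groupby("Education")["CompTotal"]
    .median()
    .sort_values(ascending=False)
)
print(medians)
```

The same groupby call should work on the long-form salary dataframe loaded at the start, since it has the same two columns.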
