Scatter plots are a great way to visualize two quantitative variables and the relationship between them. We can often show additional variables on a scatter plot by using the color, shape, and size of the data points.
With Seaborn in Python, we can make scatter plots in multiple ways, using functions like lmplot(), regplot(), and scatterplot(). In this tutorial, we will use Seaborn’s scatterplot() function. It is relatively new, available since Seaborn v0.9.0 (July 2018). One benefit of scatterplot() is that we can easily overlay three additional variables on the plot by mapping color with the “hue” argument, size with “size”, and marker shape with “style”.
Let us load the packages we need.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
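Since scatterplot() only exists in Seaborn 0.9.0 and later, it can be worth confirming the installed version before going further. The following is a minimal sketch of such a check; the helper name has_scatterplot is our own and is not part of Seaborn.

```python
import seaborn as sns


def has_scatterplot(version_string):
    """Return True if the version is at least 0.9, the release that added scatterplot()."""
    # Compare only the major and minor parts, e.g. "0.13.2" -> (0, 13).
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) >= (0, 9)


print(sns.__version__, has_scatterplot(sns.__version__))
```

If the check prints False, upgrading Seaborn is needed before the rest of the tutorial will run.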
We will learn to make scatter plots using the wonderful new dataset on penguins from Palmer Station. It is a great dataset for teaching data exploration and data visualization. The dataset contains body measurements of three penguin species.
Penguin Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER.
Thanks to Allison Horst for making the data easily available.
We will load the simplified data directly from the GitHub page.
penguins_data = "https://raw.githubusercontent.com/datavizpyr/data/master/palmer_penguin_species.tsv"
penguins_df = pd.read_csv(penguins_data, sep="\t")
We can see that we have four numerical variables measured on three penguin species. Check the GitHub page for nice illustrations of the body measurements.
penguins_df.head()
  species     island  culmen_length_mm  culmen_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen              39.1             18.7              181.0       3750.0    MALE
1  Adelie  Torgersen              39.5             17.4              186.0       3800.0  FEMALE
2  Adelie  Torgersen              40.3             18.0              195.0       3250.0  FEMALE
3  Adelie  Torgersen               NaN              NaN                NaN          NaN     NaN
4  Adelie  Torgersen              36.7             19.3              193.0       3450.0  FEMALE
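The preview shows that one row has no measurements at all, and a point with a missing x or y value cannot be drawn. Below is a minimal sketch of how we might count and filter missing values. To stay runnable offline it uses only the five preview rows, typed in by hand; the names preview_df, missing_counts and complete_df are ours.

```python
import pandas as pd

# The five preview rows shown above, typed in by hand so the sketch
# runs without a network connection. None marks a missing value.
preview_df = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "culmen_length_mm": [39.1, 39.5, 40.3, None, 36.7],
    "culmen_depth_mm": [18.7, 17.4, 18.0, None, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, None, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, None, 3450.0],
    "sex": ["MALE", "FEMALE", "FEMALE", None, "FEMALE"],
})

# Count missing values per column.
missing_counts = preview_df.isna().sum()
print(missing_counts)

# Keep only rows where both plotted measurements are present.
complete_df = preview_df.dropna(subset=["culmen_length_mm", "flipper_length_mm"])
print(len(preview_df), "rows before,", len(complete_df), "rows after")
```

The same two calls should work on the full penguins_df once it is loaded.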
Let us get started. In this tutorial, we will learn 9 tips for making a publication-quality scatter plot with Python. We will start with a simple scatter plot using Seaborn’s scatterplot() function. Then we will use the features of the scatterplot() function to improve the plot in multiple steps.
1. How To Make Simple Scatter Plot with Seaborn’s scatterplot()?
Let us get started making scatter plots with the penguin data using Seaborn’s scatterplot() function. First, we will make a simple scatter plot of two numerical variables from the dataset, culmen_length_mm and flipper_length_mm.
We can use Seaborn’s scatterplot() specifying the x and y-axis variables with the data as shown below.
sns.scatterplot(x="culmen_length_mm", y="flipper_length_mm", data=penguins_df)
And we get a simple scatter plot like this below.
2. How To Increase Figure Size with Matplotlib in Python?
A look at the scatter plot suggests we can improve the simple version a lot. By default, Seaborn draws the plot at Matplotlib’s default figure size. We might want to increase the figure size to make the plot easier to read. To do so, we can use Matplotlib’s figure() function and specify the dimensions we want.
# specify figure size with Matplotlib
plt.figure(figsize=(10,8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                data=penguins_df)
In the example here, we have specified the figure size with figsize=(10,8). We get a bigger scatter plot figure.
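An alternative we sometimes find handy is to create the figure and axes together with Matplotlib’s subplots() and pass the axes to scatterplot() through its ax argument. The sketch below is only an illustration: it uses four complete rows from the data preview, typed in by hand, and the names preview_df, fig and ax are ours.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
})

# Create the figure and its axes in one call, then tell Seaborn where to draw.
fig, ax = plt.subplots(figsize=(10, 8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                data=preview_df,
                ax=ax)

# figsize is given in inches as (width, height).
width, height = fig.get_size_inches()
print(width, height)
```

Holding on to fig and ax makes later tweaks, such as changing labels or saving the figure, explicit about which plot they apply to.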
3. How To Increase Axis Tick Label Size in Seaborn?
Although we have increased the figure size, the axis tick labels are tiny and not easy to read. We can increase the tick label size using Seaborn’s plotting_context() function. In this example, we call plotting_context() with the arguments “notebook” and font_scale=1.5.
# specify figure size with Matplotlib
plt.figure(figsize=(10,8))
# Increase axis tick label with plotting_context in Seaborn
with sns.plotting_context("notebook",font_scale=1.5):
    sns.scatterplot(x="culmen_length_mm",
                    y="flipper_length_mm",
                    data=penguins_df)
Now we have a better-looking scatter plot of culmen length against flipper length, with easily readable axis tick labels.
4. How To Change Marker Size in Seaborn Scatterplot?
Before changing the marker size, let us set the axis tick label size for all the plots in the notebook/script. Earlier we used a “with” statement to set plotting_context for a single scatter plot; set_context() applies the same settings globally.
# Set common plotting_context for all the plots
# in the script/notebook
sns.set_context("notebook", font_scale=1.5)
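To see the difference between the two approaches, we can inspect Matplotlib’s rcParams, which is where Seaborn stores these settings. The sketch below is a rough illustration, not a required step; the variable names are ours.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns

key = "xtick.labelsize"
before = plt.rcParams[key]

# Temporary: the scaled size applies only inside the with-block.
with sns.plotting_context("notebook", font_scale=1.5):
    inside_with = plt.rcParams[key]
after_with = plt.rcParams[key]

# Persistent: set_context() changes the setting for every later plot.
sns.set_context("notebook", font_scale=1.5)
after_set = plt.rcParams[key]

print(before, inside_with, after_with, after_set)
```

The value seen inside the with-block disappears once the block ends, while the value set by set_context() stays in effect for the rest of the session.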
We can increase the marker size or the data point size in the scatter plot using the argument “s” in Seaborn’s scatterplot() function.
# set figure size
plt.figure(figsize=(10,8))
# change marker size with s=100 in
# Seaborn scatterplot()
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                data=penguins_df)
plt.savefig("How_To_Change_Marker_Size_Seaborn_ScatterPlot.png",
            format='png')
Now the data points on the scatter plot are bigger and clearly visible.
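One detail worth knowing is that “s” is passed through to Matplotlib’s scatter(), where it means marker area in points squared rather than width. A marker therefore grows in width with the square root of s. The sketch below checks this on four hand-typed rows from the data preview; preview_df, marker_areas and approx_width_pts are our own names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
})

fig, ax = plt.subplots(figsize=(10, 8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                data=preview_df,
                ax=ax)

# The drawn points live in a Matplotlib collection whose sizes are areas in points**2.
marker_areas = ax.collections[0].get_sizes()
# The approximate marker width in points is the square root of the area.
approx_width_pts = marker_areas[0] ** 0.5
print(marker_areas, approx_width_pts)
```

So to make markers look twice as wide, s needs to be roughly four times larger.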
5. How To Change Axis Labels and Size with Matplotlib for Seaborn Scatterplot?
Notice that our x-axis and y-axis labels are simply the column names from the penguin data frame. We can change the axis labels and their sizes using Matplotlib.
We use Matplotlib’s xlabel() and ylabel() functions to change the labels and increase their font sizes.
plt.figure(figsize=(10,8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                data=penguins_df)
# set x-axis and y-axis labels and their font sizes
plt.xlabel("Culmen Length (mm)", size=24)
plt.ylabel("Flipper Length (mm)", size=24)
plt.savefig("Customize_Axis_Labels_Scatter_Plot_Penguins_data_Seaborn.png",
            format='png')
We have customized the x-axis and y-axis labels and also increased their font sizes.
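The same labels can also be set on the Axes object that scatterplot() returns, which avoids relying on Matplotlib’s notion of the “current” figure. This is a small sketch of that variant, again using four hand-typed rows from the data preview; preview_df and ax are our own names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
})

plt.figure(figsize=(10, 8))
# scatterplot() returns the Matplotlib Axes it drew on.
ax = sns.scatterplot(x="culmen_length_mm",
                     y="flipper_length_mm",
                     s=100,
                     data=preview_df)

# Set the label text and font size directly on that Axes.
ax.set_xlabel("Culmen Length (mm)", fontsize=24)
ax.set_ylabel("Flipper Length (mm)", fontsize=24)
print(ax.get_xlabel(), ax.xaxis.label.get_size())
```

Both styles produce the same figure; the Axes-based one tends to be easier to follow once a script draws more than one plot.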
6. How To Color Scatter Plot by a Variable with Seaborn’s scatterplot()?
We can color the data points on the scatter plot by a variable in the dataframe using the “hue” argument in Seaborn’s scatterplot() function. In this example, we color the data points by the “species” variable using hue="species".
plt.figure(figsize=(10,8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                hue="species",
                data=penguins_df)
plt.xlabel("Culmen Length (mm)")
plt.ylabel("Flipper Length (mm)")
plt.savefig("Color_scatterplot_by_variable_with_hue_Seaborn_scatterplot.png",
            format='png',dpi=150)
By coloring data points by a variable, we have added a third variable to the scatter plot. Seaborn automatically adds a legend that shows which color corresponds to which value of that variable.
7. How To Change Shape by a Variable in Scatter Plot with Seaborn’s scatterplot()?
In Seaborn’s scatterplot() function, we can change the shape of markers by a variable using the “style” argument.
plt.figure(figsize=(10,8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                style="sex",
                data=penguins_df)
plt.xlabel("Culmen Length (mm)")
plt.ylabel("Flipper Length (mm)")
plt.savefig("Add_shape_scatterplot_by_variable_with_hue_Seaborn_scatterplot.png",
            format='png',dpi=150)
In this example, we have changed the marker’s shape based on the value of the “sex” variable in the dataframe. Notice that the markers for males differ from those for females.
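By default Seaborn picks the marker shapes for us. If we want specific shapes, scatterplot() also accepts a “markers” argument, for example a dictionary from category value to Matplotlib marker code. The sketch below is an optional illustration on four hand-typed rows from the data preview; the chosen shapes and the names preview_df and legend_labels are our own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
    "sex": ["MALE", "FEMALE", "FEMALE", "FEMALE"],
})

fig, ax = plt.subplots(figsize=(10, 8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                style="sex",
                # Our own choice: squares for males, circles for females.
                markers={"MALE": "s", "FEMALE": "o"},
                data=preview_df,
                ax=ax)

# Read back the text entries of the legend Seaborn created.
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Fixing the markers this way keeps shapes consistent across several figures that share the same categories.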
8. How To Change Color and Shape in Scatter Plot by Two Variables in Seaborn’s scatterplot()?
One of the advantages of Seaborn’s scatterplot() function is that we can easily combine hue and style to color data points by one variable and set the marker shape by another. This way we display four variables in a single scatter plot.
plt.figure(figsize=(10,8))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                s=100,
                hue="species",
                style="sex",
                data=penguins_df)
plt.xlabel("Culmen Length (mm)")
plt.ylabel("Flipper Length (mm)")
plt.savefig("Color_and_shape_by_variable_Seaborn_scatterplot.png",
            format='png',dpi=150)
We have colored data points by Penguin species and changed marker shapes by penguin’s sex. This enables us to visualize the relationship between culmen length and flipper length with respect to species and sex.
9. How To Change Color, Shape and Size By Three Variables in Seaborn’s scatterplot()?
With Seaborn’s scatterplot() we can change color, shape and size by three variables using the arguments hue, style, and size.
plt.figure(figsize=(12,10))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                size="body_mass_g",
                hue="species",
                style="sex",
                data=penguins_df)
plt.xlabel("Culmen Length (mm)")
plt.ylabel("Flipper Length (mm)")
plt.savefig("Change_Size_Color_Shape_by_three_variables_Seaborn_scatterplot.png",
            format='png',dpi=150)
In this example, we have added body mass as a third additional variable by mapping it to marker size. By making size depend on a variable, we have turned the simple scatter plot into a bubble plot.
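If the default bubbles come out too similar in size, scatterplot() has a “sizes” argument that sets the smallest and largest marker area to use. The following sketch is an optional illustration on four hand-typed rows from the data preview; the range (40, 400) and the names preview_df and bubble_areas are our own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0],
})

fig, ax = plt.subplots(figsize=(12, 10))
sns.scatterplot(x="culmen_length_mm",
                y="flipper_length_mm",
                size="body_mass_g",
                # Our own choice: lightest penguin gets area 40, heaviest gets 400.
                sizes=(40, 400),
                data=preview_df,
                ax=ax)

# Marker areas actually used for the four points.
bubble_areas = ax.collections[0].get_sizes()
print(bubble_areas)
```

A wider range makes differences in body mass easier to see, at the cost of more overlap between neighbouring points.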
Although the ability to add three variables is nice, it can also make the plot harder to interpret. Other approaches, such as splitting the data into separate panels, often show multiple variables more clearly.
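One such alternative is to draw one panel per category with Seaborn’s relplot(), which wraps scatterplot() in a grid of subplots. The sketch below is only a rough illustration on four hand-typed rows from the data preview, so it splits by “sex”; with the full dataset, col="species" would be a natural choice. The names preview_df and grid are ours.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd
import seaborn as sns

# Four complete rows from the preview above (hand-typed, no download needed).
preview_df = pd.DataFrame({
    "culmen_length_mm": [39.1, 39.5, 40.3, 36.7],
    "flipper_length_mm": [181.0, 186.0, 195.0, 193.0],
    "sex": ["MALE", "FEMALE", "FEMALE", "FEMALE"],
})

# One scatter plot per value of "sex", laid out side by side.
grid = sns.relplot(x="culmen_length_mm",
                   y="flipper_length_mm",
                   col="sex",
                   kind="scatter",
                   data=preview_df)

# The grid holds one Matplotlib Axes per panel.
print(grid.axes.shape)
```

Each panel shares the same axes, so groups can be compared without decoding several colors, shapes and sizes at once.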