
Data Viz with Python and R

Learn to Make Plots in Python and R


A Mistake to Avoid When Making Boxplots with Data Points in ggplot2

datavizpyr · November 17, 2020

Making a boxplot with data points on top of it is a great way to show the distributions of multiple groups. The big advantage is that one can see both the raw data and the summary statistics of each distribution in a single plot.

ggplot2 in R makes it easy to make boxplots and add data points on top of them. However, the naive way of doing this leads to a small but common mistake.

In this post, we will see an example of that mistake and then show a way to avoid it.

Let us first load tidyverse and set a ggplot2 theme for our boxplots.

library(tidyverse)
theme_set(theme_bw(16))

We will be using data on mobile subscription growth over the years across the world from the TidyTuesday project.

mobile <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-11-10/mobile.csv')

The mobile growth data contains countries and their mobile subscriptions over time, along with population, GDP per capita, and continent.

## Parsed with column specification:
## cols(
##   entity = col_character(),
##   code = col_character(),
##   year = col_double(),
##   total_pop = col_double(),
##   gdp_per_cap = col_double(),
##   mobile_subs = col_double(),
##   continent = col_character()
## )

Let us make a boxplot to see how each continent fared with respect to mobile subscriptions. First, we specify the variables using the aes() function and then add geom_boxplot() to make the boxplot. In our example, continent is on the x-axis and mobile subscriptions are on the y-axis. We also color the boxplots by continent. To show the data points with jitter, we add geom_jitter().


mobile %>% 
  ggplot(aes(x=continent,
             y=mobile_subs,
             color=continent))+
  geom_boxplot()+
  geom_jitter(width=0.1,alpha=0.2)+
  theme(legend.position = "none")

Now we have a nice boxplot with jittered data points on top of it. The small mistake in this plot is that some of the data points are plotted twice.

Boxplot with jittered datapoints

When we make a boxplot, geom_boxplot() shows the outlier data points by default. If we make the same boxplot without geom_jitter(), and since the boxplots are colored by continent, we can easily see that Africa and Asia have outlier points. In the first plot, these outliers are drawn in addition to the jittered data points.

mobile %>% 
  ggplot(aes(x=continent,
             y=mobile_subs,
             color=continent))+
  geom_boxplot()+
  theme(legend.position = "none")

Boxplots with Outlier Datapoints Highlighted
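To see which points end up drawn twice, it helps to know the rule geom_boxplot() uses to flag outliers. Per the ggplot2 documentation, points more than 1.5 times the interquartile range (IQR) beyond the hinges are plotted individually as outliers. The sketch below assumes ggplot2's default `coef = 1.5` and hinges computed with R's default `quantile()`; the helper name `is_boxplot_outlier()` is hypothetical and not part of ggplot2.

```r
# Hypothetical helper (not part of ggplot2): flags the points that
# geom_boxplot() would draw as outliers, assuming the default coef = 1.5
# and hinges computed with R's default quantile() (type 7)
is_boxplot_outlier <- function(y, coef = 1.5) {
  hinges <- quantile(y, c(0.25, 0.75), na.rm = TRUE)
  lower <- hinges[[1]]
  upper <- hinges[[2]]
  iqr <- upper - lower
  # Outliers lie more than coef * IQR below the lower hinge
  # or above the upper hinge
  y < lower - coef * iqr | y > upper + coef * iqr
}

# Toy vector for illustration: hinges are 2 and 4, so the fences
# are -1 and 7, and only 100 is flagged as an outlier
is_boxplot_outlier(c(1, 2, 3, 4, 100))
```

Used inside group_by(continent) and summarise() on the mobile data, for example as `sum(is_boxplot_outlier(mobile_subs), na.rm = TRUE)`, this would count how many points per continent get drawn a second time once geom_jitter() is added.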

Avoid Double Plotting in Boxplot with outlier.shape in ggplot2

When we add data points on top of a boxplot, the outlier data points get plotted twice: once by geom_boxplot() and once by geom_jitter(). We can clearly see this double plotting in our first boxplot. One way to avoid this mistake is to use the argument outlier.shape = NA inside geom_boxplot(). This draws the boxplot without its outlier points, so every data point, outliers included, is now plotted only once, by geom_jitter().


mobile %>% 
  ggplot(aes(x=continent,
             y=mobile_subs,
             color=continent))+
  # remove outlier points in boxplot with outlier.shape = NA
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1,alpha=0.2)+
  theme(legend.position = "none")

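This is a small mistake and most of the time it does not change the picture much. However, when your groups do not have many data points, each duplicated outlier makes up a larger share of the points shown, and the double plotting can give a misleading view of the distribution.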

Boxplots with Jittered Data Points with outlier.shape = NA
