Data Viz with Python and R

Learn to Make Plots in Python and R

How to Deal with Outliers in boxplot with ggplot2

datavizpyr · August 25, 2024 ·

In this post, we will learn two ways to deal with outlier data points while making a boxplot. By default, ggplot2's geom_boxplot() identifies the outliers and displays them as black dots beyond the ends of the whiskers. One option for dealing with outliers is to ignore them. With ggplot2, there are two ways to ignore the outliers in a boxplot. We will learn both, along with the subtle difference between them.

Let us first load the packages needed.

library(tidyverse)
library(gapminder)
theme_set(theme_bw(16))
packageVersion("ggplot2")

[1] '3.5.1'

We will use the gapminder data to make a boxplot of life expectancy by continent.

gapminder |> head()

# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.

We can see that, by default, ggplot2 displays the outlier data points as dots beyond the ends of the whiskers.

gapminder |>
  ggplot(aes(x=continent, y=lifeExp, fill=continent))+
  geom_boxplot()+
  labs(title="gapminder dataset with outliers")+
  theme(legend.position="none")+
  scale_y_continuous(breaks=scales::breaks_pretty(n=8))
ggsave("boxplot_outliers_ggplot2.png")

In the resulting plot, both Africa and Europe have a few outliers, shown as dots beyond the whiskers.

Boxplot with outliers in ggplot2
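
For reference, the points drawn as outliers are those lying more than 1.5 times the interquartile range (IQR) beyond the first or third quartile, which is the default rule (coef = 1.5) of geom_boxplot(). The following is a minimal sketch, not ggplot2's own code, that applies the same rule with dplyr to count the outliers per continent. The names outlier_counts, q1, q3, and n_outliers are illustrative only.

```r
library(dplyr)
library(gapminder)

# Count the points outside the 1.5 * IQR fences for each continent.
# Assumption: this mirrors the default outlier rule of geom_boxplot() (coef = 1.5).
outlier_counts <- gapminder |>
  group_by(continent) |>
  summarise(
    q1 = quantile(lifeExp, 0.25, names = FALSE),
    q3 = quantile(lifeExp, 0.75, names = FALSE),
    n_outliers = sum(lifeExp < q1 - 1.5 * (q3 - q1) |
                     lifeExp > q3 + 1.5 * (q3 - q1))
  )

outlier_counts
```

A continent with n_outliers equal to zero should show no dots in the default boxplot.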

outlier.shape=NA: Ignore outliers in boxplot by not displaying them

We can use outlier.shape=NA as an argument to geom_boxplot() to hide the outliers when displaying the boxplot. Under the hood, geom_boxplot() still uses all the data to compute the summary statistics for the boxplot, and the y-axis still spans the full data range, including the hidden outliers.

gapminder |>
  ggplot(aes(x=continent, y=lifeExp, fill=continent))+
  geom_boxplot(outlier.shape=NA)+
  labs(title="Do not show outliers in the boxplot")+
  theme(legend.position="none")+
  scale_y_continuous(breaks=scales::breaks_pretty(n=8))
ggsave("do_not_show_outliers_in_boxplot_ggplot2.png")

outlier.shape=NA: do not show outlier data in the boxplot with ggplot2

outliers=FALSE: Discard outliers so the axis limits adapt to the box and whiskers

With outliers=FALSE as an argument to geom_boxplot(), the outliers are discarded from the plot rather than just hidden. The box and whiskers are still computed from all of the data, so they match the earlier plots. The big difference, as the ggplot2 documentation puts it, is that by “discarding outliers, the axis limits will adapt to the box and whiskers only, not the full data range”. The outliers argument is available from ggplot2 version 3.5.0.

gapminder |>
  ggplot(aes(x=continent, y=lifeExp, fill=continent))+
  geom_boxplot(outliers=FALSE)+
  labs(title="Do not use outliers in computing boxplot")+
  theme(legend.position="none")+
  scale_y_continuous(breaks=scales::breaks_pretty(n=8))
ggsave("do_not_use_outliers_in_boxplot_ggplot2.png")

outliers=FALSE: discard outliers from the boxplot with ggplot2
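
To see the difference between the two approaches without comparing the figures by eye, one option is to inspect the built plots. The sketch below assumes ggplot2 3.5.0 or later and uses layer_data() and layer_scales() to compare the box and whisker statistics and the trained y-axis limits. The object names (p_hidden, p_discarded, and so on) are illustrative only.

```r
library(ggplot2)
library(gapminder)

# outliers = FALSE needs ggplot2 >= 3.5.0
stopifnot(packageVersion("ggplot2") >= "3.5.0")

base <- ggplot(gapminder, aes(x = continent, y = lifeExp))
p_hidden    <- base + geom_boxplot(outlier.shape = NA)  # outliers hidden
p_discarded <- base + geom_boxplot(outliers = FALSE)    # outliers discarded

# Whisker ends, hinges, and median computed for each continent
stat_cols <- c("ymin", "lower", "middle", "upper", "ymax")
stats_hidden    <- layer_data(p_hidden)[, stat_cols]
stats_discarded <- layer_data(p_discarded)[, stat_cols]

# The box and whisker statistics should be the same in both plots
same_stats <- isTRUE(all.equal(stats_hidden, stats_discarded))

# Trained y-axis limits (before the default axis expansion is applied)
range_hidden    <- layer_scales(p_hidden)$y$get_limits()
range_discarded <- layer_scales(p_discarded)$y$get_limits()

same_stats
range_hidden
range_discarded
```

If the statistics agree while range_discarded is narrower than range_hidden, then the only visible change from outliers=FALSE is the tighter y-axis.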
