How To Color Boxplots By a Variable in R with ggplot2?

Filling boxplot with colors using geom_boxplot()

In this tutorial, we will learn how to color boxplots by a variable in R. With ggplot2, we can color boxplots in multiple ways. We will first make a simple boxplot. Then we will fill the boxes of the boxplot with color by a variable. Next, we will color only the outlines of the boxes by a variable, leaving the boxes unfilled. Finally, we will add the actual data points on top of the boxplot and learn how to color the boxes and the points, together or separately.

library(tidyverse)

Let us generate some data using simulation to make boxplots with colors.

set.seed(23)
n <- 50
# tibble containing the data
df <- tibble(height = c(rnorm(n,mean=150,sd=10),
                        rnorm(n,mean=100,sd=20) ),
             age_group = c(rep("Adult", n), 
                           rep("Kid", n)))
head(df)
## # A tibble: 6 x 2
##   height age_group
##    <dbl> <chr>    
## 1   152. Adult    
## 2   146. Adult    
## 3   159. Adult    
## 4   168. Adult    
## 5   160. Adult    
## 6   161. Adult
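Before plotting, it can help to check the numbers a boxplot is built from. The following is a minimal sketch, not part of the original post, that recreates the same simulated data and computes the count, median, and interquartile range per group. The names group_summary, count, median_height, and iqr_height are chosen here for illustration.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

# Per-group count, median and IQR: the quantities a boxplot summarizes
group_summary <- df %>%
  group_by(age_group) %>%
  summarise(count = n(),
            median_height = median(height),
            iqr_height = IQR(height))
group_summary
```

The median corresponds to the line inside each box and the IQR to the height of the box.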

Simple Boxplot without Color

We can make boxplots in R with ggplot2 using the geom_boxplot() function. We first provide the data to the ggplot() function and specify the x- and y-axis variables with the aesthetics function aes(). Then we add geom_boxplot() as a layer to draw the boxplot.

df %>% ggplot(aes(x=age_group, y=height)) + 
  geom_boxplot(width=0.5,lwd=1) 

In this example, we also specified the width of the boxes with width and the thickness of their lines with lwd.

Simple Boxplot without Colors: ggplot2 in R

Filling Boxplot with Colors by Variable

Let us color the boxplots by a variable using ggplot2. Here, we fill the boxes with color. We can add fill color to boxplots by assigning the variable to the fill argument inside the aesthetics function aes().

df %>% ggplot(aes(x=age_group, y=height, fill=age_group)) + 
  geom_boxplot(width=0.5,lwd=1)+
  labs(subtitle="Filling Boxplot with Colors by a Variable")

In this example, we fill the boxplots with colors by the variable "age_group" by specifying fill=age_group. ggplot2 automatically uses its default color palette to fill the boxes.
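If the default fill colors are not what we want, one option is to supply our own with scale_fill_manual(). The sketch below is an addition to the original post; the colors "steelblue" and "orange" and the object name p_fill are arbitrary choices made for illustration.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

# Map fill to age_group, then override the default palette
# with one named color per group
p_fill <- df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  scale_fill_manual(values = c(Adult = "steelblue", Kid = "orange"))
p_fill
```

Naming the values by group level keeps the color assignment explicit, so it does not depend on the order of the levels.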

Filling boxplot with colors by a variable

Coloring Boxplot by Variable

Let us now color the lines of the boxplots by a variable. Here the boxes will not be filled. We can color a boxplot this way using the color argument inside the aesthetics function aes(), as shown below.

df %>% ggplot(aes(x=age_group, y=height, color=age_group)) + 
  geom_boxplot(width=0.5,lwd=1)+
  labs(subtitle="Coloring Boxplot with Colors by a Variable")

Here, we color the lines of the boxplots by specifying color=age_group. ggplot2 also adds a legend for the colors and the variable used in the plot. Notice the difference in the legends when we use fill versus color. With fill, the tiny boxplots in the legend are filled with colors. With color, the tiny boxplots in the legend have colored lines and empty boxes.
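Because the x-axis already labels the two groups, the legend repeats information the plot shows elsewhere. As an optional variation that is not in the original post, the legend can be hidden with theme(legend.position = "none"). The object name p_nolegend is an illustrative choice.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

# Color the box outlines by age_group and drop the redundant legend
p_nolegend <- df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  theme(legend.position = "none")
p_nolegend
```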

Coloring Boxplot by a variable

Coloring Boxplot and Data Points Using Global aes() Function

Adding data points on top of boxplots is extremely useful, as it shows the raw data and not just the summary statistics drawn as boxes. We can add jittered data points as another layer using the geom_jitter() function. Note that we first add geom_boxplot() and then geom_jitter(), so that the data points are drawn on top of the boxplot.

To color both the boxplot lines and the data points by a variable, we specify color inside the global aes() function. A color mapping specified in the global aes() applies to all the geoms we use.

df %>% 
  ggplot(aes(x=age_group, y=height, color=age_group)) + 
  geom_boxplot(width=0.5,lwd=1.5) +
  geom_jitter(width=0.15) +
  labs(subtitle="Coloring Boxplots and Data Points\nwith Global Definition")

When adding the jittered data points, we specify a jitter width smaller than the width given to geom_boxplot(), so that the points stay inside the boxes horizontally.
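One side effect of layering points over a boxplot is that outliers appear twice: once drawn by geom_boxplot() and once by geom_jitter(). A possible refinement, not used in the original post, is to hide the boxplot's own outlier markers with outlier.shape = NA and to set height = 0 in geom_jitter() so the points are only spread horizontally. The object name p_points is an illustrative choice.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

p_points <- df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  # outlier.shape = NA hides the outlier points drawn by the boxplot layer
  geom_boxplot(width = 0.5, lwd = 1.5, outlier.shape = NA) +
  # height = 0 keeps the y values exact; only x is jittered
  geom_jitter(width = 0.15, height = 0)
p_points
```

Every observation is still shown once by the jitter layer, so hiding the boxplot outliers does not remove any data from the plot.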

Coloring boxplot and data points

Coloring Boxplot and Data Points Using geom_boxplot() and geom_jitter()

We can also add colors to boxplots and data points using the aesthetics function aes() within each of the geoms we use. For example, here we have two geoms: geom_boxplot() and geom_jitter(). We can therefore specify the colors by adding aes(color=age_group) inside geom_boxplot() and aes(color=age_group) inside geom_jitter().

df %>% 
  ggplot(aes(x=age_group, y=height)) + 
  geom_boxplot(width=0.5,lwd=1.5, aes(color=age_group)) +
  geom_jitter(width=0.15,aes(color=age_group))+
  labs(subtitle="Coloring Boxplots and Data Points\nwith geom_boxplot() and geom_jitter()")

Now we get the same boxplots with colored data points as in the previous example. The only difference is that we have added the colors in each geom individually instead of in the global aes() function inside ggplot().

Coloring Boxplot and points using geom_boxplot() and geom_jitter()

Coloring Boxplot, but not Data Points Using aes() within geom_boxplot()

Let us see an example of how to add colors by a variable to the boxes in a boxplot, but not to the data points. The ability to use the aesthetics function inside each geom gives us control over what gets colored. To add color to the boxes but not to the points, we add aes(color=age_group) inside geom_boxplot() and make sure there is no color argument in the global aes() or in geom_jitter().

df %>% 
  ggplot(aes(x=age_group, y=height)) + 
  geom_boxplot(width=0.5,lwd=1.5,aes(color=age_group)) +
  geom_jitter(width=0.15)+
  labs(subtitle="Coloring Boxplot, but not points")

Now we have colored the boxes in boxplot by a variable, but not the data points as we wanted.
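The uncolored points are drawn in the default black. If we want them in a single fixed color instead, we can set color outside aes() in geom_jitter(). Inside aes(), color is mapped to a variable; outside aes(), it is set to a constant. The sketch below is an addition to the original post; the values "grey40" and alpha = 0.6 and the object name p_fixed are arbitrary choices for illustration.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

p_fixed <- df %>%
  ggplot(aes(x = age_group, y = height)) +
  # color is mapped to a variable, so it goes inside aes()
  geom_boxplot(width = 0.5, lwd = 1.5, aes(color = age_group)) +
  # color is a constant here, so it goes outside aes()
  geom_jitter(width = 0.15, color = "grey40", alpha = 0.6)
p_fixed
```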

Coloring Boxplot using geom_boxplot()

Coloring Jittered Data Points, but not boxplot() Using aes() within geom_jitter()

Let us see an example of how to add colors to the jittered data points but not to the boxplots. To do that, we specify aes(color=age_group) inside geom_jitter(), but not inside the global aes() or geom_boxplot().

df %>% 
  ggplot(aes(x=age_group, y=height)) + 
  geom_boxplot(width=0.5,lwd=1.5) +
  geom_jitter(width=0.15, aes(color=age_group)) +
  labs(subtitle="Coloring Jittered Point, not boxplots")

Now we have jittered data points colored by a variable, but boxplots in black.

Coloring jittered points using geom_jitter()

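Filling boxplot() Using global aes(), but not points

Here is another variation of coloring boxplots by a variable, this time filling the boxes with colors instead of just coloring their lines. Here we use the fill argument inside the global aes() function. The points added by geom_jitter() stay black because the default point shape has no fill, so the fill mapping has no visible effect on them.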

df %>% 
  ggplot(aes(x=age_group, y=height, fill=age_group)) + 
  geom_boxplot(width=0.5,lwd=1.5) +
  geom_jitter(width=0.15)+
  labs(subtitle="Filling Boxplot with global aes(), but not points")

Filling boxplot with colors using global aes()
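With fill in the global aes(), the points only pick up the fill color if they use a point shape that has a fill, such as shape 21 (a circle with a border and a fill). The following sketch, which is not part of the original post, shows that variation; the object name p_shape21 and the size value are illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))

p_shape21 <- df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1.5) +
  # shape 21 has a fill, so the global fill mapping now applies to the points
  geom_jitter(width = 0.15, shape = 21, color = "black", size = 2)
p_shape21
```

The black border set with color = "black" keeps the points visible against a box filled with the same color.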

Filling boxplot() Using geom_boxplot(), but not points

This example shows another way to fill boxplots with colors without coloring the data points. We can fill the boxes with colors using the fill argument inside aes() within geom_boxplot().

df %>% 
  ggplot(aes(x=age_group, y=height)) + 
  geom_boxplot(width=0.5,lwd=1.5, aes(fill=age_group)) +
  geom_jitter(width=0.15)+
  labs(subtitle="Filling Boxplot with geom_boxplot(), but not points")

Filling boxplot with colors using geom_boxplot()