In this tutorial, we will learn how to color boxplots in R by a variable. With ggplot2, we can color boxplots in multiple ways. We will first make a simple boxplot. Then we will learn how to fill the boxes of a boxplot with color by a variable, and how to color only the lines of the boxes by a variable, leaving the boxes unfilled. Finally, we will add the actual data points on top of the boxplot and learn how to color boxplots that have points.
library(tidyverse)
Let us generate some data using simulation to make boxplots with colors.
set.seed(23)
n <- 50
# tibble containing the data
df <- tibble(height = c(rnorm(n, mean = 150, sd = 10),
                        rnorm(n, mean = 100, sd = 20)),
             age_group = c(rep("Adult", n), rep("Kid", n)))
head(df)
## # A tibble: 6 x 2
##   height age_group
##    <dbl> <chr>
## 1   152. Adult
## 2   146. Adult
## 3   159. Adult
## 4   168. Adult
## 5   160. Adult
## 6   161. Adult
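Before plotting, it can help to look at the numbers a boxplot summarizes. The sketch below recreates the simulated data and computes the per-group count, median, and interquartile range with dplyr. The name `group_summary` and its column names are illustrative choices, not part of the original code.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

# Per-group statistics that the boxes will display
# (group_summary and its column names are illustrative)
group_summary <- df %>%
  group_by(age_group) %>%
  summarise(
    count = n(),
    median_height = median(height),
    iqr_height = IQR(height)
  )

group_summary
```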
Simple Boxplot without Color
We can make boxplots in R with ggplot2 using the geom_boxplot() function. We first provide the data to the ggplot() function and specify the x- and y-axis variables for the boxplot using the aesthetics function aes(). Then we add geom_boxplot() to make the boxplot.
df %>%
  ggplot(aes(x = age_group, y = height)) +
  geom_boxplot(width = 0.5, lwd = 1)
In this example, we also specified the width of the boxes with width and the thickness of the box lines with lwd.
Filling Boxplot with Colors by Variable
Let us color boxplots by a variable using ggplot2. Here, we fill the boxes with color. We can add fill color to boxplots by assigning the variable to the fill argument inside the aesthetics function aes().
df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  labs(subtitle = "Filling Boxplot with Colors by a Variable")
In this example, we fill the boxplots with colors using the variable “age_group” by specifying fill=age_group. ggplot2 automatically uses its default color palette to fill the boxplots.
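If the default palette is not what you want, one common option is to supply your own colors with scale_fill_manual(). The sketch below assumes the same simulated data; the two hex codes and the name `p_fill_manual` are arbitrary illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

# Map each level of age_group to a hand-picked fill color
# (the hex codes are arbitrary examples)
p_fill_manual <- df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  scale_fill_manual(values = c(Adult = "#E69F00", Kid = "#56B4E9")) +
  labs(subtitle = "Filling Boxplot with Custom Colors")

p_fill_manual
```

Naming the vector elements ties each color to a specific level, so the assignment does not depend on the order of the levels.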
Coloring Boxplot by Variable
Let us now color the lines of the boxplots by a variable using ggplot2. Here the boxes will not be filled. We can color a boxplot like this using the color argument inside the aesthetics function aes(), as shown below.
df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  labs(subtitle = "Coloring Boxplot with Colors by a Variable")
Here, we color the lines of the boxplot by specifying color=age_group. ggplot2 also makes a legend for the colors and the variable used in the plot. Notice the difference in the legends when we use fill versus color to add colors to a boxplot by a variable. With fill, the tiny boxplots in the legend are filled with colors. With color, the tiny boxplots have colored lines and empty boxes.
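Because age_group is already shown on the x-axis, the legend in these plots repeats information. If you prefer to drop it, one option is theme(legend.position = "none"), sketched below on the same simulated data. The name `p_no_legend` is illustrative.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

# Same colored boxplot, with the redundant legend hidden
p_no_legend <- df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1) +
  theme(legend.position = "none") +
  labs(subtitle = "Coloring Boxplot by a Variable, without Legend")

p_no_legend
```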
Coloring Boxplot and Data Points Using Global aes() Function
Adding data points on top of boxplots is extremely useful because it shows the raw data, not just the summary statistics drawn as boxes. We can add data points with jitter using the geom_jitter() function as another layer in ggplot2. Note that we first add geom_boxplot() and then geom_jitter() so that the data points are drawn on top of the boxplot.
We specify color inside the global aes() function to add color to both the boxplot lines and the data points by a variable. Specifying color within the global aes() adds colors to all the geoms we use.
df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1.5) +
  geom_jitter(width = 0.15) +
  labs(subtitle = "Coloring Boxplots and Data Points\nwith Global Definition")
When adding data points with jitter, we specify a jitter width that is smaller than the width of geom_boxplot(), so that the points stay within the width of the boxes.
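One thing to watch for when layering points over a boxplot: geom_boxplot() draws outliers as points by default, and geom_jitter() draws every observation, so any outlier can appear twice. A possible refinement, sketched below, hides the boxplot's own outlier points with outlier.shape = NA and uses position_jitter() with a seed so the jitter is reproducible. The seed value, height = 0 (no vertical jitter), and the name `p_jitter` are illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

p_jitter <- df %>%
  ggplot(aes(x = age_group, y = height, color = age_group)) +
  # Hide the boxplot's outlier points; the jitter layer shows all observations
  geom_boxplot(width = 0.5, lwd = 1.5, outlier.shape = NA) +
  # Fixed seed makes the horizontal jitter identical on every render;
  # height = 0 leaves the y values untouched
  geom_jitter(position = position_jitter(width = 0.15, height = 0, seed = 42)) +
  labs(subtitle = "Coloring Boxplots and Data Points\nwith Reproducible Jitter")

p_jitter
```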
Coloring Boxplot and Data Points Using geom_boxplot() and geom_jitter()
We can also add colors to boxplots and data points using the aesthetics function aes() within each of the geoms we use. For example, here we have two geoms: geom_boxplot() and geom_jitter(). We can therefore specify colors by adding aes() with color=age_group inside geom_boxplot() and again inside geom_jitter().
df %>%
  ggplot(aes(x = age_group, y = height)) +
  geom_boxplot(width = 0.5, lwd = 1.5, aes(color = age_group)) +
  geom_jitter(width = 0.15, aes(color = age_group)) +
  labs(subtitle = "Coloring Boxplots and Data Points\nwith geom_boxplot() and geom_jitter()")
Now we get the same boxplots with colored data points as in the previous example. The only difference is that we added the colors in each geom individually instead of in the global aes() function inside ggplot().
Coloring Boxplot, but not Data Points Using aes() within geom_boxplot()
Let us see an example of how to add colors by a variable to the boxes in a boxplot, but not to the data points. Being able to use the aesthetics function inside each geom gives us control over what gets colored. To add color to the boxes but not to the points, we add aes() with color=age_group inside geom_boxplot() and make sure there is no color argument in the global aes() or in geom_jitter().
df %>%
  ggplot(aes(x = age_group, y = height)) +
  geom_boxplot(width = 0.5, lwd = 1.5, aes(color = age_group)) +
  geom_jitter(width = 0.15) +
  labs(subtitle = "Coloring Boxplot, but not points")
Now we have colored the boxes in the boxplot by a variable, but not the data points, as we wanted.
Coloring Jittered Data Points, but not Boxplots, Using aes() within geom_jitter()
Let us see an example of how to add colors to the jittered data points but not to the boxplots. To do that, we specify aes() with color=age_group inside geom_jitter(), but not inside the global aes() or geom_boxplot().
df %>%
  ggplot(aes(x = age_group, y = height)) +
  geom_boxplot(width = 0.5, lwd = 1.5) +
  geom_jitter(width = 0.15, aes(color = age_group)) +
  labs(subtitle = "Coloring Jittered Point, not boxplots")
Now the jittered data points are colored by a variable, while the boxplots remain black.
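Note that aes() is for mapping a variable to an aesthetic. To give a layer a single fixed color, pass color outside aes() instead. The sketch below, on the same simulated data, keeps the points colored by age_group and sets the boxplot lines to a constant grey. The value "grey40" and the name `p_fixed_color` are arbitrary illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

p_fixed_color <- df %>%
  ggplot(aes(x = age_group, y = height)) +
  # color outside aes(): one constant color for every box
  geom_boxplot(width = 0.5, lwd = 1.5, color = "grey40") +
  # color inside aes(): color varies with the variable
  geom_jitter(width = 0.15, aes(color = age_group)) +
  labs(subtitle = "Fixed Color for Boxes, Mapped Color for Points")

p_fixed_color
```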
Filling Boxplots Using Global aes(), but not Points
Here is another variation of coloring boxplots by a variable, this time filling the boxes with colors instead of just coloring their lines. Here we use the fill argument inside the global aes() function.
df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1.5) +
  geom_jitter(width = 0.15) +
  labs(subtitle = "Filling Boxplot with global aes(), but not points")
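The points stay black here even though fill is set globally, because the default point shape has no fill area and responds only to color. If you do want the points to pick up the fill, one option is a fillable shape such as shape = 21, as in this sketch. The shape, the point size, and the name `p_filled_points` are illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

p_filled_points <- df %>%
  ggplot(aes(x = age_group, y = height, fill = age_group)) +
  geom_boxplot(width = 0.5, lwd = 1.5) +
  # shape 21 is a circle with a separate outline (color) and interior (fill),
  # so the global fill mapping now applies to the points as well
  geom_jitter(width = 0.15, shape = 21, size = 2) +
  labs(subtitle = "Filling Boxplot and Points with global aes()")

p_filled_points
```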
Filling Boxplots Using geom_boxplot(), but not Points
This example shows another way to fill boxplots with colors without coloring the data points. We can fill the boxes using the fill argument inside aes() within geom_boxplot().
df %>%
  ggplot(aes(x = age_group, y = height)) +
  geom_boxplot(width = 0.5, lwd = 1.5, aes(fill = age_group)) +
  geom_jitter(width = 0.15) +
  labs(subtitle = "Filling Boxplot with geom_boxplot(), but not points")
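The per-geom approach also lets us mix the two aesthetics: fill mapped inside geom_boxplot() and color mapped inside geom_jitter(). The sketch below is one possible combination on the same simulated data. The alpha value, which makes the fill semi-transparent so the points stay visible, and the name `p_fill_and_color` are illustrative choices.

```r
library(tidyverse)

# Recreate the simulated data from above
set.seed(23)
n <- 50
df <- tibble(
  height = c(rnorm(n, mean = 150, sd = 10), rnorm(n, mean = 100, sd = 20)),
  age_group = c(rep("Adult", n), rep("Kid", n))
)

p_fill_and_color <- df %>%
  ggplot(aes(x = age_group, y = height)) +
  # Boxes filled by age_group, semi-transparent
  geom_boxplot(width = 0.5, lwd = 1.5, alpha = 0.4, aes(fill = age_group)) +
  # Points colored by age_group
  geom_jitter(width = 0.15, aes(color = age_group)) +
  labs(subtitle = "Filling Boxplot and Coloring Points by a Variable")

p_fill_and_color
```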