How To Reorder Boxplots in R with ggplot2

Sorting Boxplots in Descending Order
How To Reorder Boxplots in R? Descending Order with reorder()?

In this post, we will learn how to reorder boxplots in R with ggplot2. We will make a boxplot with multiple groups using ggplot2. By default, ggplot2 orders the groups alphabetically. We will see multiple examples of reordering the boxes by another variable in the data using the reorder() function in base R. We will also see how to overcome a common problem caused by missing values in the data.

Load Data and tidyverse

We will use the NYC flights data set for the year 2013 to make the boxplots. The flights data is available in the R package nycflights13.

Let us load the tidyverse and nycflights13 packages.

library(tidyverse)
library(nycflights13)
theme_set(theme_bw(base_size=16))

The flights data frame contains details about the flights that departed from the three NYC-area airports in 2013.

flights %>% colnames()

##  [1] "year"           "month"          "day"            "dep_time"      
##  [5] "sched_dep_time" "dep_delay"      "arr_time"       "sched_arr_time"
##  [9] "arr_delay"      "carrier"        "flight"         "tailnum"       
## [13] "origin"         "dest"           "air_time"       "distance"      
## [17] "hour"           "minute"         "time_hour"

Let us select a few variables from the flights data frame and estimate flight speed from distance and air_time. Since distance is recorded in miles and air_time in minutes, speed is in miles per minute.

flights_speed <- flights %>%
  select(carrier, distance, air_time)%>%
  mutate(speed=distance/air_time)
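Because speed is derived from air_time, any flight with a missing air_time also gets a missing speed, which matters later when we reorder the boxes. The sketch below uses a small made-up data frame (toy_flights and its values are hypothetical, not taken from nycflights13) to illustrate how NA propagates through the division, and how miles per minute could be converted to miles per hour if that unit is preferred.

```r
# Hypothetical toy data: distance in miles, air_time in minutes
toy_flights <- data.frame(
  carrier  = c("UA", "UA", "AA", "B6"),
  distance = c(1400, 1416, 1089, 1576),
  air_time = c(227, NA, 160, 183)
)

# Same calculation as above: miles per minute
toy_flights$speed <- toy_flights$distance / toy_flights$air_time

# Optional conversion to miles per hour
toy_flights$speed_mph <- toy_flights$speed * 60

# A missing air_time yields a missing speed
n_missing <- sum(is.na(toy_flights$speed))
print(toy_flights)
print(n_missing)
```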

Default Boxplot with groups in alphabetical order using ggplot2

We will make a boxplot of speed for each airline carrier to understand the relationship between speed and carrier. Here are the first few rows of the data.

flights_speed %>% head()

## # A tibble: 6 x 4
##   carrier distance air_time speed
##   <chr>      <dbl>    <dbl> <dbl>
## 1 UA          1400      227  6.17
## 2 UA          1416      227  6.24
## 3 AA          1089      160  6.81
## 4 B6          1576      183  8.61
## 5 DL           762      116  6.57
## 6 UA           719      150  4.79

We can make a boxplot in R with the geom_boxplot() function in ggplot2.

flights_speed %>%
  ggplot(aes(x=carrier, y=speed)) +
  geom_boxplot() +
  labs(y="Speed", x="Carrier", subtitle="Speed vs Carrier: nycflights13 data")

We can see that the boxes in the boxplot made by ggplot2 are ordered alphabetically by the names of the airline carriers. With so many carriers on the x-axis, it is not easy to identify the carriers with higher or lower average speed.

Boxplots in R with ggplot2

Reordering boxplots using reorder() in R

A better solution is to reorder the boxes by the mean or median speed of each carrier. In R we can reorder boxplots in multiple ways. In this example, we will use the reorder() function in base R. We call reorder() where we specify the x-axis variable inside the aesthetics function aes(): its first argument is the variable to reorder (carrier) and its second argument is the variable to order by (speed). By default, reorder() sorts the carriers by their mean speed.
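To see what reorder() does on its own, here is a minimal sketch on a small made-up data frame (toy and its values are hypothetical). It compares the default ordering by group mean with ordering by group median, which can be requested through the FUN argument.

```r
# Hypothetical toy data: AA has one fast outlier, so its mean and median differ
toy <- data.frame(
  carrier = rep(c("AA", "B6", "UA"), each = 3),
  speed   = c(5, 5, 11,   6, 6, 6,   8, 8, 8)
)

# Default: levels are sorted by the mean of speed within each carrier
by_mean <- reorder(toy$carrier, toy$speed)
levels(by_mean)    # "B6" "AA" "UA" (means 6, 7, 8)

# FUN = median sorts by the group median instead
by_median <- reorder(toy$carrier, toy$speed, FUN = median)
levels(by_median)  # "AA" "B6" "UA" (medians 5, 6, 8)
```

Since a boxplot draws the median as its centre line, sorting by median may match the picture more closely when groups are skewed. For the flights data, the reorder() call goes inside aes():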

flights_speed %>%
  ggplot(aes(x=reorder(carrier,speed), y=speed)) +
  geom_boxplot() +
  labs(y="Speed", x="Carrier", 
       subtitle="Sorting Boxplots with missing data")

Reordering boxplots in R: Error due to missing values

When we run the above code chunk, we would expect to get reordered boxplots. Instead, we get a boxplot that is still in alphabetical order.

Error in Reordering Boxplots with NAs in R

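The reason is missing data in our flights_speed data frame. reorder() uses mean() by default, and mean() returns NA for any carrier whose speed values include an NA, so there is nothing to sort by. ggplot2 also hints at the missing values with the following warning when it draws the plot.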

>Removed 9430 rows containing non-finite values (stat_boxplot).

We need to pass na.rm=TRUE to reorder(), which hands it on to mean() so that missing values are dropped before each carrier's mean is computed.

flights_speed %>%
  ggplot(aes(x=reorder(carrier,speed,na.rm = TRUE), y=speed)) +
  geom_boxplot() +
  labs(y="Speed", x="Carrier", 
       subtitle="Reordering Boxplots after removing missing data")

Now we have a reordered boxplot. By default, the boxes are sorted in ascending order of mean speed.

How To Sort Boxplots in R?
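The effect of na.rm = TRUE can be checked without drawing a plot. The sketch below uses a small made-up data frame (toy_na and its values are hypothetical) in which every carrier has one missing speed, and inspects the per-group scores that reorder() stores in the "scores" attribute of its result.

```r
# Hypothetical toy data: each carrier has one missing speed
toy_na <- data.frame(
  carrier = rep(c("AA", "B6", "UA"), each = 3),
  speed   = c(7, NA, 7,   8, 8, NA,   NA, 6, 6)
)

# Without na.rm, every group mean is NA, so there is nothing to sort by
no_rm <- reorder(toy_na$carrier, toy_na$speed)
attr(no_rm, "scores")    # all NA
levels(no_rm)            # stays alphabetical: "AA" "B6" "UA"

# na.rm = TRUE is passed on to mean(), which then ignores the NAs
with_rm <- reorder(toy_na$carrier, toy_na$speed, na.rm = TRUE)
attr(with_rm, "scores")  # AA 7, B6 8, UA 6
levels(with_rm)          # "UA" "AA" "B6"
```

If only some groups contain missing values, only those groups get an NA score, and they are placed after the groups that could be sorted.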

Reordering boxplots in descending order

To sort the boxes in descending order, we negate speed within the reorder() function.

flights_speed %>%
  ggplot(aes(x=reorder(carrier,-speed, na.rm = TRUE), y=speed)) +
  geom_boxplot() +
  labs(y="Speed", x="Carrier", 
       subtitle="Reordering Boxplots: In Descending Order")

Now we have the boxplots reordered in descending order.
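The negation trick works because sorting the negated means in ascending order is the same as sorting the original means in descending order. A minimal sketch on made-up data (toy_speed and its values are hypothetical) shows the two level orders side by side.

```r
# Hypothetical toy data with clearly separated group means
toy_speed <- data.frame(
  carrier = rep(c("AA", "B6", "UA"), each = 2),
  speed   = c(7.0, 7.4,   6.0, 6.2,   8.0, 8.4)
)

# Ascending order of mean speed (the default)
ascending <- reorder(toy_speed$carrier, toy_speed$speed)
levels(ascending)   # "B6" "AA" "UA"

# Negating speed flips the order of the group means
descending <- reorder(toy_speed$carrier, -toy_speed$speed)
levels(descending)  # "UA" "AA" "B6"
```

Only the ordering variable is negated; the y aesthetic still uses the original speed, so the plotted values are unchanged.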
