How to order facet variable in grouped boxplot by mean difference between groups in ggplot2

In this tutorial, we will learn how to order the facet variable in a grouped boxplot by the mean difference between the groups in each facet with ggplot2. We will use the Palmer penguins dataset to make a grouped boxplot using facet_wrap() in ggplot2.

library(palmerpenguins)
library(tidyverse)
theme_set(theme_bw(16))

Our data looks like this.

penguins <- penguins |>
  drop_na()
penguins |>
  head()

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           36.7          19.3               193        3450
5 Adelie  Torgersen           39.3          20.6               190        3650
6 Adelie  Torgersen           38.9          17.8               181        3625
# ℹ 2 more variables: sex <fct>, year <int>

First, let us make a grouped boxplot using facet_wrap(). Here the facet variable is penguin species, the grouping variable within each species is sex, and the variable shown on the y-axis is body mass. We are interested in the difference in mean body mass between male and female penguins within each species.

penguins |>
  ggplot(aes(x=sex, y=body_mass_g, fill=sex))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1)+
  facet_wrap(~species)+
  theme(legend.position = "none")+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=scales::breaks_pretty(6))+
  labs(title=str_wrap("How to order facet variable in grouped boxplot by mean difference", width = 50))
ggsave("How_to_order_facet_grouped_boxplot_by_mean_difference_between_groups.png")

We can see that the middle facet, the Chinstrap penguins, has the smallest difference in mean body mass between the sexes.

How to order facet_variable in a grouped boxplot by mean difference between groups

To order the facet variable by the mean difference between the groups, we first need to calculate that mean difference for each value of the facet variable.

Here we calculate the difference in mean body mass between male and female penguins for each species using the group_by(), summarize() and pivot_wider() functions.

mean_diff_df <- penguins |>
  select(species, sex, body_mass_g) |>
  group_by(species, sex) |>
  summarize(mean_val = mean(body_mass_g))  |>
  ungroup() |>
  pivot_wider(names_from="sex", values_from="mean_val") |>
  mutate(mean_diff = male-female) |>
  arrange(mean_diff)

The computed mean difference looks like this.

mean_diff_df

# A tibble: 3 × 4
  species   female  male mean_diff
  <fct>      <dbl> <dbl>     <dbl>
1 Chinstrap  3527. 3939.      412.
2 Adelie     3369. 4043.      675.
3 Gentoo     4680. 5485.      805.
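As a cross-check, the same per-species difference can be computed without reshaping to wide format, by subsetting body mass on sex inside a single summarize() call. This is only a sketch of an alternative route, not a replacement for the pipeline above; the name mean_diff_direct is introduced here for illustration.

```r
library(palmerpenguins)
library(tidyverse)

# Alternative sketch: compute the male - female mean difference per species
# directly, without pivot_wider(). `mean_diff_direct` is an illustrative name.
mean_diff_direct <- penguins |>
  drop_na() |>
  group_by(species) |>
  summarize(
    mean_diff = mean(body_mass_g[sex == "male"]) -
      mean(body_mass_g[sex == "female"])
  ) |>
  arrange(mean_diff)

mean_diff_direct
```

The pivot_wider() version keeps the female and male means as separate columns, which is handy if you also want to inspect them. The direct version is shorter when only the difference is needed.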

Now we can join the mean difference value to the original dataframe and order the facet variable using forcats’ fct_reorder() function by the difference in mean body mass. After that we can make the grouped boxplot using facet_wrap() as before.

penguins |>
  left_join(mean_diff_df, by="species") |>
  mutate(species=forcats::fct_reorder(species,
                                      mean_diff)) |>
  ggplot(aes(x=sex, y=body_mass_g, fill=sex))+
  geom_boxplot(outlier.shape = NA)+
  geom_jitter(width=0.1)+
  facet_wrap(~species)+
  theme(legend.position = "none")+
  scale_fill_brewer(palette="Dark2")+
  scale_y_continuous(breaks=scales::breaks_pretty(6))+
  labs(title=str_wrap("Ordering facet variable in grouped boxplot by mean difference", width=50))
ggsave("order_facet_variable_in_grouped_boxplot_by_mean_difference_between_groups.png")

Now our facets are ordered by increasing mean difference: Chinstrap first, then Adelie, then Gentoo.
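To see what fct_reorder() does to the factor levels, here is a small standalone sketch. It uses the rounded mean differences from the summary table earlier in the post and toy vectors (species_f and diff_val are illustrative names, not part of the original code). It also shows the .desc argument, which reverses the order if you would rather have the largest difference in the first facet.

```r
library(forcats)

# Toy data: one entry per species, with the rounded mean differences
# copied from the summary table (Adelie 675, Chinstrap 412, Gentoo 805).
species_f <- factor(c("Adelie", "Chinstrap", "Gentoo"))
diff_val <- c(675, 412, 805)

# Default factor levels are alphabetical
levels(species_f)

# Levels reordered by increasing mean difference
asc_levels <- levels(fct_reorder(species_f, diff_val))
asc_levels

# Levels reordered by decreasing mean difference
desc_levels <- levels(fct_reorder(species_f, diff_val, .desc = TRUE))
desc_levels
```

Since facet_wrap() lays out panels in the order of the factor levels, changing the level order is all that is needed to change the order of the facets.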

Ordering facet variable in a grouped boxplot by mean difference between groups