In this tutorial, we will learn how to order the facet variable in a grouped boxplot by the mean difference between the groups in each facet with ggplot2. We will use the Palmer penguins dataset to make a grouped boxplot using facet_wrap() in ggplot2.
library(palmerpenguins)
library(tidyverse)
theme_set(theme_bw(16))
We first drop the rows with missing values. After that, our data looks like this.
penguins <- penguins |> drop_na()
penguins |>
  head()
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           36.7          19.3               193        3450
5 Adelie  Torgersen           39.3          20.6               190        3650
6 Adelie  Torgersen           38.9          17.8               181        3625
# ℹ 2 more variables: sex <fct>, year <int>
First, let us make a grouped boxplot using facet_wrap(). Here the facet variable is penguin species and the grouping variable within each species is sex. The third variable, shown on the y-axis, is body mass. We are interested in the difference in mean body mass between the sexes for each species.
penguins |>
  ggplot(aes(x = sex, y = body_mass_g, fill = sex)) +
  # hide boxplot outliers, since the jittered points already show every observation
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  # one panel per species
  facet_wrap(~species) +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  scale_y_continuous(breaks = scales::breaks_pretty(6)) +
  labs(title = str_wrap("How to order facet variable in grouped boxplot by mean difference",
                        width = 50))
ggsave("How_to_order_facet_grouped_boxplot_by_mean_difference_between_groups.png")
We can see that the middle facet, the Chinstrap penguins, has the smallest difference in mean body mass between the sexes.
To order the facet variable by the mean difference between the groups, we first need to calculate the mean difference for each value of the facet variable.
Here we calculate the difference in mean body mass between male and female penguins for each species using the group_by(), summarize() and pivot_wider() functions.
mean_diff_df <- penguins |>
  select(species, sex, body_mass_g) |>
  # mean body mass for each species and sex combination
  group_by(species, sex) |>
  summarize(mean_val = mean(body_mass_g)) |>
  ungroup() |>
  # one row per species, with separate female and male columns
  pivot_wider(names_from = "sex", values_from = "mean_val") |>
  mutate(mean_diff = male - female) |>
  arrange(mean_diff)
The computed mean difference looks like this.
mean_diff_df
# A tibble: 3 × 4
  species   female  male mean_diff
  <fct>      <dbl> <dbl>     <dbl>
1 Chinstrap  3527. 3939.      412.
2 Adelie     3369. 4043.      675.
3 Gentoo     4680. 5485.      805.
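As a cross-check, the same differences can be computed without reshaping the data. The following is a minimal sketch rather than the method used in this tutorial: it subsets body mass by sex inside summarize(), and the name mean_diff_check is a hypothetical one introduced only for this illustration.

```r
library(palmerpenguins)
library(tidyverse)

# mean_diff_check is a hypothetical name used only for this cross-check
mean_diff_check <- penguins |>
  drop_na() |>
  group_by(species) |>
  # subset body mass by sex within each species, then take the difference of means
  summarize(mean_diff = mean(body_mass_g[sex == "male"]) -
                        mean(body_mass_g[sex == "female"])) |>
  arrange(mean_diff)

mean_diff_check
```

The species order and the mean_diff column should agree with the table above. The pivot_wider() version is still handy when we also want to see the female and male means side by side.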
Now we can join the mean difference value to the original dataframe and order the facet variable using forcats’ fct_reorder() function by the difference in mean body mass. After that we can make the grouped boxplot using facet_wrap() as before.
penguins |>
  # add the per-species mean difference to every row
  left_join(mean_diff_df, by = "species") |>
  # reorder the species levels by the mean difference (ascending)
  mutate(species = forcats::fct_reorder(species, mean_diff)) |>
  ggplot(aes(x = sex, y = body_mass_g, fill = sex)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  facet_wrap(~species) +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Dark2") +
  scale_y_continuous(breaks = scales::breaks_pretty(6)) +
  labs(title = str_wrap("Ordering facet variable in grouped boxplot by mean difference",
                        width = 50))
ggsave("order_facet_variable_in_grouped_boxplot_by_mean_difference_between_groups.png")
Now our facet variable is nicely ordered by the mean difference, from the smallest (Chinstrap) to the largest (Gentoo).
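If we would rather have the facet with the largest difference first, fct_reorder() has a .desc argument that reverses the order. Here is a small sketch of how the factor levels change. To keep it self-contained, it uses the rounded mean differences copied from the summary table earlier instead of recomputing them, so the two vectors below are illustrative stand-ins for the joined data.

```r
library(forcats)

# Rounded mean differences copied from the mean_diff_df table (illustration only)
species   <- factor(c("Adelie", "Chinstrap", "Gentoo"))
mean_diff <- c(675, 412, 805)

# Default: levels ordered from the smallest to the largest difference
levels(fct_reorder(species, mean_diff))
# "Chinstrap" "Adelie" "Gentoo"

# .desc = TRUE: levels ordered from the largest to the smallest difference
levels(fct_reorder(species, mean_diff, .desc = TRUE))
# "Gentoo" "Adelie" "Chinstrap"
```

Since facet_wrap() lays out the panels in the order of the factor levels, adding .desc = TRUE to the fct_reorder() call in the plotting code should flip the order of the facets.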