How to make boxplots between one categorical variable vs all numerical variables

Multiple boxplots: one categorical variable vs all numerical variables

Often we want a quick look at how one categorical variable relates to every numerical variable in a dataframe. We can do this by making boxplots of each numerical variable against the categorical variable, all in a single figure.

In this post we will work through a simple example where the dataframe contains one qualitative column and multiple quantitative columns. We will use pivot_longer() and facet_wrap() to make a single figure containing multiple boxplots.

First, let us load the packages needed. We will use the Palmer Penguins dataset to make boxplots of the numerical variables against the species variable.

library(tidyverse)
library(palmerpenguins)
theme_set(theme_bw(16))

The penguins dataset looks like this.

penguins %>%
  head()

# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# 2 more variables: sex <fct>, year <int>

Let us simplify the data so that it contains only one categorical variable, here species, and multiple numerical variables.

penguins_df <-
  penguins %>%
  select(-sex, -island, -year)
penguins_df %>%
  head()

# A tibble: 6 × 5
  species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>            <dbl>         <dbl>             <int>       <int>
1 Adelie            39.1          18.7               181        3750
2 Adelie            39.5          17.4               186        3800
3 Adelie            40.3          18                 195        3250
4 Adelie            NA            NA                  NA          NA
5 Adelie            36.7          19.3               193        3450
6 Adelie            39.3          20.6               190        3650
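When a dataframe has many columns, listing the ones to drop by name can get tedious. As a minimal sketch, assuming a recent dplyr/tidyselect that provides where(), the numerical columns can be picked by type instead. The name penguins_num is only illustrative, and year is removed on the assumption that it is not a measurement of interest, since it is stored as an integer and would otherwise be selected.

```r
library(tidyverse)
library(palmerpenguins)

# Keep the categorical column plus every numeric column.
# `year` is an integer column, so where(is.numeric) would pick it up;
# it is dropped here (an assumption about which variables we want to plot).
penguins_num <- penguins %>%
  select(species, where(is.numeric), -year)

names(penguins_num)
```

This should give the same five columns as the select() call above, without having to name sex and island explicitly.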

Multiple Boxplots at the same time

One way to make multiple boxplots at the same time is to tidy the data using pivot_longer() and then use facet_wrap() to draw one panel per numerical variable in the same figure. The pivot_longer() function reshapes the data into a long-form dataframe with three columns: the categorical variable, the name of the numerical variable, and its value.

penguins_long <- penguins_df %>%
  pivot_longer(-species, names_to = "feature_name",
               values_to = "feature_value")
penguins_long %>% head()

# A tibble: 6 × 3
  species feature_name      feature_value
  <fct>   <chr>                     <dbl>
1 Adelie  bill_length_mm             39.1
2 Adelie  bill_depth_mm              18.7
3 Adelie  flipper_length_mm         181  
4 Adelie  body_mass_g              3750  
5 Adelie  bill_length_mm             39.5
6 Adelie  bill_depth_mm              17.4
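The fourth row of the original data has missing measurements, and pivot_longer() carries these over as NA values in feature_value. One possible way to handle them, sketched here rather than taken from the original workflow, is the values_drop_na argument of pivot_longer(), followed by a quick count of the remaining observations per panel. The name penguins_long_complete is hypothetical.

```r
library(tidyverse)
library(palmerpenguins)

penguins_long_complete <- penguins %>%
  select(-sex, -island, -year) %>%
  pivot_longer(-species,
               names_to = "feature_name",
               values_to = "feature_value",
               values_drop_na = TRUE)  # leave out rows whose value is NA

# Number of non-missing observations per species and feature
penguins_long_complete %>%
  count(species, feature_name)
```

Dropping the missing values at this stage should also avoid the warnings about removed rows that ggplot2 prints when it meets NA values.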

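Now that the data is in tidy (long) form, we can make multiple boxplots in the same figure using the facet_wrap() function. Setting scales = "free_y" gives each panel its own y-axis, which matters here because body mass is measured in grams while the other variables are in millimetres.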

penguins_long %>%
  ggplot(aes(x = species, y = feature_value)) +
  # hide boxplot outliers, since geom_jitter() already draws every point
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  # one panel per numerical variable, each with its own y-axis
  facet_wrap(~feature_name, scales = "free_y")
ggsave("boxplot_between_one_categorical_vs_all_numerical_variables.png")
Multiple boxplots on all numerical variables in a dataframe with facet_wrap()

Here we customize the plot by coloring the boxplots and points by the categorical variable. The legend is hidden because the x-axis already shows the species.

penguins_long %>%
  ggplot(aes(x = species, y = feature_value, color = species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  facet_wrap(~feature_name, scales = "free_y") +
  theme(legend.position = "none")
ggsave("boxplot_between_one_categorical_variable_vs_all_numerical.png")
Multiple boxplots: one categorical variable vs all numerical variables
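By default ggsave() saves the last plot that was displayed, at the size of the current graphics device. As a rough sketch, the plot can instead be stored in a variable and passed to ggsave() explicitly with a fixed width and height, which tends to make the saved figure more reproducible. The variable name p, the file name, and the dimensions below are illustrative assumptions, not values from the original post.

```r
library(tidyverse)
library(palmerpenguins)

# Rebuild the long-form data so that this block runs on its own
penguins_long <- penguins %>%
  select(-sex, -island, -year) %>%
  pivot_longer(-species,
               names_to = "feature_name",
               values_to = "feature_value")

# Store the plot in a variable instead of only printing it
p <- penguins_long %>%
  ggplot(aes(x = species, y = feature_value, color = species)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  facet_wrap(~feature_name, scales = "free_y") +
  theme(legend.position = "none")

# Save that specific plot with an explicit size (in inches by default)
ggsave("boxplot_species_facets.png", plot = p, width = 10, height = 8)
```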