Data Viz with Python and R

Learn to Make Plots in Python and R

Visualizing Correlation with tidymodels’ corrr package

datavizpyr · August 18, 2022 ·

Corrr Package for Correlations

In this tutorial, we will compute correlations among all the numerical variables in a data frame and visualize them in multiple ways. We will use the corrr package from tidymodels both to compute the correlations and to plot them.

corrr is a package for exploring correlations in R. It focuses on creating and working with data frames of correlations (instead of matrices) that can be easily explored via corrr functions or by leveraging tools like those in the tidyverse.

Let us get started by loading the packages needed.

# install the packages if needed (uncomment the next line on first use)
# install.packages(c("tidyverse", "palmerpenguins", "corrr"))
library(tidyverse)
library(palmerpenguins)
library(corrr)

We will use the Palmer penguins dataset in this tutorial. Since we are mainly interested in computing correlations among the numerical variables, let us select the numerical columns for further analysis. We will also remove any rows with missing values for the sake of simplicity, and drop the year column because it is not a measurement.

penguins <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric))

Our data for computing correlations looks like this.

penguins %>%
  head()

## # A tibble: 6 x 4
##   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##            <dbl>         <dbl>             <int>       <int>
## 1           39.1          18.7               181        3750
## 2           39.5          17.4               186        3800
## 3           40.3          18                 195        3250
## 4           36.7          19.3               193        3450
## 5           39.3          20.6               190        3650
## 6           38.9          17.8               181        3625

Computing correlation with correlate

The correlate() function is one of the key functions in the corrr package; it computes correlations on a data frame. By default it uses Pearson correlation.

penguins_cor <- penguins %>% 
  correlate()

## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
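
Pearson correlation is only the default. As a minimal sketch, assuming the same data preparation as above, the method argument of correlate() can request a rank-based Spearman correlation instead, and quiet = TRUE suppresses the message. The object names penguins_num and penguins_cor_spearman are introduced here for illustration only.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

# Same preparation as in the tutorial: complete rows, numeric columns, no year
penguins_num <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric))

# Spearman (rank-based) correlation; quiet = TRUE hides the method message
penguins_cor_spearman <- penguins_num %>%
  correlate(method = "spearman", quiet = TRUE)

penguins_cor_spearman
```

Spearman correlation is less sensitive to outliers and to non-linear but monotonic relationships, so comparing it with the Pearson result can be a quick sanity check.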

The correlate() function computes the correlation of every variable in the data frame with every other variable. The result is a symmetric tibble of correlation values, with NA on the diagonal.

penguins_cor 

## # A tibble: 4 x 5
##   term              bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <chr>                      <dbl>         <dbl>             <dbl>       <dbl>
## 1 bill_length_mm            NA            -0.229             0.653       0.589
## 2 bill_depth_mm             -0.229        NA                -0.578      -0.472
## 3 flipper_length_mm          0.653        -0.578            NA           0.873
## 4 body_mass_g                0.589        -0.472             0.873      NA
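
Because the result is a tibble rather than a matrix, it can be reshaped and sorted with ordinary tidyverse verbs. The sketch below, assuming the same data preparation as above, uses corrr's stretch() to turn the wide table into one row per variable pair and then sorts the pairs by absolute correlation. The name cor_pairs is only illustrative.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

penguins_cor <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE)

# Long format with columns x, y and r; each pair is kept once, NAs are dropped
cor_pairs <- penguins_cor %>%
  stretch(na.rm = TRUE, remove.dups = TRUE) %>%
  arrange(desc(abs(r)))

cor_pairs
```

Going by the table above, the flipper length and body mass pair should appear first, since it has the largest absolute correlation.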

We can also reorder the correlation data frame with the rearrange() function. The tibble now places highly correlated variables closer together instead of keeping the original variable order.

penguins_cor %>% 
  rearrange()

## # A tibble: 4 × 5
##   term              flipper_length_mm body_mass_g bill_length_mm bill_depth_mm
##   <chr>                         <dbl>       <dbl>          <dbl>         <dbl>
## 1 flipper_length_mm            NA           0.873          0.653        -0.578
## 2 body_mass_g                   0.873      NA              0.589        -0.472
## 3 bill_length_mm                0.653       0.589         NA            -0.229
## 4 bill_depth_mm                -0.578      -0.472         -0.229        NA
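
When only one variable is of interest, the full table is more than we need. As a hedged sketch, corrr's focus() keeps the chosen column and drops that variable from the rows, so we get its correlation with every other variable. The name mass_cor is introduced here for illustration.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

penguins_cor <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE)

# Correlation of body mass with each of the other variables, strongest first
mass_cor <- penguins_cor %>%
  focus(body_mass_g) %>%
  arrange(desc(abs(body_mass_g)))

mass_cor
```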

Visualizing the correlation with dotplot

With the rplot() function we can visualize the correlations as a dot plot, using a default color palette that spans -1 to +1.

Here we plot the correlation after rearranging by its strength as shown above.

penguins %>% 
  correlate() %>%
  rearrange() %>%
  rplot()

## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

Correlation dot plot with rplot in corrr
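
The default palette and point shape can be changed. The following is a small sketch, assuming the colours and shape arguments of rplot() and the fact that it returns a ggplot object; the colour names and the title are arbitrary choices for illustration.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

p_dot <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE) %>%
  rearrange() %>%
  # three colours: negative end, midpoint, positive end of the scale
  rplot(colours = c("firebrick", "white", "steelblue"), shape = 15) +
  # rplot() returns a ggplot, so regular ggplot2 layers can be added
  labs(title = "Penguin measurement correlations")

p_dot
```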

Instead of plotting the full symmetric correlation matrix as a dot plot, we can plot the lower triangle alone. To do that, we shave off the upper triangle with the shave() function and then use rplot() to plot the result. In this example, we also set print_cor = TRUE to print the correlation values on top of the dots.

penguins %>% 
  drop_na() %>%
  select(where(is.numeric)) %>%
  correlate() %>%
  rearrange() %>%
  shave()  %>%
  rplot(print_cor=TRUE)

Lower triangular correlation plot with corrr
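
The same shaved data frame is also handy as a printed table. As a minimal sketch, corrr's fashion() rounds the values and blanks out the NA entries so that only the lower triangle is shown; the name cor_table is illustrative.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

cor_table <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE) %>%
  rearrange() %>%
  shave() %>%             # set the upper triangle to NA
  fashion(decimals = 2)   # round and print NA as empty strings

cor_table
```

Note that fashion() is meant for display: it formats the values as text, so keep the unformatted tibble for any further computation.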

The corrr package also has a nice function to visualize the correlations as a network. With network_plot() we can draw a graph with the variables as nodes and the correlations between variables as edges. The min_cor argument sets the minimum absolute correlation an edge needs in order to be drawn.

penguins %>% 
  drop_na() %>%
  select(where(is.numeric)) %>%
  correlate() %>%
  network_plot(min_cor=0.1)
ggsave("correlation_network_plot_in_corrr.png")

## Correlation computed with
## • Method: 'pearson'
## • Missing treated using: 'pairwise.complete.obs'

Correlation Network plot with corrr package
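
Raising min_cor thins out the network so that only the stronger relationships remain. The sketch below assumes a threshold of 0.5, which is an arbitrary choice, and stores the plot in an object so that it can be passed to ggsave() explicitly rather than relying on the last plot displayed. The file name and dimensions are hypothetical.

```r
library(tidyverse)
library(palmerpenguins)
library(corrr)

p_network <- penguins %>%
  drop_na() %>%
  select(-year) %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE) %>%
  # draw only edges whose absolute correlation is at least 0.5
  network_plot(min_cor = 0.5)

p_network

# Save this specific plot object; file name and size are illustrative
ggsave("correlation_network_min_cor_0.5.png", plot = p_network,
       width = 6, height = 5)
```

Going by the correlation table shown earlier, this threshold should hide the two weakest pairs (-0.229 and -0.472) and keep the other four.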
