Data Viz with Python and R

Learn to Make Plots in Python and R

How To Make tSNE plot in R

datavizpyr · June 19, 2021 ·

tSNE is a dimensionality reduction technique suited to visualizing high-dimensional datasets. The name is an abbreviation of t-Distributed Stochastic Neighbor Embedding (t-SNE), a method introduced by van der Maaten and Hinton. In this tutorial, we will learn how to perform tSNE in R without going into its theoretical underpinnings. Our main goal is to learn how to make a tSNE plot to understand patterns or structure in a high-dimensional dataset.

Loading Data and Packages

We will use the Palmer Penguins dataset to make a tSNE plot in R, and we will perform tSNE using the R package Rtsne. Let us load the packages needed and set the black and white theme for ggplot2.

library(tidyverse)
library(palmerpenguins)
library(Rtsne)
theme_set(theme_bw(18))

To perform tSNE on the Palmer Penguins dataset, we will use the numerical columns and set aside the non-numerical columns as metadata. First, let us remove rows with missing data, drop the year column, and add a unique row ID.

# Drop incomplete rows and the year column, then add a unique row ID
penguins <- penguins %>% 
  drop_na() %>%
  select(-year)%>%
  mutate(ID=row_number()) 
penguins %>% head()

## # A tibble: 6 x 8
##   species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex  
##   <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
## 1 Adelie  Torge…           39.1          18.7              181        3750 male 
## 2 Adelie  Torge…           39.5          17.4              186        3800 fema…
## 3 Adelie  Torge…           40.3          18                195        3250 fema…
## 4 Adelie  Torge…           36.7          19.3              193        3450 fema…
## 5 Adelie  Torge…           39.3          20.6              190        3650 male 
## 6 Adelie  Torge…           38.9          17.8              181        3625 fema…
## # … with 1 more variable: ID <int>

# Keep the non-numerical columns, plus the ID, as metadata
penguins_meta <- penguins %>%
  select(ID,species,island,sex)

Performing tSNE with Rtsne package

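Let us select the numerical columns using where(is.numeric) inside select(), standardise the data using the scale() function, and then apply the Rtsne() function to perform tSNE. Since tSNE is a stochastic method, we set a seed first so that the result is reproducible.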

set.seed(142)
tSNE_fit <- penguins %>%
  select(where(is.numeric)) %>%
  column_to_rownames("ID") %>%
  scale() %>% 
  Rtsne()
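
The call to Rtsne() above relies on the function's default settings. As a minimal sketch, assuming the package defaults of two output dimensions, a perplexity of 30, and 1000 iterations, the same step can be written with those arguments spelled out, which makes it easier to experiment with a different perplexity. The object names penguins_mat and tSNE_fit_explicit are illustrative and not part of the original tutorial.

```r
library(tidyverse)
library(palmerpenguins)
library(Rtsne)

# Rebuild the scaled numeric matrix from the packaged data
# (palmerpenguins::penguins is used so the sketch runs on its own)
penguins_mat <- palmerpenguins::penguins %>%
  drop_na() %>%
  select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g) %>%
  scale()

set.seed(142)
tSNE_fit_explicit <- Rtsne(
  penguins_mat,
  dims = 2,                # number of tSNE components to return
  perplexity = 30,         # roughly, the effective number of neighbours per point
  max_iter = 1000,         # number of optimisation iterations
  check_duplicates = TRUE  # stop with an error if two rows are identical
)

# The embedding has one row per penguin and one column per component
dim(tSNE_fit_explicit$Y)
```

Rtsne requires the perplexity to be small relative to the number of rows, so much larger values may be rejected on a dataset of this size.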

The tSNE result object contains the two tSNE components that we are interested in, stored as the columns of the matrix Y. We can extract the components and save them in a data frame.

tSNE_df <- tSNE_fit$Y %>% 
  as.data.frame() %>%
  rename(tSNE1="V1",
         tSNE2="V2") %>%
  mutate(ID=row_number())

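Since the rows of the tSNE result are in the same order as the input data, the row number matches the ID we created earlier. Using this unique row ID, we can combine the tSNE components with the metadata.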

tSNE_df <- tSNE_df %>%
  inner_join(penguins_meta, by="ID")

Now, we have all the data needed to make a tSNE plot.

tSNE_df %>% head()

##       tSNE1     tSNE2 ID species    island    sex
## 1  7.423581 3.6858054  1  Adelie Torgersen   male
## 2  8.912035 1.5917482  2  Adelie Torgersen female
## 3 10.857999 0.1403718  3  Adelie Torgersen female
## 4  7.118340 6.0383213  4  Adelie Torgersen female
## 5  4.481419 7.0784444  5  Adelie Torgersen   male
## 6  8.910489 2.4027551  6  Adelie Torgersen female

tSNE plot colored by a variable

Let us make a tSNE plot, which is a scatter plot with the two tSNE components on the x- and y-axes. Here we color the data points by species and use a different shape for each sex in the penguins dataset.

tSNE_df %>%
  ggplot(aes(x = tSNE1, 
             y = tSNE2,
             color = species,
             shape = sex))+
  geom_point()+
  theme(legend.position="bottom")
ggsave("tSNE_plot_example1.png")

Note that tSNE, an unsupervised dimensionality reduction technique for visualizing high-dimensional data, has nicely captured patterns in the data. By coloring the data points by species, we can see that the three penguin species have distinct features that drive the clusters we see on the tSNE plot.

tSNE Plot

tSNE plot Example 2

Here is another example of a tSNE plot. This time we again color the points by species, but map the shape aesthetic to the island variable in the dataset.

tSNE_df %>%
  ggplot(aes(x = tSNE1, 
             y = tSNE2,
             color = species,
             shape = island))+
  geom_point()
ggsave("tSNE_plot_example2.png")

The second tSNE plot example shows the relationship between the species and island variables. For example, Gentoo penguins are found only on Biscoe island and Chinstrap penguins only on Dream island, while Adelie penguins are present on all three islands.
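
The relationship between species and island can also be checked directly by tabulating the two variables, rather than reading it off the point shapes. The following is a small sketch along those lines; the name species_by_island is illustrative and does not appear in the original tutorial.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Count penguins for each species/island combination,
# using the same complete-case filtering as the tutorial
species_by_island <- palmerpenguins::penguins %>%
  drop_na() %>%
  count(species, island)

species_by_island
```

Each row of the result is a species/island combination that occurs in the data, so a species that appears in only one row is found on a single island.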

tSNE Plot Example 2

tSNE for identifying potential sample mismatch

One of the interesting patterns that we can see in the first tSNE plot example is that about five Chinstrap penguin samples (in green) seem to cluster with the Adelie cluster (in red).

# geom_circle() comes from the ggforce package
library(ggforce)

tSNE_df %>%
  ggplot(aes(x = tSNE1, 
             y = tSNE2,
             color = species,
             shape = island))+
  geom_point()+
  geom_circle(aes(x0 = 9, y0 = -2, r = 2), 
              color="green",
              inherit.aes = FALSE)

Here we have annotated the five Chinstrap samples with a circle to highlight them. The first tSNE plot shows that all of these samples are female, and the second tSNE plot shows that they are all from the same island, where other species are present as well.
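
To list the circled samples instead of reading them off the plot, one option is to keep the rows whose tSNE coordinates fall inside the circle. The sketch below uses a hypothetical helper, points_in_circle(), and a small made-up data frame (toy_df) so that it runs on its own; neither is part of the original tutorial, and the toy coordinates are not real tSNE output.

```r
library(dplyr)

# Hypothetical helper: keep rows whose (tSNE1, tSNE2) position lies
# within distance r of the circle centre (x0, y0)
points_in_circle <- function(df, x0, y0, r) {
  df %>%
    filter((tSNE1 - x0)^2 + (tSNE2 - y0)^2 <= r^2)
}

# Made-up coordinates for illustration only
toy_df <- data.frame(
  ID      = 1:4,
  tSNE1   = c(9, 10.5, 12, 8),
  tSNE2   = c(-2, -2.5, 0, -3.5),
  species = c("Chinstrap", "Chinstrap", "Adelie", "Adelie")
)

# Same centre and radius as the circle drawn on the plot
circled <- points_in_circle(toy_df, x0 = 9, y0 = -2, r = 2)

# Summarise which species fall inside the circle
circled %>% count(species)
```

With the tutorial's data, the equivalent call would be points_in_circle(tSNE_df, x0 = 9, y0 = -2, r = 2). The circle's centre and radius are specific to the embedding obtained with the seed used above, so they would need adjusting for a different run.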

This suggests that these samples could potentially have been mislabeled as Chinstrap when they actually belong to Adelie.

tSNE plot suggesting a potential sample mismatch

This pattern could be due to other issues as well. However, digging into the data a bit more does not rule out the possibility of mislabeling.
