AI vs Manual Scatterplots in R: ggplot2 Workflows for the AI Era

Presentation ready scatterplot made using manual expertise with ggplot2

If you’re creating scatterplots in R in 2025, you have likely wrestled with the question: should you let AI write your ggplot2 code, or stick to the tried-and-true manual approach? The answer isn’t as simple as “AI is faster” or “manual is better.” The best approach depends on your specific needs, your experience level, and the complexity of your data story.

In this comprehensive comparison, we’ll explore three distinct workflows for creating professional scatterplots using ggplot2. We’ll examine where AI, specifically ChatGPT, excels, where it falls short, and how a hybrid approach might give you the best of both worlds.

By the end of this post, you’ll have a clear framework for choosing between manual coding, AI assistance, and hybrid workflows for your R visualization needs.

The Dataset: Palmer Penguins in R

For our comparison, we’ll use the Palmer Penguins dataset—a modern alternative to the iris dataset that provides rich relationships perfect for scatterplot analysis. Our goal is to explore the relationship between flipper length and body mass across different penguin species, with proper statistical overlays and professional styling.

Setting Up Our R Environment

# Install packages if needed
# install.packages(c("ggplot2", "palmerpenguins", "dplyr"))
library(ggplot2)
library(palmerpenguins)
library(dplyr)
penguins |> 
  head()
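
Before plotting, it can help to check how many values are missing in the two columns we care about, since ggplot2 drops such rows with a warning. Here is a minimal sketch, assuming the palmerpenguins package is installed; the object names na_counts and n_complete are ours, chosen for illustration.

```r
library(palmerpenguins)

# Columns used for the scatterplot
cols <- c("flipper_length_mm", "body_mass_g")

# Count missing values in each of the two columns
na_counts <- colSums(is.na(penguins[, cols]))
na_counts

# Number of rows that have both measurements present
n_complete <- sum(complete.cases(penguins[, cols]))
n_complete
```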

Part 1: Manual ggplot2 Scatterplot (The Artisan’s Approach)

Let’s start with the traditional method that gives us complete control over every aspect of our visualization. This is our quality benchmark—the gold standard against which we’ll compare the other approaches.

Step 1: Basic Scatterplot Foundation

We begin with the most fundamental version to understand our data structure:

# Create the most basic scatterplot
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

Basic Scatter Plot Made Manually with ggplot2

This gives us a functional but unrefined plot. We can see there’s a clear positive relationship, but we’re missing species differentiation and professional styling.
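
To put a number on the "clear positive relationship" seen in the plot, one option is a Pearson correlation. This is a quick sketch rather than a full analysis; it pools all species together, which is an assumption that the later steps revisit.

```r
library(palmerpenguins)

# Pearson correlation between flipper length and body mass,
# using only rows where both values are present
r <- cor(penguins$flipper_length_mm, penguins$body_mass_g, use = "complete.obs")
round(r, 2)
```

A value close to 1 would support the visual impression of a strong positive linear relationship.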

Step 2: Add Species Differentiation

Now we add color to distinguish between species:

# Add species coloring
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 2, alpha = 0.7)

Manually Improved Scatterplot ggplot2

Better! We can now see that the three species cluster in different regions of the plot, with Gentoo penguins being generally larger.
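
The visual impression that the species occupy different regions can be checked with a small per-species summary. The sketch below assumes dplyr and palmerpenguins are installed; species_summary and its column names are our own.

```r
library(dplyr)
library(palmerpenguins)

# Mean flipper length and body mass per species, ignoring incomplete rows
species_summary <- penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  group_by(species) |>
  summarise(
    n = n(),
    mean_flipper_mm = mean(flipper_length_mm),
    mean_mass_g = mean(body_mass_g),
    .groups = "drop"
  )
species_summary
```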

Step 3: Professional Polish and Statistical Enhancement

Now we apply our ggplot2 expertise to create a publication-ready visualization. Here we do a bit of data cleaning, style the points, add regression lines to show the linear relationship clearly, and add custom colors, labels, and theme adjustments.

# Create a polished, presentation-ready scatterplot
penguins |>
  # Data cleaning - remove missing values
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  
  # Set up the plot aesthetics
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  
  # Add points with refined styling
  geom_point(size = 2.5, alpha = 0.8) +
  
  # Add regression lines with confidence intervals
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  
  # Custom color palette for better distinction
  scale_color_manual(values = c("Adelie" = "#FF6B35", 
                               "Chinstrap" = "#004E89", 
                               "Gentoo" = "#009639")) +
  
  # Professional labels
  labs(
    title = "Penguin Body Mass vs Flipper Length",
    subtitle = "Strong positive relationship varies significantly across species",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species",
    caption = "Data: Palmer Station Antarctica LTER"
  ) +
  
  # Clean theme with custom adjustments
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10)),
    plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 20)),
    legend.position = "top",
    legend.title = element_text(size = 11, face = "bold"),
    panel.grid.minor = element_blank(),
    plot.caption = element_text(size = 9, color = "grey50", hjust = 0)
  )

And this is what the plot looks like:

Presentation ready scatterplot made using manual expertise with ggplot2
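
The subtitle says the relationship varies across species. One informal way to look at that is to compare the slope of a simple linear model fitted to each species separately, mirroring what geom_smooth(method = "lm") draws. This is a sketch for eyeballing the slopes, not a formal significance test; the names clean and slopes are ours.

```r
library(palmerpenguins)

# Keep rows with both measurements present
cols <- c("flipper_length_mm", "body_mass_g")
clean <- penguins[complete.cases(penguins[, cols]), ]

# Fit one simple linear model per species and extract the slope
# (grams of body mass per additional mm of flipper length)
slopes <- sapply(split(clean, clean$species), function(d) {
  coef(lm(body_mass_g ~ flipper_length_mm, data = d))[["flipper_length_mm"]]
})
round(slopes, 1)
```

If a formal comparison is needed, a model with a flipper_length_mm * species interaction term would be one place to start.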

Manual Scatterplot Approach Assessment

Strengths:

  • Complete control over every visual element
  • Deep understanding of ggplot2’s grammar of graphics
  • Guaranteed quality matching your exact vision
  • Educational value – builds expertise in data visualization
  • Debugging capability – you understand every line when issues arise

Weaknesses:

  • Time-intensive – requires 15-30 minutes for complex plots
  • Syntax knowledge required – need to remember function names and arguments
  • Higher learning curve for beginners

Part 2: AI-Generated ggplot2 Scatterplot (The Assistant’s Approach)

Now let’s see how well AI can replicate our manual work. The key to success with AI is crafting detailed, specific prompts that translate your visualization goals into clear instructions.

Crafting the Perfect AI Prompt for a ggplot2 Scatterplot

Here’s our detailed prompt for AI (ChatGPT, Claude, or similar):

Detailed AI Prompt:

Using R with ggplot2 and the palmerpenguins package, create a professional scatterplot showing the relationship between flipper_length_mm (x-axis) and body_mass_g (y-axis).

Requirements:

  • Remove any rows with missing flipper_length_mm or body_mass_g values
  • Color points by species using these exact colors: Adelie=#FF6B35, Chinstrap=#004E89, Gentoo=#009639
  • Set point size to 2.5 and alpha to 0.8
  • Add regression lines for each species with confidence intervals (alpha=0.2)
  • Include these labels:
    • Title: “Penguin Body Mass vs Flipper Length”
    • Subtitle: “Strong positive relationship varies significantly across species”
    • X-axis: “Flipper Length (mm)”
    • Y-axis: “Body Mass (g)”
    • Caption: “Data: Palmer Station Antarctica LTER”
  • Use theme_minimal with base_size=12
  • Place legend at the top
  • Make title bold, size 16, with margin
  • Make subtitle size 12, grey40 color
  • Remove minor gridlines
  • Style caption as size 9, grey50, left-aligned

Provide complete, runnable R code using dplyr pipes.
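
Because the prompt repeats exact values that also appear in our code (the hex colors, for example), one optional trick is to build those prompt lines from the same R objects so the two cannot drift apart. The sketch below is a hypothetical convenience and not part of the original workflow; color_spec and prompt are names we made up, and only a few of the requirement lines are assembled.

```r
# Single source of truth for the palette (same values as the manual plot)
species_colors <- c(Adelie = "#FF6B35", Chinstrap = "#004E89", Gentoo = "#009639")

# Turn the named vector into "Adelie=#FF6B35, ..." for the prompt
color_spec <- paste0(names(species_colors), "=", species_colors, collapse = ", ")

# Assemble a few of the requirement lines; extend as needed
prompt <- paste(
  "Using R with ggplot2 and the palmerpenguins package, create a scatterplot",
  "of flipper_length_mm (x-axis) versus body_mass_g (y-axis).",
  paste0("Color points by species using these exact colors: ", color_spec, "."),
  "Set point size to 2.5 and alpha to 0.8.",
  sep = "\n"
)
cat(prompt)
```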

AI Response Analysis

When we gave the above prompt to the web version of ChatGPT 5, it returned the following R code, ready to be copied and used.

AI-Generated Code (ChatGPT 5):

library(ggplot2)
library(dplyr)
library(palmerpenguins)

# Prepare data: remove missing rows
penguins_clean <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g))

# Custom color palette
species_colors <- c(
  "Adelie" = "#FF6B35",
  "Chinstrap" = "#004E89",
  "Gentoo" = "#009639"
)

# Scatterplot with regression lines
ggplot(penguins_clean, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 2.5, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  scale_color_manual(values = species_colors) +
  labs(
    title = "Penguin Body Mass vs Flipper Length",
    subtitle = "Strong positive relationship varies significantly across species",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    caption = "Data: Palmer Station Antarctica LTER"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "top",
    plot.title = element_text(face = "bold", size = 16, margin = margin(b = 8)),
    plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 12)),
    plot.caption = element_text(size = 9, color = "grey50", hjust = 0),
    panel.grid.minor = element_blank()
  )

Along with the code, ChatGPT 5 gave a brief summary of what it had done:

  • Cleaned data (no missing values in flipper_length_mm or body_mass_g)
  • Colored points by species with your exact hex codes
  • Scatterplot points sized and styled as requested
  • Regression lines with confidence intervals
  • Professional labels and captions
  • Refined theme (minimal base, legend at top, styled title/subtitle/caption, no minor gridlines)

ChatGPT 5 then followed up with a question:

Would you like me to also show you how to export this plot as a high-resolution PNG or PDF for professional use in reports/papers?

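If you say yes, ChatGPT 5 gives the following R code to save the plot as a PNG image and as a PDF. Note that this snippet refers to a plot object named p, which the code above never creates. Either assign the plot to p first, or drop the plot argument, since ggsave() defaults to the last plot displayed.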

# Save as high-res PNG (300 dpi, typical for publication)
ggsave("penguin_scatterplot.png", plot = p, width = 7, height = 5, dpi = 300)

# Save as PDF (vector format, scalable for print)
ggsave("penguin_scatterplot.pdf", plot = p, width = 7, height = 5)

ggplot2 scatterplot made using AI prompt in ChatGPT 5
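
For ggsave(plot = p, ...) to work, the plot has to be stored in an object named p first. Here is a minimal sketch of that pattern, using a simplified version of the plot and a temporary file path. The path and the object name out_png are our assumptions; in practice you would use your own file name.

```r
library(ggplot2)
library(palmerpenguins)

# Store the plot in an object so it can be passed to ggsave()
p <- ggplot(
  na.omit(penguins[, c("species", "flipper_length_mm", "body_mass_g")]),
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point(size = 2.5, alpha = 0.8) +
  theme_minimal(base_size = 12)

# Write to a temporary file here; replace with your own path in practice
out_png <- file.path(tempdir(), "penguin_scatterplot.png")
ggsave(out_png, plot = p, width = 7, height = 5, dpi = 300)
file.exists(out_png)
```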

AI Performance Evaluation: ChatGPT for Visualization

Results Analysis:

  • Remarkable accuracy: AI generated code almost identical to our manual version
  • Correct ggplot2 syntax: Proper use of geom_smooth(), scale_color_manual(), theme modifications
  • All requirements met: Data filtering, custom colors, sizing, labels all correctly implemented
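  • ⚠️ Minor differences: the prompt asked for a title margin without giving values, so the AI chose its own (title margin b = 8 and subtitle margin b = 12, versus 10 and 20 in our manual version). It also left out the color = "Species" legend label and the bold legend title, neither of which the prompt requested. We estimate roughly 95% agreement overall.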
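
Rather than comparing the two plots only by eye, some of these checks can be done in code. The sketch below uses ggplot2's layer_data() to inspect what a plot object actually draws. The object p_ai is a stand-in we construct ourselves for a plot under review, not the exact ChatGPT output, and the three checks are examples rather than a complete test suite.

```r
library(ggplot2)
library(palmerpenguins)

species_colors <- c(Adelie = "#FF6B35", Chinstrap = "#004E89", Gentoo = "#009639")
clean <- na.omit(penguins[, c("species", "flipper_length_mm", "body_mass_g")])

# Stand-in for the AI-generated plot under review
p_ai <- ggplot(clean, aes(flipper_length_mm, body_mass_g, color = species)) +
  geom_point(size = 2.5, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  scale_color_manual(values = species_colors)

# layer_data() returns the computed data for one layer of a plot
points_data <- layer_data(p_ai, 1)
smooth_data <- layer_data(p_ai, 2)

# Check 1: every point was kept and the requested size/alpha were applied
nrow(points_data) == nrow(clean)
all(points_data$size == 2.5, points_data$alpha == 0.8)

# Check 2: only the requested hex colors appear
setequal(unique(points_data$colour), unname(species_colors))

# Check 3: the smooth layer carries a confidence band (ymin/ymax columns)
all(c("ymin", "ymax") %in% names(smooth_data))
```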

Testing AI with Variations:

Let’s test AI with a simpler prompt to see how it performs:

Simple Prompt: “Create a scatterplot of flipper length vs body mass by species using the penguins dataset in R.”

AI Response (Simple):

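ChatGPT 5 did very well and returned the following code. One caveat: the output reuses settings from our detailed prompt (size = 2.5, alpha = 0.8, base_size = 12, legend at the top), so the earlier conversation may have influenced it, and we still need to repeat the test with a different ChatGPT 5 account.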

# Load packages
library(ggplot2)
library(palmerpenguins)

# Create scatterplot
ggplot(data = penguins, aes(x = flipper_length_mm, 
                            y = body_mass_g, 
                            color = species)) +
  geom_point(size = 2.5, alpha = 0.8) +
  labs(
    title = "Penguin Body Mass vs Flipper Length",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "top")

Here is the plot generated from the simple prompt in ChatGPT 5, and it is pretty decent.

ggplot2 scatterplot: Using a simpler prompt in ChatGPT 5

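This demonstrates an important point: AI output quality depends heavily on prompt specificity. The simple prompt produced a clean, usable plot, but without the regression lines, custom colors, subtitle, or caption that the detailed prompt specified.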

AI Approach Assessment

Strengths:

  • Incredible speed – seconds vs 15-30 minutes for manual coding
  • High accuracy when prompts are detailed and specific
  • Accessibility – great for beginners who don’t know ggplot2 syntax
  • Rapid prototyping – excellent for quick data exploration
  • Consistency – generates similar code structure repeatedly

Weaknesses:

  • Prompt dependency – quality heavily depends on your ability to write specific prompts
  • Limited learning – doesn’t build your ggplot2 expertise
  • Black box effect – you might not understand the generated code
  • Iteration challenges – harder to make small adjustments without new prompts

Part 3: Hybrid ggplot2 Workflow (The Pragmatist’s Method)

The hybrid approach combines AI’s speed with human expertise for optimal results. This is often the most practical workflow for experienced R users.

Step 1: AI Foundation with Simple Prompt

We start with a basic prompt to get our foundation quickly:

Basic Prompt: “Create a scatterplot of flipper_length_mm vs body_mass_g colored by species using the penguins dataset in R with ggplot2.”

AI Foundation Response:

# Load necessary libraries
library(ggplot2)
library(palmerpenguins)

# Create scatterplot
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  labs(
    title = "Flipper Length vs Body Mass by Species",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species"
  ) +
  theme_minimal()

This gives us a clean starting point in seconds, saving us from typing the basic ggplot structure.

Step 2: Human Enhancement and Refinement

Now we can apply our ggplot2 expertise to transform the basic plot into a professional visualization:

library(ggplot2)
library(palmerpenguins)
library(dplyr)

penguins |>
  # Our addition: data cleaning for robustness
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  
  # AI foundation: basic plot structure
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  
  # Our enhancement: better point aesthetics
  geom_point(size = 2.5, alpha = 0.8) +
  
  # Our addition: statistical insight with regression lines
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  
  # Our expertise: custom color palette for better accessibility
  scale_color_manual(
    values = c("Adelie" = "#FF6B35", 
               "Chinstrap" = "#004E89", 
               "Gentoo" = "#009639"),
    guide = guide_legend(override.aes = list(alpha = 1))  # Our touch: solid legend colors
  ) +
  
  # AI foundation and Our addition: comprehensive, professional labeling and crediting data source
  labs(
    title = "Penguin Body Mass vs Flipper Length",
    subtitle = "Strong positive relationship varies significantly across species",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species",
    caption = "Data: Palmer Station Antarctica LTER | Visualization: DataVizPyr"
  ) +
  
  # AI foundation enhanced with our theme expertise
  theme_minimal(base_size = 12) +
  theme(
    # Our refinements: professional typography hierarchy
    plot.title = element_text(size = 16, face = "bold", margin = margin(b = 8)),
    plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 20)),
    
    # Our touch: legend optimization
    legend.position = "top",
    legend.title = element_text(size = 11, face = "bold"),
    legend.margin = margin(b = 15),
    
    # Our expertise: grid optimization for clarity
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "grey90", linewidth = 0.5),  # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
    
    # Our addition: caption styling
    plot.caption = element_text(size = 9, color = "grey50", hjust = 0, margin = margin(t = 15))
  )
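
An alternative way to organize the hybrid workflow is to keep the AI foundation as a plot object and add refinements on top with +, which makes it easy to see which parts came from where. This is a sketch of that pattern, not the method used above; p_ai and p_hybrid are our names. One limitation: layers that already exist (such as the default geom_point()) cannot be restyled this way, so point size and alpha would still need to be edited in the foundation code.

```r
library(ggplot2)
library(palmerpenguins)

# AI foundation, stored as an object instead of printed directly
p_ai <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  theme_minimal()

# Human refinements added on top with `+`
# (na.rm = TRUE drops incomplete rows in the smooth layer without a warning)
p_hybrid <- p_ai +
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2, na.rm = TRUE) +
  scale_color_manual(
    values = c(Adelie = "#FF6B35", Chinstrap = "#004E89", Gentoo = "#009639")
  ) +
  theme(legend.position = "top", panel.grid.minor = element_blank())

# The refined plot has one more layer than the foundation
length(p_ai$layers)
length(p_hybrid$layers)
```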

Hybrid Approach Assessment

Strengths:

  • Optimal efficiency – fast foundation (AI) + expert refinement (human)
  • Learning opportunity – you review and understand AI code before enhancing
  • Best of both worlds – speed + complete control over final output
  • Flexible iteration – easy to make adjustments and improvements
  • Maintains expertise – keeps you engaged with ggplot2 syntax and best practices

Weaknesses:

  • Requires intermediate expertise – need to recognize what needs improvement
  • Quality control needed – must evaluate AI output before building on it
  • Workflow complexity – more steps than pure manual or pure AI approaches

Comparative Analysis: Manual vs AI vs Hybrid in R

Aspect           | Manual Approach        | AI Approach           | Hybrid Approach
Development Time | 15-30 minutes          | 1-3 minutes           | 5-12 minutes
Code Quality     | Excellent              | Variable              | High
Learning Value   | Maximum                | Minimal               | Good
Customization    | Complete               | Prompt-dependent      | High
Skill Required   | High ggplot2 expertise | Prompt writing skills | Intermediate R knowledge

Why This Matters Specifically for R Users

ggplot2’s grammar of graphics translates exceptionally well to AI prompts. The structured, layered approach of ggplot2 (data + aesthetics + geometries + themes) maps naturally to how we describe visualizations in plain English. This makes R + AI combinations particularly powerful compared to other visualization libraries.

R’s pipe operator (%>% or |>) works seamlessly with AI-generated code, making it easy to enhance AI foundations with additional data manipulation and refinement steps.
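
As an illustration of that point, here is a sketch in which our own data steps are piped straight into AI-style plotting code. The unit conversion and the names body_mass_kg and p_kg are assumptions made for the example.

```r
library(dplyr)
library(ggplot2)
library(palmerpenguins)

p_kg <- penguins |>
  # Our data step: drop incomplete rows and convert grams to kilograms
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  mutate(body_mass_kg = body_mass_g / 1000) |>
  # AI-style plotting code, unchanged apart from the new y variable
  ggplot(aes(x = flipper_length_mm, y = body_mass_kg, color = species)) +
  geom_point() +
  labs(x = "Flipper Length (mm)", y = "Body Mass (kg)", color = "Species") +
  theme_minimal()
p_kg
```

Because the pipe binds more tightly than +, the data steps finish first and the resulting data frame becomes the first argument to ggplot().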

The R ecosystem’s consistency, with ggplot2 as the dominant plotting library, may also help AI models produce more reliable, idiomatic results than they do in the more fragmented Python visualization landscape, although we have not tested this here.

Recommendation Framework: Choose Your R Workflow

🎓 Learning ggplot2

Recommendation: Manual → Hybrid
Start with manual coding to build solid ggplot2 fundamentals. Once comfortable with the grammar of graphics, gradually incorporate hybrid workflows to see expert techniques.

⚡ Exploratory Analysis

Recommendation: Pure AI
When you need quick visualizations for data exploration and presentation quality isn’t critical, AI prompts can generate plots in seconds for rapid insights.

🏆 Production Visualizations

Recommendation: Hybrid
For reports, presentations, and publications, the hybrid approach delivers professional quality efficiently while maintaining your expertise and control.

🎨 Complex Custom Plots

Recommendation: Manual
When you need precise control over advanced ggplot2 features, custom annotations, or unconventional styling, manual coding remains the gold standard.

📊 Team Collaboration

Recommendation: Hybrid
Hybrid workflows create readable, well-documented code that team members can easily understand, modify, and maintain—regardless of their AI familiarity.

🚀 Skill Development

Recommendation: All Three
Use all approaches strategically: Manual for fundamentals, AI for inspiration and speed, Hybrid for practical application. This builds comprehensive modern R skills.

Conclusion: The Future of R Data Visualization

The integration of AI into R workflows represents a significant evolution in how we approach data visualization. Rather than replacing traditional ggplot2 skills, AI tools amplify our capabilities and change how we allocate our time and mental energy.

Key Takeaways:

  1. Manual coding remains essential for learning ggplot2 fundamentals and handling complex customizations
  2. AI excels at rapid prototyping and generating boilerplate code, dramatically reducing time-to-insight
  3. Hybrid workflows offer the best balance for most professional scenarios, combining efficiency with expertise
  4. The choice depends on context: your experience level, time constraints, and output requirements should guide your approach

Looking Forward:

As AI tools continue to improve, we can expect even better integration with R workflows. The future likely holds AI assistants that understand statistical context, suggest appropriate visualizations based on data types, and even help with interpretation of visual patterns.

The most successful R users in this AI era will be those who master all three approaches and know when to deploy each one strategically. Whether you’re building foundational skills, racing against deadlines, or crafting the perfect publication-quality plot, you now have a complete toolkit for modern R visualization workflows.

What approach will you try first? Share your results and let’s continue advancing the art and science of data visualization together.


