If you’re creating scatterplots in R in 2025, you have likely wrestled with the question: should you let AI write your ggplot2 code, or stick to the tried-and-true manual approach? The answer isn’t as simple as “AI is faster” or “manual is better.” The reality is more nuanced, and the best approach depends on your specific needs, experience level, and the complexity of your data story.
In this comprehensive comparison, we’ll explore three distinct workflows for creating professional scatterplots using ggplot2. We’ll examine where AI, specifically ChatGPT, excels, where it falls short, and how a hybrid approach might give you the best of both worlds.
By the end of this post, you’ll have a clear framework for choosing between manual coding, AI assistance, and hybrid workflows for your R visualization needs.
The Dataset: Palmer Penguins in R
For our comparison, we’ll use the Palmer Penguins dataset—a modern alternative to the iris dataset that provides rich relationships perfect for scatterplot analysis. Our goal is to explore the relationship between flipper length and body mass across different penguin species, with proper statistical overlays and professional styling.
Setting Up Our R Environment
# Install packages if needed
# install.packages(c("ggplot2", "palmerpenguins", "dplyr"))
library(ggplot2)
library(palmerpenguins)
library(dplyr)
penguins |> head()
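Before plotting, it can help to check how many rows are missing the two variables used throughout this post, since the polished versions later filter them out. The following is a minimal sketch in base R; the object names `plot_vars`, `na_counts`, and `n_complete` are ours and chosen only for illustration.

```r
library(palmerpenguins)

# Columns used for the scatterplots in this post (illustrative name)
plot_vars <- c("flipper_length_mm", "body_mass_g")

# Number of missing values in each of the two columns
na_counts <- colSums(is.na(penguins[, plot_vars]))
na_counts

# Number of rows that have both values present
n_complete <- sum(complete.cases(penguins[, plot_vars]))
n_complete
```

If `n_complete` is smaller than `nrow(penguins)`, ggplot2 will drop the incomplete rows with a warning unless they are filtered out first.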
Part 1: Manual ggplot2 Scatterplot (The Artisan’s Approach)
Let’s start with the traditional method that gives us complete control over every aspect of our visualization. This is our quality benchmark—the gold standard against which we’ll compare the other approaches.
Step 1: Basic Scatterplot Foundation
We begin with the most fundamental version to understand our data structure:
# Create the most basic scatterplot
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()
This gives us a functional but unrefined plot. We can see there’s a clear positive relationship, but we’re missing species differentiation and professional styling.
Step 2: Add Species Differentiation
Now we add color to distinguish between species:
# Add species coloring
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 2, alpha = 0.7)
Better! We can now see that the three species cluster in different regions of the plot, with Gentoo penguins being generally larger.
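The visual impression that Gentoo penguins are larger can be cross-checked with a quick numeric summary. This is a small sketch using dplyr; the object name `species_summary` and the column names it creates are our own choices.

```r
library(palmerpenguins)
library(dplyr)

# Per-species sample size and average measurements, ignoring missing values
species_summary <- penguins |>
  group_by(species) |>
  summarise(
    n = n(),
    mean_flipper_mm = mean(flipper_length_mm, na.rm = TRUE),
    mean_mass_g = mean(body_mass_g, na.rm = TRUE)
  )

species_summary
```

Comparing the `mean_mass_g` column across rows gives a simple numeric counterpart to the clusters seen in the plot.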
Step 3: Professional Polish and Statistical Enhancement
Now we apply our ggplot2 expertise to create a publication-ready visualization. We clean the data, style the points, add per-species regression lines to make the linear relationship visible, and apply custom colors, labels, and theme adjustments.
# Create a polished, presentation-ready scatterplot
penguins |>
# Data cleaning - remove missing values
filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
# Set up the plot aesthetics
ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
# Add points with refined styling
geom_point(size = 2.5, alpha = 0.8) +
# Add regression lines with confidence intervals
geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
# Custom color palette for better distinction
scale_color_manual(values = c("Adelie" = "#FF6B35",
"Chinstrap" = "#004E89",
"Gentoo" = "#009639")) +
# Professional labels
labs(
title = "Penguin Body Mass vs Flipper Length",
subtitle = "Strong positive relationship varies significantly across species",
x = "Flipper Length (mm)",
y = "Body Mass (g)",
color = "Species",
caption = "Data: Palmer Station Antarctica LTER"
) +
# Clean theme with custom adjustments
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10)),
plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 20)),
legend.position = "top",
legend.title = element_text(size = 11, face = "bold"),
panel.grid.minor = element_blank(),
plot.caption = element_text(size = 9, color = "grey50", hjust = 0)
)
And this is what the final plot looks like.
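Because `color = species` is mapped in the main `aes()`, `geom_smooth(method = "lm")` fits one line per species. The subtitle's claim that the relationship varies across species can be checked numerically. The sketch below is one possible way to do that with base R's `lm()`; it is not part of the original workflow, and the object names are ours.

```r
library(palmerpenguins)

# Same cleaning step as in the plot
penguins_clean <- subset(
  penguins,
  !is.na(flipper_length_mm) & !is.na(body_mass_g)
)

# One simple linear fit per species, mirroring the lines geom_smooth() draws
slopes <- sapply(
  split(penguins_clean, penguins_clean$species),
  function(d) {
    coef(lm(body_mass_g ~ flipper_length_mm, data = d))[["flipper_length_mm"]]
  }
)
slopes  # grams of body mass per additional mm of flipper length

# Compare a common-slope model with one that allows species-specific slopes
fit_common <- lm(body_mass_g ~ flipper_length_mm + species, data = penguins_clean)
fit_interaction <- lm(body_mass_g ~ flipper_length_mm * species, data = penguins_clean)
anova(fit_common, fit_interaction)
```

A small p-value in the `anova()` table would support wording such as "varies significantly"; otherwise a softer subtitle may be more appropriate.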
Manual Scatterplot Approach Assessment
Strengths:
- ✅ Complete control over every visual element
- ✅ Deep understanding of ggplot2’s grammar of graphics
- ✅ Guaranteed quality matching your exact vision
- ✅ Educational value – builds expertise in data visualization
- ✅ Debugging capability – you understand every line when issues arise
Weaknesses:
- ❌ Time-intensive – requires 15-30 minutes for complex plots
- ❌ Syntax knowledge required – need to remember function names and arguments
- ❌ Higher learning curve for beginners
Part 2: AI-Generated ggplot2 Scatterplot (The Assistant’s Approach)
Now let’s see how well AI can replicate our manual work. The key to success with AI is crafting detailed, specific prompts that translate your visualization goals into clear instructions.
Crafting the Perfect AI Prompt for a ggplot2 Scatterplot
Here’s our detailed prompt for AI (ChatGPT, Claude, or similar):
Detailed AI Prompt:
Using R with ggplot2 and the palmerpenguins package, create a professional scatterplot showing the relationship between flipper_length_mm (x-axis) and body_mass_g (y-axis).
Requirements:
- Remove any rows with missing flipper_length_mm or body_mass_g values
- Color points by species using these exact colors: Adelie=#FF6B35, Chinstrap=#004E89, Gentoo=#009639
- Set point size to 2.5 and alpha to 0.8
- Add regression lines for each species with confidence intervals (alpha=0.2)
- Include these labels:
- Title: “Penguin Body Mass vs Flipper Length”
- Subtitle: “Strong positive relationship varies significantly across species”
- X-axis: “Flipper Length (mm)”
- Y-axis: “Body Mass (g)”
- Caption: “Data: Palmer Station Antarctica LTER”
- Use theme_minimal with base_size=12
- Place legend at the top
- Make title bold, size 16, with margin
- Make subtitle size 12, grey40 color
- Remove minor gridlines
- Style caption as size 9, grey50, left-aligned
Provide complete, runnable R code using dplyr pipes.
AI Response Analysis
When we gave the above prompt to the web version of ChatGPT 5, it returned the following R code, ready to be copied and run.
AI-Generated Code (ChatGPT 5):
library(ggplot2)
library(dplyr)
library(palmerpenguins)
# Prepare data: remove missing rows
penguins_clean <- penguins %>%
filter(!is.na(flipper_length_mm), !is.na(body_mass_g))
# Custom color palette
species_colors <- c(
"Adelie" = "#FF6B35",
"Chinstrap" = "#004E89",
"Gentoo" = "#009639"
)
# Scatterplot with regression lines
ggplot(penguins_clean, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
geom_point(size = 2.5, alpha = 0.8) +
geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
scale_color_manual(values = species_colors) +
labs(
title = "Penguin Body Mass vs Flipper Length",
subtitle = "Strong positive relationship varies significantly across species",
x = "Flipper Length (mm)",
y = "Body Mass (g)",
caption = "Data: Palmer Station Antarctica LTER"
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "top",
plot.title = element_text(face = "bold", size = 16, margin = margin(b = 8)),
plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 12)),
plot.caption = element_text(size = 9, color = "grey50", hjust = 0),
panel.grid.minor = element_blank()
)
Along with the code, ChatGPT 5 gave a brief explanation of what it had done:
- Cleaned data (no missing values in flipper_length_mm or body_mass_g)
- Colored points by species with your exact hex codes
- Scatterplot points sized and styled as requested
- Regression lines with confidence intervals
- Professional labels and captions
- Refined theme (minimal base, legend at top, styled title/subtitle/caption, no minor gridlines)
ChatGPT 5 then went on to ask:
Would you like me to also show you how to export this plot as a high-resolution PNG or PDF for professional use in reports/papers?
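If you say yes, ChatGPT 5 gives the following R code to save the plot as a PNG image and as a PDF. Note that it refers to a plot object named p, which the generated code above never creates, so the plot has to be assigned to p first.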
# Save as high-res PNG (300 dpi, typical for publication)
ggsave("penguin_scatterplot.png", plot = p, width = 7, height = 5, dpi = 300)
# Save as PDF (vector format, scalable for print)
ggsave("penguin_scatterplot.pdf", plot = p, width = 7, height = 5)
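The `ggsave()` calls above expect a plot object named `p`, but the AI-generated plotting code prints the plot without assigning it. A minimal sketch of the missing step follows, using a shortened version of the plot; writing to `tempdir()` and the variable names `out_dir`, `png_path`, and `pdf_path` are our assumptions, not part of ChatGPT's answer.

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

# Assign the plot to `p` so ggsave() has an object to write
p <- penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 2.5, alpha = 0.8) +
  theme_minimal(base_size = 12)

# Write to a temporary directory here; replace with your own output folder
out_dir <- tempdir()
png_path <- file.path(out_dir, "penguin_scatterplot.png")
pdf_path <- file.path(out_dir, "penguin_scatterplot.pdf")

ggsave(png_path, plot = p, width = 7, height = 5, dpi = 300)
ggsave(pdf_path, plot = p, width = 7, height = 5)
```

Without an explicit `plot` argument, `ggsave()` falls back to the last plot displayed, which is easy to get wrong in a longer script, so passing `p` explicitly is the safer habit.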
AI Performance Evaluation: ChatGPT for Visualization
Results Analysis:
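- ✅ Remarkable accuracy: AI generated code almost identical to our manual version
- ✅ Correct ggplot2 syntax: Proper use of geom_smooth(), scale_color_manual(), and theme modifications
- ✅ All requirements met: Data filtering, custom colors, sizing, and labels all correctly implemented
- ⚠️ Minor differences: The margins differ slightly from our manual version (title b = 8 vs. 10, subtitle b = 12 vs. 20), and the "Species" legend label and bold legend title are absent because the prompt never asked for them. By our rough estimate, the result is still about 95% of the way to the manual plot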
Testing AI with Variations:
Let’s test AI with a simpler prompt to see how it performs:
Simple Prompt: “Create a scatterplot of flipper length vs body mass by species using the penguins dataset in R.”
AI Response (Simple):
ChatGPT 5 did very well and returned the following code. One caveat: we still need to repeat this test with a different ChatGPT 5 account.
# Load packages
library(ggplot2)
library(palmerpenguins)
# Create scatterplot
ggplot(data = penguins, aes(x = flipper_length_mm,
y = body_mass_g,
color = species)) +
geom_point(size = 2.5, alpha = 0.8) +
labs(
title = "Penguin Body Mass vs Flipper Length",
x = "Flipper Length (mm)",
y = "Body Mass (g)",
color = "Species"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "top")
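Here is the plot generated from the simple prompt on ChatGPT 5, and it looks pretty decent.
This demonstrates an important point: the more specific the prompt, the closer the AI output gets to the plot you have in mind. The simple prompt produced a clean plot, but without the regression lines, custom colors, subtitle, and caption of the detailed version.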
AI Approach Assessment
Strengths:
- ✅ Incredible speed – seconds vs 15-30 minutes for manual coding
- ✅ High accuracy when prompts are detailed and specific
- ✅ Accessibility – great for beginners who don’t know ggplot2 syntax
- ✅ Rapid prototyping – excellent for quick data exploration
- ✅ Consistency – generates similar code structure repeatedly
Weaknesses:
- ❌ Prompt dependency – quality heavily depends on your ability to write specific prompts
- ❌ Limited learning – doesn’t build your ggplot2 expertise
- ❌ Black box effect – you might not understand the generated code
- ❌ Iteration challenges – harder to make small adjustments without new prompts
Part 3: Hybrid ggplot2 Workflow (The Pragmatist’s Method)
The hybrid approach combines AI’s speed with human expertise for optimal results. This is often the most practical workflow for experienced R users.
Step 1: AI Foundation with Simple Prompt
We start with a basic prompt to get our foundation quickly:
Basic Prompt: “Create a scatterplot of flipper_length_mm vs body_mass_g colored by species using the penguins dataset in R with ggplot2.”
AI Foundation Response:
# Load necessary libraries
library(ggplot2)
library(palmerpenguins)
# Create scatterplot
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
geom_point() +
labs(
title = "Flipper Length vs Body Mass by Species",
x = "Flipper Length (mm)",
y = "Body Mass (g)",
color = "Species"
) +
theme_minimal()
This gives us a clean starting point in seconds, saving us from typing the basic ggplot structure.
Step 2: Human Enhancement and Refinement
Now we can apply our ggplot2 expertise to transform the basic plot into a professional visualization:
library(ggplot2)
library(palmerpenguins)
library(dplyr)
penguins |>
# Our addition: data cleaning for robustness
filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) |>
# AI foundation: basic plot structure
ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
# Our enhancement: better point aesthetics
geom_point(size = 2.5, alpha = 0.8) +
# Our addition: statistical insight with regression lines
geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
# Our expertise: custom color palette for better accessibility
scale_color_manual(
values = c("Adelie" = "#FF6B35",
"Chinstrap" = "#004E89",
"Gentoo" = "#009639"),
guide = guide_legend(override.aes = list(alpha = 1)) # Our touch: solid legend colors
) +
# AI foundation and Our addition: comprehensive, professional labeling and crediting data source
labs(
title = "Penguin Body Mass vs Flipper Length",
subtitle = "Strong positive relationship varies significantly across species",
x = "Flipper Length (mm)",
y = "Body Mass (g)",
color = "Species",
caption = "Data: Palmer Station Antarctica LTER | Visualization: DataVizPyr"
) +
# AI foundation enhanced with our theme expertise
theme_minimal(base_size = 12) +
theme(
# Our refinements: professional typography hierarchy
plot.title = element_text(size = 16, face = "bold", margin = margin(b = 8)),
plot.subtitle = element_text(size = 12, color = "grey40", margin = margin(b = 20)),
# Our touch: legend optimization
legend.position = "top",
legend.title = element_text(size = 11, face = "bold"),
legend.margin = margin(b = 15),
# Our expertise: grid optimization for clarity
panel.grid.minor = element_blank(),
panel.grid.major = element_line(color = "grey90", linewidth = 0.5),  # linewidth replaces the deprecated size argument (ggplot2 3.4.0+)
# Our addition: caption styling
plot.caption = element_text(size = 9, color = "grey50", hjust = 0, margin = margin(t = 15))
)
Hybrid Approach Assessment
Strengths:
- ✅ Optimal efficiency – fast foundation (AI) + expert refinement (human)
- ✅ Learning opportunity – you review and understand AI code before enhancing
- ✅ Best of both worlds – speed + complete control over final output
- ✅ Flexible iteration – easy to make adjustments and improvements
- ✅ Maintains expertise – keeps you engaged with ggplot2 syntax and best practices
Weaknesses:
- ❌ Requires intermediate expertise – need to recognize what needs improvement
- ❌ Quality control needed – must evaluate AI output before building on it
- ❌ Workflow complexity – more steps than pure manual or pure AI approaches
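The quality-control step mentioned above can be partly automated. As one hedged example, the sketch below checks that the columns an AI-generated snippet refers to actually exist in the data before you build on it. The helper `check_columns()` is hypothetical and written for this post; it is not part of ggplot2, dplyr, or palmerpenguins.

```r
library(palmerpenguins)

# Hypothetical helper: stop early if expected columns are absent from the data
check_columns <- function(data, cols) {
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Columns used by the AI-generated scatterplot code
check_columns(penguins, c("flipper_length_mm", "body_mass_g", "species"))
```

A check like this only catches misnamed columns; whether the plot is statistically and visually sound still requires reading the code and looking at the result.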
Comparative Analysis: Manual vs AI vs Hybrid in R
| Aspect | Manual Approach | AI Approach | Hybrid Approach |
|---|---|---|---|
| Development Time | 15-30 minutes | 1-3 minutes | 5-12 minutes |
| Code Quality | Excellent | Variable | High |
| Learning Value | Maximum | Minimal | Good |
| Customization | Complete | Prompt-dependent | High |
| Skill Required | High ggplot2 expertise | Prompt writing skills | Intermediate R knowledge |
Why This Matters Specifically for R Users
ggplot2’s grammar of graphics translates exceptionally well to AI prompts. The structured, layered approach of ggplot2 (data + aesthetics + geometries + themes) maps naturally to how we describe visualizations in plain English. This makes R + AI combinations particularly powerful compared to other visualization libraries.
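To illustrate the mapping between plain-English descriptions and layers, here is a small sketch in which each phrase of a description becomes one addition to the plot. The intermediate object names (`base_plot`, `with_points`) are ours, and the phrases in the comments are just one possible wording.

```r
library(ggplot2)
library(palmerpenguins)
library(dplyr)

penguins_clean <- penguins |>
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g))

# "Plot flipper length against body mass, colored by species"
base_plot <- ggplot(
  penguins_clean,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
)

# "Draw slightly transparent points"
with_points <- base_plot + geom_point(alpha = 0.8)

# "Use a minimal theme with the legend on top"
p <- with_points + theme_minimal() + theme(legend.position = "top")
p
```

Because each layer is added with `+`, an AI-generated foundation can be extended in the same way, one layer at a time.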
R’s pipe operator (%>% or |>) works seamlessly with AI-generated code, making it easy to enhance AI foundations with additional data manipulation and refinement steps.
The R ecosystem’s consistency means AI models trained on R code tend to produce more reliable, idiomatic results compared to the more fragmented Python visualization landscape.
Recommendation Framework: Choose Your R Workflow
🎓 Learning ggplot2
Recommendation: Manual → Hybrid
Start with manual coding to build solid ggplot2 fundamentals. Once comfortable with the grammar of graphics, gradually incorporate hybrid workflows to see expert techniques.
⚡ Exploratory Analysis
Recommendation: Pure AI
When you need quick visualizations for data exploration and presentation quality isn’t critical, AI prompts can generate plots in seconds for rapid insights.
🏆 Production Visualizations
Recommendation: Hybrid
For reports, presentations, and publications, the hybrid approach delivers professional quality efficiently while maintaining your expertise and control.
🎨 Complex Custom Plots
Recommendation: Manual
When you need precise control over advanced ggplot2 features, custom annotations, or unconventional styling, manual coding remains the gold standard.
📊 Team Collaboration
Recommendation: Hybrid
Hybrid workflows create readable, well-documented code that team members can easily understand, modify, and maintain—regardless of their AI familiarity.
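One way a team might keep hybrid code consistent is to move the hand-tuned theme settings into a small shared function, so that AI-generated foundations only need one extra line. This is a sketch under our own assumptions; the name `theme_report()` is hypothetical, and the settings are a subset of those used earlier in this post.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical shared theme that collects the manual refinements in one place
theme_report <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(size = 16, face = "bold"),
      legend.position = "top",
      panel.grid.minor = element_blank()
    )
}

# Apply it to a basic plot; na.rm = TRUE drops incomplete rows without a warning
p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(na.rm = TRUE) +
  theme_report()
```

Team members can then adjust the shared function once instead of editing theme code in every script.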
🚀 Skill Development
Recommendation: All Three
Use all approaches strategically: Manual for fundamentals, AI for inspiration and speed, Hybrid for practical application. This builds comprehensive modern R skills.
Conclusion: The Future of R Data Visualization
The integration of AI into R workflows represents a significant evolution in how we approach data visualization. Rather than replacing traditional ggplot2 skills, AI tools amplify our capabilities and change how we allocate our time and mental energy.
Key Takeaways:
- Manual coding remains essential for learning ggplot2 fundamentals and handling complex customizations
- AI excels at rapid prototyping and generating boilerplate code, dramatically reducing time-to-insight
- Hybrid workflows offer the best balance for most professional scenarios, combining efficiency with expertise
- The choice depends on context: your experience level, time constraints, and output requirements should guide your approach
Looking Forward:
As AI tools continue to improve, we can expect even better integration with R workflows. The future likely holds AI assistants that understand statistical context, suggest appropriate visualizations based on data types, and even help with interpretation of visual patterns.
The most successful R users in this AI era will be those who master all three approaches and know when to deploy each one strategically. Whether you’re building foundational skills, racing against deadlines, or crafting the perfect publication-quality plot, you now have a complete toolkit for modern R visualization workflows.
What approach will you try first? Share your results and let’s continue advancing the art and science of data visualization together.
Want to stay updated on the latest in AI-assisted data visualization? Subscribe to DataVizPyr for more practical guides, comparisons, and tutorials that help you work smarter with your data.