From Basics to Advanced: ggplot2 Techniques Every R User Should Know

Customizing ggplot2: Themes, Scales, and Geoms Explained

ggplot2 is the de facto standard for data visualization in R. Its grammar-of-graphics approach makes it both powerful and flexible: you compose plots by combining data, aesthetic mappings, geometric objects (geoms), statistical transformations, scales, coordinate systems, and themes. This article explores three central customization areas (geoms, scales, and themes) so you can build clear, attractive, and publication-ready visuals.


Overview: how the layers fit together

A typical ggplot2 plot is built by starting with ggplot(data, aes(…)) and then adding layers with +. A minimal example:

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

  • Geoms draw the data (points, lines, bars, etc.).
  • Aesthetics (aes) map variables to visual properties (x, y, color, size, shape).
  • Scales control how data values map to aesthetic values and their legends/labels.
  • Themes control non-data ink: background, grid lines, fonts, margins.
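To make the four roles above concrete, here is a minimal sketch that adds one of each to the same mtcars scatterplot. The specific choices (the "Dark2" palette, the legend title, theme_minimal()) are illustrative assumptions, not the only sensible ones.

```r
library(ggplot2)

# Data + aesthetics: map weight, mileage, and cylinder count to x, y, and color
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  # Geom: draw one point per car
  geom_point(size = 2) +
  # Scale: choose how cylinder values turn into colors and name the legend
  scale_color_brewer(palette = "Dark2", name = "Cylinders") +
  # Theme: restyle the non-data parts of the plot
  theme_minimal()

p
```

Each line after the first can be swapped out independently, which is what the rest of this article relies on.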

We’ll dive into each area with practical examples and tips.


Geoms: choosing and customizing the right geometric object

Geoms are the visible marks that represent data. Choosing the right geom clarifies the message; customizing it improves readability.

Common geoms and their uses:

  • geom_point(): scatterplots, detect relationships or outliers.
  • geom_line(): time series or ordered observations.
  • geom_bar()/geom_col(): counts or pre-aggregated values.
  • geom_histogram(): distribution of a single variable.
  • geom_boxplot(): distribution summaries and outliers.
  • geom_smooth(): trend lines and confidence intervals.
  • geom_violin(): distribution shape, drawn as a mirrored density estimate.

Practical tips:

  • For overplotted points, use alpha, size, or geom_jitter().
  • Use geom_col() when you provide pre-summarized heights; geom_bar() when counting.
  • Combine geoms (e.g., geom_point() + geom_smooth()) to show raw data and trends.
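The geom_bar() versus geom_col() tip is easiest to see side by side. The sketch below, which assumes mtcars and a helper data frame named cyl_counts invented for this illustration, builds the same bar chart both ways.

```r
library(ggplot2)

# geom_bar() counts rows for you: one bar per cylinder class
p_bar <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# geom_col() expects the heights to be supplied, so summarize first
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl), responseName = "n")
p_col <- ggplot(cyl_counts, aes(x = cyl, y = n)) +
  geom_col()

# Both plots end up with the same bar heights
bar_heights <- layer_data(p_bar)$count
col_heights <- layer_data(p_col)$y
all(bar_heights == col_heights)
```

If your data already arrives aggregated (for example, from a database query), geom_col() avoids counting the summary rows by mistake.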

Example: layered plot with transparency and grouping

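ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(alpha = 0.4, size = 1.5) +
  # diamonds has ~54,000 rows; loess is slow and memory-hungry at that size, so use a GAM
  geom_smooth(aes(group = cut), method = "gam", formula = y ~ s(x, bs = "cs"), se = FALSE)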

Mapping vs setting:

  • Mapping inside aes() links an aesthetic to data (color = cut).
  • Setting outside aes() fixes a value for the layer (color = "blue").

Aesthetic mapping example:

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot()
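The mapping/setting distinction is a frequent source of confusion, so here is a small sketch of the three cases on mtcars. The object names (p_map, p_set, p_oops) are made up for this illustration.

```r
library(ggplot2)

# Mapping: color varies with the data, and a legend is drawn
p_map <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: every point is literally blue, and no legend is drawn
p_set <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(color = "blue")

# Common mistake: "blue" inside aes() is treated as a one-level variable,
# so the points get the default palette color and a legend labeled "blue"
p_oops <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(color = "blue"))

# Inspect the colors ggplot2 actually assigned
unique(layer_data(p_set)$colour)
unique(layer_data(p_oops)$colour)
```

If a plot shows an unexpected legend with a single key, a constant inside aes() is usually the cause.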

Custom geoms from extensions:

  • The ggplot2 ecosystem has many extension packages (ggbeeswarm, gghighlight, gghalves, ggforce) that provide specialized geoms.

Scales: controlling how data values translate to visual properties

Scales connect data and aesthetics: they determine colors, sizes, axis breaks, labels, and legend behavior.

Scale types:

  • Continuous scales: scale_x_continuous(), scale_y_continuous(), scale_color_gradient().
  • Discrete scales: scale_color_manual(), scale_fill_brewer(), scale_shape_manual().
  • Date/time scales: scale_x_date(), scale_x_datetime().
  • Transformed or reversed position scales: scale_x_log10(), scale_y_reverse().

Color scales:

  • Use perceptually uniform palettes for continuous data (viridis, scale_color_viridis_c()).
  • For categorical data, choose palettes with sufficient contrast (RColorBrewer, scale_color_brewer()).
  • Manual scales allow exact color choices: scale_color_manual(values = c("red", "blue")).
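With manual scales, naming the values ties each color to a specific level rather than to whatever order the levels happen to be in. A minimal sketch, assuming mtcars and three hex colors chosen only for illustration:

```r
library(ggplot2)

# Named vector: each factor level gets an explicit color
cyl_colors <- c("4" = "#1b9e77", "6" = "#d95f02", "8" = "#7570b3")

p_manual <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point(size = 2) +
  scale_color_manual(values = cyl_colors, name = "Cylinders")

p_manual
```

Reusing the same named vector across figures keeps a category the same color everywhere, even when a plot shows only a subset of levels.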

Example: custom color and axis breaks

# scale_color_viridis_c() ships with ggplot2, so the viridis package is not required here
ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point(size = 3) +
  scale_color_viridis_c(option = "plasma", name = "Horsepower") +
  scale_x_continuous(name = "Weight (1000 lbs)", breaks = c(1.5, 2.5, 3.5, 4.5))

Legends and guides:

  • Use guides() and guide_legend()/guide_colorbar() to control legend appearance.
  • Use guide_legend(nrow = 1) for horizontal legends; guide_colorbar(barwidth, barheight) for colorbars.
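As a rough sketch of the two guide types, the example below puts a one-row legend under a plot with a discrete color mapping and titles the colorbar on a plot with a continuous one. The datasets (mpg, mtcars) and titles are illustrative choices.

```r
library(ggplot2)

# Discrete legend: one row of keys, placed under the plot
p_legend <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  guides(color = guide_legend(title = "Drive train", nrow = 1)) +
  theme(legend.position = "bottom")

# Continuous legend: a colorbar with its own title
p_colorbar <- ggplot(mtcars, aes(wt, mpg, color = hp)) +
  geom_point() +
  guides(color = guide_colorbar(title = "Horsepower"))

p_legend
p_colorbar
```

The same guide can also be passed directly to a scale through its guide argument, which keeps the legend settings next to the palette.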

Transformations and coordinate scales:

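  • Use scale_y_log10() or coord_trans(y = "log10") for log scales. The scale version transforms the data before any statistics are computed; the coord version transforms afterward, so the two differ when the plot includes smoothers or summaries.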
  • coord_flip() swaps x and y—useful for horizontal bar charts.
  • coord_polar() draws circular plots, though angles and areas are harder to compare precisely than positions on a common axis.
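The difference between transforming in the scale and transforming in the coordinate system shows up as soon as a statistic is involved. The sketch below uses a tiny made-up data frame (toy) and stat_summary() to compare the two. Depending on your ggplot2 version, coord_trans() may print a deprecation notice suggesting coord_transform(); the comparison is the same either way.

```r
library(ggplot2)

# Made-up data chosen so the arithmetic is easy to follow
toy <- data.frame(g = "a", y = c(1, 10, 100))

# Scale transformation: the mean is taken on log10(y)
p_scale <- ggplot(toy, aes(g, y)) +
  stat_summary(fun = mean, geom = "point") +
  scale_y_log10()

# Coordinate transformation: the mean is taken on raw y, then the axis is warped
p_coord <- ggplot(toy, aes(g, y)) +
  stat_summary(fun = mean, geom = "point") +
  coord_trans(y = "log10")

mean_on_scale <- layer_data(p_scale)$y  # 1 in log10 units, i.e. the geometric mean 10
mean_on_coord <- layer_data(p_coord)$y  # 37, the arithmetic mean
```

Neither result is wrong; pick the one whose summary matches what you want to report.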

Axis labels and formatting:

  • Use scales::label_number(), label_percent(), label_comma(), label_date() for readable axis labels.

Example: percent labels

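library(scales)
# Made-up example data; substitute your own data frame
df <- data.frame(group = c("A", "B", "C"), proportion = c(0.125, 0.5, 0.375))
ggplot(df, aes(x = group, y = proportion, fill = group)) +
  geom_col() +
  scale_y_continuous(labels = label_percent(accuracy = 0.1))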
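It helps to know that the scales label_*() helpers are function factories: each call returns a formatting function, which is why they are passed to labels with their parentheses. A quick sketch you can run at the console (the input values are arbitrary):

```r
library(scales)

# Each label_*() call returns a function that formats a numeric vector
pct <- label_percent(accuracy = 0.1)
big <- label_comma()
kg  <- label_number(suffix = " kg")

pct(0.1234)   # "12.3%"
big(1234567)  # "1,234,567"
kg(5)         # "5 kg"
```

Trying a formatter on a few values like this is a cheap way to check the output before wiring it into a scale.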

Themes: polishing non-data elements for clarity and style

Themes adjust background, grid lines, text, axis ticks, legend placement, and margins—elements that don’t contain data but shape perception.

Built-in themes:

  • theme_gray() (default), theme_minimal(), theme_classic(), theme_bw(), theme_light(), theme_void().
  • Use theme_minimal() or theme_bw() as good starting points for publication-ready plots.

Key theme components:

  • axis.text, axis.title, legend.position, panel.grid, plot.title, plot.subtitle, plot.caption, strip.text (for facets).

Example: customizing a theme

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine size vs highway MPG", x = "Displacement (L)", y = "Highway MPG")

p + theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_blank()
  )

Creating reusable themes:

  • Encapsulate settings in a function for consistent styling across plots:

theme_my <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", size = base_size * 1.2),
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

# Use: p + theme_my()
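A reusable theme can also be made the session default with theme_set(), so individual plots need no + theme_...() call at all. The sketch below assumes a small stand-in theme function, theme_report(), invented for this example; any theme function of your own works the same way.

```r
library(ggplot2)

# Hypothetical house theme, kept short for the example
theme_report <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(legend.position = "bottom", panel.grid.minor = element_blank())
}

# theme_set() installs the new default and returns the previous one
old_theme <- theme_set(theme_report())

# Plots created from here on pick up theme_report() automatically
p_default <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point()
active_position <- theme_get()$legend.position

# Restore the earlier default when you are done
theme_set(old_theme)
```

Setting the theme once at the top of a script or report is usually less error-prone than remembering to add it to every plot.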

Working with fonts:

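  • Use the showtext package (built on sysfonts) to load custom fonts, or extrafont to register fonts already installed on your system.
  • With showtext, sysfonts::font_add_google() fetches a Google Font and showtext_auto() enables it for subsequent plots, which helps keep typography consistent across machines.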

Facets and strip styling:

  • Use facet_wrap() and facet_grid() for small multiples.
  • Customize strip.text and strip.background to make facet labels readable.

Example:

ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ cut) +
  theme(
    strip.text = element_text(face = "bold", size = 10),
    strip.background = element_rect(fill = "grey95", color = NA)
  )

Putting it all together: an annotated example

Below is a fuller example that combines geoms, scales, and a custom theme to produce a clear, publication-ready plot.

library(ggplot2)  # scale_color_viridis_c() ships with ggplot2; viridis and scales are not needed here

# Reusable theme: minimal base, bold titles, legend at the bottom
theme_clean <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", size = base_size * 1.3),
      plot.subtitle = element_text(size = base_size),
      legend.position = "bottom",
      legend.title = element_text(face = "bold"),
      panel.grid.major = element_line(color = "grey92"),
      panel.grid.minor = element_blank(),
      axis.title = element_text(face = "bold")
    )
}

ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  # Geoms: raw points sized by quarter-mile time, plus a dashed linear trend
  geom_point(aes(size = qsec), alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linetype = "dashed") +
  # Scales: perceptually uniform color, bounded point sizes, labeled x axis
  scale_color_viridis_c(option = "magma", name = "Horsepower") +
  scale_size_continuous(name = "1/4 mile time (s)", range = c(1, 6)) +
  scale_x_continuous(name = "Weight (1000 lbs)") +
  labs(
    title = "Car weight vs fuel efficiency",
    subtitle = "Point size = 1/4 mile time; color = horsepower",
    caption = "Source: mtcars"
  ) +
  # Theme: apply the custom theme last so it styles everything above
  theme_clean(12)

Common pitfalls and quick fixes

  • Overplotting: reduce alpha, use smaller points, or switch to geom_jitter() or geom_hex() (the latter requires the hexbin package).
  • Misleading color scales: avoid rainbow palettes for continuous data; prefer viridis or perceptually uniform scales.
  • Crowded legends: combine guides, reduce keys, or place legend at bottom with multiple columns.
  • Axis labels too dense: adjust breaks or rotate text with theme(axis.text.x = element_text(angle = 45, hjust = 1)).
  • Too many facets: order the panels meaningfully, or filter the data down to the key groups.

Practical checklist before exporting

  • Check axis labels and units are present and clear.
  • Ensure the color palette is readable for colorblind viewers (use viridis, or a ColorBrewer palette flagged as colorblind-friendly; RColorBrewer::display.brewer.all(colorblindFriendly = TRUE) lists them).
  • Use consistent font sizing and line widths across figures.
  • Remove unnecessary grid lines and background clutter.
  • Export at appropriate resolution and size for target medium (e.g., 300 dpi for print, specific pixel dimensions for web).
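For the export step, ggsave() lets you fix the physical size and resolution explicitly rather than inheriting them from the plotting window. A minimal sketch, with the file name, dimensions, and temporary directory chosen only for illustration:

```r
library(ggplot2)

p_export <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Write to a temporary folder here; point this at your figures directory in practice
out_file <- file.path(tempdir(), "weight-vs-mpg.png")

# 6 x 4 inches at 300 dpi gives an 1800 x 1200 pixel image suitable for print
ggsave(out_file, plot = p_export, width = 6, height = 4, units = "in", dpi = 300)
```

Passing plot explicitly avoids accidentally saving whichever plot was displayed last.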

Customizing ggplot2 by mastering geoms, scales, and themes will let you communicate data accurately and attractively. Start with sensible defaults, then iterate—small adjustments to scales and theme elements often produce the largest improvements in clarity.
