1 of 28

DATA VISUALIZATION

ggplot2 & Grammar of Graphics

2 of 28

CONTENT

  • Credits & References
  • Pleas for Data Visualization
  • The Grammar of Graphics
    • Basic Layers (data, aesthetics, geometry)
    • Advanced Layers (statistics, scales, facets, coordinates, themes)
  • What to plot? Important visualizations for different applications

Prof. Dr. Nicolas Meseth | Twitter | Instagram | YouTube | LinkedIn

3 of 28

CREDITS

This slide deck is heavily inspired by the workshop “Plotting anything with ggplot2” by Thomas Lin Pedersen.

4 of 28

REFERENCES

Wickham, Hadley. ggplot2. Springer Science + Business Media, LLC, 2016. Available online: https://ggplot2-book.org/

Wilke, C. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. First edition, O’Reilly Media, 2019. Available online: https://clauswilke.com/dataviz/index.html

6 of 28

DATA ANALYTICS PROCESS

Today's step in the process: data visualization

7 of 28

PLEAS FOR DATA VISUALIZATION

8 of 28

PLEAS FOR DATA VISUALIZATION

  • Find two examples here

9 of 28

THE GRAMMAR OF GRAPHICS

BASIC LAYERS

10 of 28

THE GRAMMAR OF GRAPHICS

BASIC LAYERS

  • In the Grammar of Graphics, a visualization consists of a minimum of three layers:
    • Data
    • Mapping of data to aesthetic elements
    • Geometric shapes
  • ggplot2 implements this idea → Visualizations are built as a stack of these layers

11 of 28

THE GRAMMAR OF GRAPHICS

EXAMPLE FOR BASIC LAYERS

library(ggplot2)

ggplot(covid) +                                        # What is the data?
  aes(x = date, y = new_cases_smoothed_per_million) +  # How to map the data to aesthetics?
  geom_line()                                          # Which geometric shape represents our data?
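
The `covid` data frame is not part of this deck, so here is a minimal sketch of the same three layers on a dataset that ships with ggplot2. The choice of `economics` and its column `unemploy` is an assumption made only so the example runs as is.

```r
library(ggplot2)

# `economics` ships with ggplot2 (monthly US economic time series).
p_basic <- ggplot(economics) +       # data layer
  aes(x = date, y = unemploy) +      # mapping layer
  geom_line()                        # geometry layer

# Equivalent and more common spelling: the mapping goes inside ggplot().
p_short <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Typing p_basic (or print(p_basic)) draws the plot.
```

Both spellings build the same plot; the first one mirrors the three questions on this slide one line at a time.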

12 of 28

THE GRAMMAR OF GRAPHICS

ALL LAYERS

13 of 28

THE GRAMMAR OF GRAPHICS

COMPOSITION CONCEPT

Any data visualization = a stack of layers (listed from top to bottom):

  • THEME
  • COORDINATES
  • FACETS
  • GEOMETRIES
  • SCALES
  • STATISTICS
  • MAPPING
  • DATA

14 of 28

THE GRAMMAR OF GRAPHICS

COMPOSITION CONCEPT

  • Only three layers are needed: DATA, MAPPING, and GEOMETRIES
  • Everything else has a default!

15 of 28

THE DATA LAYER

  • Data must be provided as a data frame (tibble)
  • It should contain only:
    • the necessary variables
    • the relevant rows
    • the right level of aggregation
    • pre-computed statistics, where needed
  • dplyr is the toolset for data transformation
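
A minimal sketch of preparing the data layer with dplyr before plotting. The dataset (`mpg`, shipped with ggplot2) and the filter on `year == 2008` are assumptions chosen for illustration only.

```r
library(dplyr)
library(ggplot2)

# `mpg` ships with ggplot2 (fuel economy data for 234 cars).
plot_data <- mpg %>%
  filter(year == 2008) %>%                            # relevant rows
  select(class, hwy) %>%                              # necessary variables
  group_by(class) %>%                                 # right level of aggregation
  summarise(mean_hwy = mean(hwy), .groups = "drop")   # pre-computed statistic

# The plot only receives the small, prepared tibble.
p_data <- ggplot(plot_data) +
  aes(x = class, y = mean_hwy) +
  geom_col()   # geom_col() draws the values as they are
```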

16 of 28

THE MAPPING LAYER

  • The aesthetic mapping (aes()) links variables in the data to graphical properties
  • Most important: What should be shown on the x- and y-axes?
  • More mappings:
    • Line color & style
    • Fill color
    • Point size & shape
    • Alpha
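
A short sketch of several mappings at once, again on the `mpg` dataset that ships with ggplot2 (an assumption for illustration). It also shows the difference between mapping a variable inside `aes()` and setting a constant outside of it.

```r
library(ggplot2)

p_map <- ggplot(mpg) +
  aes(
    x = displ,      # engine displacement on the x-axis
    y = hwy,        # highway mileage on the y-axis
    color = class,  # point color depends on the vehicle class
    size = cyl      # point size depends on the number of cylinders
  ) +
  geom_point(alpha = 0.6)  # constant transparency: set, not mapped
```

Anything inside `aes()` varies with the data and gets a legend; anything set outside of `aes()` is the same for every point.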

17 of 28

THE STATISTICS LAYER

  • If not pre-computed, statistics can be calculated by the visualization
  • All geometries are assigned a default statistic (and vice versa)
  • Example statistics:
    • identity → The value provided as is
    • count → Count rows
    • bin → Bin continuous variables
    • density → Estimate density
    • Many more…
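
To see what a statistic actually computes, `layer_data()` returns the data a layer draws after the statistic has run. The following is a sketch on the `mpg` dataset (an assumption for illustration), using the bin and density statistics from the list above.

```r
library(ggplot2)

# bin: geom_histogram() uses the bin statistic by default.
p_bin <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 10)

# density: geom_density() uses the density statistic by default.
p_density_stat <- ggplot(mpg, aes(x = hwy)) +
  geom_density()

# Inspect the computed values instead of the raw rows.
binned <- layer_data(p_bin)
estimated <- layer_data(p_density_stat)
head(binned[, c("xmin", "xmax", "count")])
```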

18 of 28

THE STATISTICS LAYER

STATS & GEOMS

library(tidyverse)  # provides ggplot2 and the %>% pipe

# The count statistic uses the bar geometry by default
tweets %>%
  ggplot() +
  stat_count(aes(x = screen_name))

# The bar geometry uses the count statistic by default
tweets %>%
  ggplot() +
  geom_bar(aes(x = screen_name))
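
The alternative to letting the plot count is to count beforehand and plot the numbers as they are. A sketch of both routes, using `mpg` from ggplot2 in place of the `tweets` data (which is not part of this deck):

```r
library(dplyr)
library(ggplot2)

# Route 1: the plot counts. geom_bar() applies the count statistic.
p_counted_by_plot <- ggplot(mpg) +
  geom_bar(aes(x = class))

# Route 2: count with dplyr first, then draw the values unchanged.
class_counts <- mpg %>% count(class)

p_precomputed <- ggplot(class_counts) +
  geom_col(aes(x = class, y = n))   # geom_col() uses the identity statistic
```

Both plots show bars of the same height; the only difference is where the counting happens.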

19 of 28

THE SCALES LAYER

  • Every aesthetic mapping has a scale attached
  • A scale translates values in the data into visual values: positions on the x- and y-axes, colors, or sizes of shapes
  • All scale functions follow the same naming scheme:
    • scale_<aes>_<type>()
  • We use scales mainly for:
    • Color palettes
    • Axis labeling (breaks, formatting)
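
A sketch of the naming scheme in use, covering both a color palette and axis labeling. The dataset (`mpg`), the palette name, and the label texts are assumptions for illustration.

```r
library(ggplot2)

p_scales <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # scale_<aes>_<type>(): aesthetic x, type continuous
  scale_x_continuous(
    breaks = 2:7,                               # where to place the ticks
    labels = function(x) paste0(x, " l")        # how to format the tick labels
  ) +
  # aesthetic y, type continuous: here only the axis title is changed
  scale_y_continuous(name = "Highway miles per gallon") +
  # aesthetic color, type brewer: a ColorBrewer palette
  scale_color_brewer(palette = "Dark2")
```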

20 of 28

THE GEOMETRY LAYER

  • The geometry is central to how the plot visualizes data
  • Depending on the geometry, different aesthetics can or must be mapped
  • We can add more than one geometry to a plot
  • All geometry functions follow the naming scheme geom_<type>()
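
A sketch with two geometries in one plot. It also illustrates that geometries differ in the aesthetics they need: points only require x and y, while text additionally requires a label. The ten-row subset of `mpg` is an assumption that keeps the labels readable.

```r
library(ggplot2)

mpg_small <- head(mpg, 10)

p_geoms <- ggplot(mpg_small, aes(x = displ, y = hwy)) +
  geom_point() +                              # needs x and y
  geom_text(aes(label = model), vjust = -1)   # needs x, y and label
```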

21 of 28

THE FACETS LAYER

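  • Create small panels that show the same visualization for different subsets of the data
  • A variable in the data determines which observations go into which panel
  • Helps to avoid overplotting and keeps the plot readable!
  • facet_wrap() wraps the panels for one variable into rows and columns; facet_grid() arranges the panels in a grid spanned by two variables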
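
A sketch of both facet functions on the `mpg` dataset (an assumption for illustration):

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One variable: one panel per vehicle class, wrapped into rows and columns.
p_wrap <- base_plot + facet_wrap(~ class)

# Two variables: rows by drive train, columns by number of cylinders.
p_grid <- base_plot + facet_grid(drv ~ cyl)
```

`facet_wrap()` tends to fit when one variable has many levels; `facet_grid()` fits when the combination of two variables is the point of the plot.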

22 of 28

THE COORDINATES LAYER

  • Specify the coordinate system underlying the visualization:
    • Cartesian (default)
    • Polar
  • Allows for changing axis limits (like scales, but coord_cartesian() zooms in without dropping the data outside the limits)
  • coord_flip() is useful to quickly flip x and y

We will rarely use this layer!
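
For the few cases where the coordinates layer is useful, a sketch on the `mpg` dataset (an assumption for illustration):

```r
library(ggplot2)

p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Flip x and y: horizontal bars, handy for long category labels.
p_flipped <- p_bar + coord_flip()

# Polar instead of Cartesian coordinates.
p_polar <- p_bar + coord_polar()

# Zoom via the coordinate system: all points stay in the data.
p_zoom <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 4))

# Limits via the scale: values outside the range are treated as missing.
p_cut <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(limits = c(2, 4))
```

The difference between the last two matters as soon as a statistic is involved, for example a trend line: zooming keeps all observations for the calculation, scale limits do not.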

23 of 28

THE THEME LAYER

  • Style the plot
    • Background colors
    • Fonts (axis, titles)
    • Legends
  • There are predefined themes for us to use:
    • theme_bw()
    • theme_light()
    • theme_dark()
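
A sketch that combines a predefined theme with a few individual adjustments. The dataset (`mpg`), the title text, and the chosen colors and sizes are assumptions for illustration.

```r
library(ggplot2)

p_theme <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  labs(title = "Engine size vs. highway mileage") +
  theme_light() +                                         # predefined theme first
  theme(
    legend.position = "bottom",                           # legend
    plot.title = element_text(face = "bold", size = 14),  # fonts
    panel.background = element_rect(fill = "grey98")      # background color
  )
```

The order matters: a predefined theme replaces all earlier theme settings, so individual `theme()` adjustments come after it.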

24 of 28

WHAT TO PLOT?

25 of 28

WHAT TO PLOT?

TRENDS & DEVELOPMENTS

  • The x-axis usually displays time, the y-axis some value over time:
    • Line chart
    • Area chart
    • One vs. multiple series
    • Facets

  • Example: COVID-19
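
Since the COVID-19 data is not part of this deck, here is a sketch of the four variants on the `economics` and `economics_long` datasets that ship with ggplot2 (an assumption for illustration).

```r
library(ggplot2)

# Line chart, one series.
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line()

# Area chart, one series.
p_area <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_area(alpha = 0.4)

# Multiple series in one panel; value01 is rescaled to the range 0 to 1.
p_multi <- ggplot(economics_long, aes(x = date, y = value01, color = variable)) +
  geom_line()

# Multiple series as facets, each with its own y-axis.
p_trend_facets <- ggplot(economics_long, aes(x = date, y = value)) +
  geom_line() +
  facet_wrap(~ variable, scales = "free_y")
```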

26 of 28

WHAT TO PLOT?

AMOUNTS & PROPORTIONS

  • A geometry’s size (height, width, area) represents values in the data for easy comparison:
    • Bar chart
      • side by side (dodged)
      • stacked
    • Pie chart
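
A sketch of the three chart types on the `mpg` dataset (an assumption for illustration). ggplot2 has no dedicated pie geometry; a common approach, used here, is a single stacked bar in polar coordinates.

```r
library(ggplot2)

# Bars side by side.
p_dodge <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")

# Stacked bars (the default position for geom_bar()).
p_stack <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "stack")

# Pie chart: one stacked bar, bent into a circle.
p_pie <- ggplot(mpg, aes(x = "", fill = drv)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```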

27 of 28

WHAT TO PLOT?

DISTRIBUTIONS

  • How are observations of a variable distributed?
    • Histogram (one vs. multiple series)
    • Density plot
    • Ridgeline plots
    • Box plots
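
A sketch of the distribution plots that ggplot2 offers out of the box, on the `mpg` dataset (an assumption for illustration). Ridgeline plots are left out because they need an add-on package.

```r
library(ggplot2)

# Histogram, one series.
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# Histogram, multiple series drawn on top of each other.
p_hist_multi <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_histogram(binwidth = 2, position = "identity", alpha = 0.5)

# Density plot per drive train.
p_dens <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.5)

# Box plot per drive train.
p_box <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()
```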

28 of 28

WHAT TO PLOT?

ASSOCIATIONS

  • What associations between variables can we find in the data?
    • Scatter plot (point diagram)
    • Trend lines
    • Heat maps
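
A sketch of the three plot types on the `mpg` dataset (an assumption for illustration). The linear model for the trend line is one possible choice, not the only one.

```r
library(dplyr)
library(ggplot2)

# Scatter plot: two numeric variables.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Trend line: a fitted linear model on top of the points.
p_trendline <- p_scatter +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Heat map: two categorical variables, color shows the number of cars.
combo_counts <- mpg %>% count(class, drv)

p_heat <- ggplot(combo_counts, aes(x = class, y = drv, fill = n)) +
  geom_tile()
```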
