Data Visualization
(Slides adapted from Deb Nolan, Sandrine Dudoit, & Fernando Perez)
UC Berkeley Data 100 Summer 2019
Sam Lau
Learning goals:
Data Visualization Principles
Six Principles Today
Explored via three case studies.
Case 1: Planned Parenthood 2015 Hearing
Full report available at https://oversight.house.gov/interactivepage/plannedparenthood/
Case 2: Median Weekly Earnings
Case 3: Cherry Blossom Runners
Principles of Scale
Scale
Keep consistent axis scales
Consider Scale of Data
Reveal the Data
Principles of Conditioning
Conditioning
Use Conditioning To Aid Comparison
Use Small Multiples To Aid Comparison
Principles of Perception
Color Choices Matter!
Jet Colormap
Viridis Colormap
Use a Perceptually Uniform Color Map
Use Color to Highlight Data Type
Not All Marks Are Good!
Lengths are Easy to Understand
People can easily distinguish two different lengths
E.g., heights of bars in a bar chart
Angles are Hard to Understand
Avoid pie charts!
Angle judgements are inaccurate
In general, people underestimate the size of the larger angle.
Areas are Hard to Understand
Avoid area charts!
Area judgements are inaccurate
In general, people underestimate the size of the larger area.
Avoid word clouds!
Hard to tell the “area” taken up by a word
Avoid Jiggling Baseline
Instead, plot the lines themselves
Principles of Transformation
Transforming Data Can Reveal Patterns
Log of y-values
A linear relationship after taking the log of the y-values implies an exponential model for the original data
Fit line to log of y-values:
Log of both x and y-values
Fit line to log of x and y-values:
A linear relationship after taking the log of both the x- and y-values implies a polynomial (power-law, y = a * x^b) model for the original data
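A minimal sketch of both transformations, using only the standard library. The data here is hypothetical (generated from a known exponential curve and a known power-law curve, the slides' "polynomial" case), and `fit_line` is an illustrative helper rather than anything from the lecture; natural logs are an arbitrary choice.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# x starts at 1 because log(0) is undefined.
xs = [1, 2, 3, 4, 5]

# Case 1: hypothetical exponential data, y = 3 * exp(0.5 * x).
# Taking logs gives log y = log(3) + 0.5 * x, which is linear in x.
y_exp = [3 * math.exp(0.5 * x) for x in xs]
slope, intercept = fit_line(xs, [math.log(y) for y in y_exp])
exp_rate, exp_scale = slope, math.exp(intercept)

# Case 2: hypothetical power-law data, y = 2 * x ** 3.
# Taking logs gives log y = log(2) + 3 * log x, which is linear in log x.
y_pow = [2 * x ** 3 for x in xs]
slope, intercept = fit_line([math.log(x) for x in xs],
                            [math.log(y) for y in y_pow])
pow_exponent, pow_scale = slope, math.exp(intercept)

print(f"exponential model: y = {exp_scale:.2f} * exp({exp_rate:.2f} * x)")
print(f"power-law model:   y = {pow_scale:.2f} * x ** {pow_exponent:.2f}")
```

In each case the slope and intercept of the fitted line map back to the parameters of the original curve: the intercept is exponentiated to recover the scale factor, and the slope becomes either the growth rate or the exponent.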
Principles of Context
Add Context Directly to Plot
A publication-ready plot needs:
Principles of Smoothing
Apply Smoothing for Large Datasets
A Histogram is a Smoothed Rug Plot
Smoothing Needs Tuning
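To illustrate why smoothing needs tuning, here is a small sketch that bins the same points with two different bin widths. The data values and the `histogram_counts` helper are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def histogram_counts(data, bin_width):
    """Count points per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(math.floor(x / bin_width) for x in data)
    # Report each bin by its left edge.
    return {k * bin_width: counts[k] for k in sorted(counts)}

# Hypothetical observations; a rug plot would draw one tick per value.
data = [1.1, 1.4, 1.9, 2.2, 2.8, 3.1, 3.3, 6.5, 6.7, 9.0]

# Narrow bins track individual points closely (little smoothing).
narrow = histogram_counts(data, bin_width=0.5)

# Wide bins pool many points together (heavy smoothing, detail is lost).
wide = histogram_counts(data, bin_width=5)

print("bin width 0.5:", narrow)
print("bin width 5:  ", wide)
```

Neither setting is correct on its own; the bin width is a choice that trades noise against lost detail.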
Kernel Density Estimation (KDE)
Kernel Density Estimation
Intuition: place a kernel at each data point, scale each kernel by 1/n so the total area is 1, then add the kernels together.
Kernel Density Estimation
The Gaussian kernel is the most common choice (and the default in seaborn).
Changing the width of each kernel is the same as changing the bandwidth
A narrow bandwidth is analogous to narrow bins in a histogram
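The effect of bandwidth can be sketched with a hand-rolled Gaussian KDE. This is an illustrative implementation using only the standard library, not seaborn's own code; the data, the evaluation grid, and the two bandwidth values are assumptions made for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, bandwidth):
    """Average of Gaussian kernels centered at each data point, evaluated at x."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)

# Hypothetical data with two clusters.
data = [1.0, 1.5, 2.0, 6.0, 6.5]

# Evaluation points from -5 to 15 in steps of 0.1.
step = 0.1
grid = [i * step for i in range(-50, 151)]

narrow = [kde(x, data, bandwidth=0.2) for x in grid]
wide = [kde(x, data, bandwidth=2.0) for x in grid]

# Riemann-sum check: both curves enclose an area close to 1.
area_narrow = sum(narrow) * step
area_wide = sum(wide) * step

print("area (narrow, wide):", round(area_narrow, 3), round(area_wide, 3))
print("peak (narrow, wide):", round(max(narrow), 3), round(max(wide), 3))
```

The narrow bandwidth produces tall spikes near individual points, while the wide bandwidth produces a lower, smoother curve; the total area stays close to 1 either way.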
KDE Example — Uniform Kernel
Uniform kernel with bandwidth of 2.
Data points at:
Kernel at each x:
Scale each kernel by 1/4 since there are four points:
Add kernels together:
Height at 1.5? 0.5
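The three steps of the uniform-kernel example (place a kernel at each point, scale by 1/n, add) can be sketched as follows. The data points below are hypothetical, since the slide's actual points are not reproduced here, so the heights will not match the slide's figure. Reading "bandwidth of 2" as a box of total width 2 (height 1/2) is also an assumption.

```python
def uniform_kernel(x, center, bandwidth):
    """Box of total width `bandwidth` centered at `center`, with area 1."""
    return 1 / bandwidth if abs(x - center) <= bandwidth / 2 else 0.0

def uniform_kde(x, data, bandwidth):
    """Scale each kernel by 1/n, then add the kernels together."""
    n = len(data)
    return sum(uniform_kernel(x, xi, bandwidth) / n for xi in data)

# Hypothetical data: four points, so each kernel is scaled by 1/4.
data = [1, 2, 3, 6]

for x in [2.5, 4.5, 6.0]:
    print(f"height at {x}: {uniform_kde(x, data, bandwidth=2)}")
```

With these points, the boxes around 2 and 3 overlap at x = 2.5, so the estimate there is twice the height of a single scaled kernel; no box covers x = 4.5, so the estimate there is zero.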
Summary