1
Applied Data Analysis (CS401)
Robert West
Lecture 6
Data Visualization
Announcements
2
Uses for data visualization
Support reasoning about information (analysis)
Inform and persuade others (communication)
3
Old-fashioned viz
Great for data exploration, developed throughout the last few centuries…
Interactive viz
Increasingly common for delivering results. New frameworks are the key enabler.
4
Want to learn more?
Dedicated course:
5
Today’s lecture
6
Visualization for data exploration
7
Histograms
Histograms can tell you a lot about a single variable, discrete or continuous
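A minimal sketch of what a histogram computes, using only the standard library: values are grouped into fixed-width bins and counted. The synthetic Gaussian sample, the bin width of 5, and the helper name `histogram` are illustrative assumptions, not part of the lecture material.

```python
import random
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of width `bin_width`.

    Returns a dict mapping the left edge of each non-empty bin to its count.
    """
    counts = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: counts[b] for b in sorted(counts)}

random.seed(0)  # fixed seed so the sketch is reproducible
sample = [random.gauss(50, 10) for _ in range(1000)]  # synthetic data (assumption)
hist = histogram(sample, bin_width=5)

# Text rendering: one '#' per 10 observations in the bin.
for left_edge, count in hist.items():
    print(f"{left_edge:>5} | {'#' * (count // 10)}")
```

Changing `bin_width` is worth trying: bins that are too wide hide structure, and bins that are too narrow show mostly noise.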
8
Histograms
Skewed distributions
9
Box plots
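As a rough sketch, the numbers a box plot draws can be computed directly: the quartiles, the whiskers, and the points flagged as outliers. The 1.5 × IQR fence rule used here is the common Tukey convention and is an assumption about how the plots on the slide were drawn; the function name and the toy data are hypothetical.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers extend to the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # toy data with one extreme value
stats = box_plot_stats(data)
print(stats)
```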
Heavy-tailed data
11
Heavy-tailed data: power laws
CCDF
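A sketch of the empirical CCDF, P(X ≥ x), which for power-law data is approximately a straight line on log-log axes. The Pareto sample, its shape parameter of 1.5, and the least-squares fit are illustrative assumptions; a straight-line fit is a quick visual check, not a rigorous power-law test.

```python
import math
import random

def ccdf(values):
    """Return (x, P(X >= x)) for each distinct value x, in increasing order."""
    n = len(values)
    ordered = sorted(values)
    points = []
    for i, x in enumerate(ordered):
        if i == 0 or x != ordered[i - 1]:
            points.append((x, (n - i) / n))
    return points

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)
shape = 1.5  # assumed Pareto shape; the true CCDF is x ** -shape for x >= 1
sample = [random.paretovariate(shape) for _ in range(10_000)]
slope = loglog_slope(ccdf(sample))
print(f"estimated log-log slope of the CCDF: {slope:.2f}")
```

Unlike a histogram, the CCDF needs no binning, which is why it is convenient for heavy-tailed data where most bins in the tail would be nearly empty.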
Multimodal data
15
Multimodal data
Explore further by using, e.g., color and a histogram of multiple populations
16
Weird data
17
Proactive “weird-data detection”
If the data looks OK, take a picture and save it for later…
Then compare new data with the saved picture whenever the pipeline is updated.
Always try to have a theory of what the data should look like.
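One way to automate the comparison is to store a numeric snapshot alongside the picture and diff it after each pipeline update. This is a sketch of that idea, swapping summary statistics in for the saved image; the chosen statistics, the 10% tolerance, and the function names are all assumptions for illustration.

```python
import json
import statistics

def snapshot(values):
    """Summarize a numeric column so it can be saved and compared later."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "q1": q1,
        "median": median,
        "q3": q3,
    }

def drifted(old, new, tolerance=0.1):
    """List the summary fields whose relative change exceeds `tolerance`."""
    flagged = []
    for key in old:
        scale = abs(old[key]) or 1.0  # avoid dividing by zero
        if abs(new[key] - old[key]) / scale > tolerance:
            flagged.append(key)
    return flagged

old_data = list(range(1, 101))
saved = json.dumps(snapshot(old_data))   # what would be written to disk
baseline = json.loads(saved)             # what would be read back later

same = drifted(baseline, snapshot(old_data))
# Simulated pipeline bug: values silently switch units (e.g. s to ms).
changed = drifted(baseline, snapshot([v * 1000 for v in old_data]))
print(same, changed)
```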
18
Remarks on exploration
19
Principles of data visualization
20
Visualization definitions
[McCormick et al. 1987]
“… our natural means of perception.” [Bertin 1967]
“… representations of data to amplify cognition.” [Card, Mackinlay, & Shneiderman 1999]
21
Edward Tufte
22
Tufte’s Rules
23
Perception of magnitudes
24
Which is brighter?
(128, 128, 128)
(144, 144, 144)
Just Noticeable Difference
25
26
Compare area of circles
27
Compare area of circles
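When a value is encoded as a circle, the usual convention is to make the area, not the radius, proportional to the value, so the radius must grow with the square root. A small sketch of that conversion follows; the function name and the numbers are hypothetical.

```python
import math

def radius_for_value(value, max_value, max_radius):
    """Radius such that the circle's area is proportional to `value`."""
    return max_radius * math.sqrt(value / max_value)

big = radius_for_value(100, max_value=100, max_radius=10)
small = radius_for_value(25, max_value=100, max_radius=10)

# A quarter of the value gives half the radius and a quarter of the area.
area_ratio = (math.pi * small**2) / (math.pi * big**2)
print(small, big, area_ratio)
```

Scaling the radius linearly instead would make the small circle's area one sixteenth of the large one, exaggerating the difference.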
Perception of magnitudes
Position (most accurate)
Length
Slope
Angle
Area
Volume
Color hue-saturation-density (least accurate)
28
Cleveland & McGill (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods”
Use colors wisely
Choose colors based on the information you want to convey
Use online resources to discover and record your color schemes
29
Use colors wisely
30
Use colors wisely
31
Use colors wisely
32
33
Colorblind-friendly palettes
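As one concrete example, the Okabe-Ito palette is a widely cited colorblind-friendly set of eight colors. It is not necessarily the palette shown on the slide; the sketch below simply assigns its colors to categories in order of first appearance, with `assign_colors` as a hypothetical helper.

```python
from itertools import cycle

# Okabe-Ito palette: orange, sky blue, bluish green, yellow,
# blue, vermillion, reddish purple, black.
OKABE_ITO = [
    "#E69F00", "#56B4E9", "#009E73", "#F0E442",
    "#0072B2", "#D55E00", "#CC79A7", "#000000",
]

def assign_colors(categories, palette=OKABE_ITO):
    """Map each distinct category to a color, in order of first appearance.

    Colors repeat if there are more categories than palette entries.
    """
    colors = {}
    palette_cycle = cycle(palette)
    for category in categories:
        if category not in colors:
            colors[category] = next(palette_cycle)
    return colors

mapping = assign_colors(["cat", "dog", "cat", "bird"])
print(mapping)
```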
34
Use structure
Gestalt psychology principles (1912)
35
Use structure (but not like this!)
36
Less is more
37
Interactive chart design: simplifying
38
Navigating the chart landscape
39
Chart selection (by Andrew Abela)
40
One variable: histograms, box plots
41
Two variables: scatter plots
Scatter plots quickly expose the relationships between two variables
42
> 2 variables: scatter plot matrix
43
> 2 variables: stacked plots
Stack index, color, height
Stack variable and color variable: categorical
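The geometry of a stacked plot can be sketched as a running baseline: each category's segment starts where the previous category's segment ended. The data layout (a dict from category to a list of heights, one per index) and the function name are assumptions for illustration.

```python
def stack_layers(layers):
    """Turn {category: [height per index]} into {category: [(bottom, top), ...]}."""
    n = len(next(iter(layers.values())))
    baseline = [0.0] * n
    stacked = {}
    for category, heights in layers.items():
        tops = [b + h for b, h in zip(baseline, heights)]
        stacked[category] = list(zip(baseline, tops))
        baseline = tops  # the next category sits on top of this one
    return stacked

layers = {"A": [1, 2], "B": [3, 1]}  # toy data: two categories, two indices
stacked = stack_layers(layers)
print(stacked)
```

Only the bottom category shares a common baseline, so its heights are the easiest to compare; upper layers must be judged by length alone.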
44
> 2 variables: parallel-coord. plots
Color, x, y
Color variable is categorical, others arbitrary
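In a parallel-coordinates plot each record becomes a polyline across one vertical axis per variable. Because the variables usually have different units, a common approach is to rescale every axis to [0, 1] independently, as in this sketch; the column names and the toy rows are hypothetical.

```python
def normalize_axes(rows, axes):
    """Return one polyline per row: a list of (axis position, scaled value)."""
    lows = {a: min(r[a] for r in rows) for a in axes}
    highs = {a: max(r[a] for r in rows) for a in axes}
    lines = []
    for r in rows:
        points = []
        for x, a in enumerate(axes):
            span = highs[a] - lows[a]
            # A constant column is drawn at mid-height to avoid dividing by zero.
            y = (r[a] - lows[a]) / span if span else 0.5
            points.append((x, y))
        lines.append(points)
    return lines

rows = [
    {"species": "a", "length": 1.0, "width": 10.0},
    {"species": "b", "length": 3.0, "width": 20.0},
    {"species": "a", "length": 2.0, "width": 40.0},
]
lines = normalize_axes(rows, axes=["length", "width"])
# The categorical column would select each polyline's color.
for row, line in zip(rows, lines):
    print(row["species"], line)
```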
45
> 2 variables: radar charts
46
Dimensionality reduction
47
One Dataset, visualized 25 ways
http://flowingdata.com/2017/01/24/one-dataset-visualized-25-ways/
“You must help the data focus and get to the point. Otherwise, it just ends up rambling about what it had for breakfast this morning and how the coffee wasn’t hot enough.”
48
Good examples
49
Charles Joseph Minard, 1869: Napoleon’s march
50
According to Tufte: “It may well be the best statistical graphic ever drawn.”
5 variables: army size, location, dates, direction, temperature during retreat
Interactivity to educate
Hans Rosling:
200 Countries, 200 Years, 4 Minutes
https://www.youtube.com/watch?feature=player_embedded&v=jbkSRLYSojo
51
Examples: public information
Map-based visualizations, such as CrimeMapping
52
The future of journalism?
NY Times interactive visualizations (recession/recovery 2014)
http://www.nytimes.com/interactive/2014/06/05/upshot/how-the-recession-reshaped-the-economy-in-255-charts.html
And 2014 “the year in interactive storytelling”
http://www.nytimes.com/interactive/2014/12/29/us/year-in-interactive-storytelling.html?_r=0
NY Times graphics are a great source of best practices in viz (except for when they’re not…)
53
Bad examples
Courtesy of viz.wtf
54
Visualization to educate?
55
Pie in the sky?
56
57
Needs fixing
58
Data viz in the sciences
59
Uses for Data Viz
60
A case for ugly visualizations
People instinctively gravitate to attractive visualizations, and attractive visualizations have a better chance of making the cover of a journal.
But does this conflict with the goals of visualization?
61
Tools
62
Interactive toolkits: D3
Without doubt, the most widely used interactive visualization framework is D3, developed around 2011 by Jeff Heer, Mike Bostock, and Vadim Ogievetsky.
Note from the authors: D3 is intentionally a low-level system. During the early design of D3, we even referred to it as a "visualization kernel" rather than a "toolkit" or "framework".
63
Interactive toolkits: Vega
Vega is a “visualization grammar” built on top of D3.js.
It specifies graphics in JSON format.
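To give a feel for the JSON format, here is a sketch that builds a small bar-chart specification as a Python dict and serializes it. The structure (data, scales, axes, marks) follows the layout of the introductory bar-chart example in the Vega documentation as I recall it; the data values and the `$schema` version are assumptions, and the spec is not rendered or validated here.

```python
import json

spec = {
    "$schema": "https://vega.github.io/schema/vega/v5.json",  # assumed version
    "width": 400,
    "height": 200,
    "data": [
        {
            "name": "table",
            "values": [  # toy values for illustration
                {"category": "A", "amount": 28},
                {"category": "B", "amount": 55},
                {"category": "C", "amount": 43},
            ],
        }
    ],
    "scales": [
        {"name": "xscale", "type": "band", "range": "width", "padding": 0.05,
         "domain": {"data": "table", "field": "category"}},
        {"name": "yscale", "range": "height", "nice": True,
         "domain": {"data": "table", "field": "amount"}},
    ],
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    "marks": [
        {
            "type": "rect",
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "xscale", "field": "category"},
                    "width": {"scale": "xscale", "band": 1},
                    "y": {"scale": "yscale", "field": "amount"},
                    "y2": {"scale": "yscale", "value": 0},
                }
            },
        }
    ],
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The point of the grammar is visible in the structure: the chart is described declaratively as data, scales, axes, and marks, with no drawing code.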
64
Interactive toolkits: Vincent
Vincent is a Python-to-Vega translator.
Trivia question: why is it called Vincent? Hint: Vincent + Vega = ?
65
Bokeh: another interactive viz library
Bokeh is an independent visualization library geared more toward large datasets. It has both Python and Scala bindings.
67
Visualizing maps: Folium
More in tomorrow’s lab session!
68
Credits
69