Week 3: Data Models
Introduction to Data Visualization
W4995.004 Spring 2019
00 Quiz
01 Sparks
02 Categorical, Ordinal, Quantitative
03 Which Viz Type for Which Data Type
04 Deconstruct Examples
02
Categorical, Ordinal, Quantitative
Visual Encoding
Data Types
Perceptual Properties
Marks & Channels
Data Types
Visual Encoding
Perceptual Properties
Marks & Channels
Lecture 5
Lecture 7
Marks
= Basic graphical element in image
(a.k.a. geom)
Graphics from Munzner. Visualization Analysis and Design (2015)
Point
Line
Area
Channels
= Ways to control appearance of marks
Graphics from Munzner. Visualization Analysis and Design (2015)
Tilt
Color
Shape
Position
Size
Data Types
What does this mean?
14, 2.6, 30, 30, 15, 100001
Data Types: the semantic models of data
Many aspects of vis design are driven by the kind of data you have.
14, 2.6, 30, 30, 15, 100001
Example from Munzner. Visualization Analysis and Design (2015)
Data models are formal descriptions
Math: sets with operations on them
Conceptual models are mental constructions
Include semantics and support reasoning
Example: data vs. conceptual
1D floats vs. temperatures
3D vector of floats vs. spatial location
Example: NYC Daily Temperatures, 2017
Data Model: 21, 28, 47, 55, … (integers)
Conceptual Model: Temperature (°F)
Data Type
Freezing vs. Not-Freezing (C)
Hot, Warm, Cold, Freezing (O)
Temperature Value (Q)
NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
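A minimal sketch of the distinction above, assuming Python. The list of integers is the data model; the `Fahrenheit` alias (a name invented for this sketch) carries the conceptual model; the freezing test derives the categorical view, using the usual 32 °F freezing point. The four values are the ones shown on the slide.

```python
from typing import NewType

# Data model: plain integers with arithmetic defined on them.
raw_values = [21, 28, 47, 55]

# Conceptual model: the same integers understood as temperatures in °F.
# `Fahrenheit` is an illustrative name, not part of any library.
Fahrenheit = NewType("Fahrenheit", int)
temps = [Fahrenheit(v) for v in raw_values]


def is_freezing(t: Fahrenheit) -> bool:
    """Categorical view (C): freezing vs. not freezing."""
    return t < 32


categorical = ["Freezing" if is_freezing(t) else "Not-Freezing" for t in temps]
quantitative = temps  # Quantitative view (Q): the measured values themselves

print(categorical)
```

The numbers never change; only the interpretation attached to them does, and that interpretation decides which visual encodings make sense.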
Categorical
(a.k.a. Nominal or Qualitative)
Fruit (apple, pear, kiwi…)
Cities (NYC, SF, LA…)
Ordinal
Months (Jan, Feb, Mar…)
Sizes (S, M, L, XL…)
Quantitative
Lengths (1”, 2.5”, 5.14”…)
Population
Graphics from Munzner. Visualization Analysis and Design (2015)
Data Types
C: Categorical (labels or categories)
Operations: =, ≠
Categories are of equal importance, or “equidistant”
e.g. Eye color: blue, green, dark brown, light brown
O: Ordered
Operations: =, ≠, <, >
Items have a meaningful order, but the distance between them is not defined
e.g. Quality of meat: Grade A, AA, AAA
Q-Interval (location of zero arbitrary)
Operations: =, ≠, <, >, -
Can measure distances or spans; only deltas (i.e. intervals) may be compared
e.g. Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
Q-Ratio (zero fixed)
Operations: =, ≠, <, >, -, ÷
Can measure ratios or proportions
e.g. Length, mass, temperature (on an absolute scale), counts and amounts
Slide via Jeff Heer
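The operations listed for each data type can be sketched in Python. This is an illustration under assumptions, not a formal definition: the ascending order A < AA < AAA for the meat grades and the specific lengths are chosen only for the example.

```python
from datetime import date

# C: categorical values support only equality tests.
eye_a, eye_b = "blue", "green"
same_color = eye_a == eye_b

# O: ordered values also support < and >. The ascending order
# A < AA < AAA is an assumption made for this sketch.
GRADE_ORDER = ["A", "AA", "AAA"]


def grade_less_than(x: str, y: str) -> bool:
    return GRADE_ORDER.index(x) < GRADE_ORDER.index(y)


# Q-Interval: differences are meaningful; sums and ratios of values are not.
span = date(2006, 1, 19) - date(2006, 1, 1)  # a timedelta
try:
    date(2006, 1, 19) + date(2006, 1, 1)  # adding two dates is undefined
    dates_can_be_added = True
except TypeError:
    dates_can_be_added = False

# Q-Ratio: zero is fixed, so ratios are meaningful ("twice as long").
length_ratio = 5.0 / 2.5
```

Python's `date` type happens to enforce the interval rule: subtraction works, while adding two dates raises `TypeError`.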
Exercise
Quantitative
Ordinal
Categorical
xkcd.com/388 via Marti Hearst
?
?
xkcd.com/388 via Marti Hearst
Quantitative
Categorical
Quantitative (uneven intervals)
Dimensions (~ independent variables)
Often discrete variables describing data (C, O)
Categories, dates, binned quantities; row/col labels
Measures (~ dependent variables)
Data values that can be aggregated (Q)
Numbers to be analyzed; mapped to markers
Aggregate as sum, count, avg, std. dev...
Not a strict distinction: the same variable may be treated either way depending on the task.
Example: U.S. Census Data
People Count: # of people in group
Year: 1850–2000 (every decade)
Age: 0–90+
Sex: Male, Female
Example via Jeffrey Heer.
Example: U.S. Census Data
People Count: Measure
Year: Dimension
Age: Depends!
Sex: Dimension
Population by age
vs.
Avg. age by gender
Example via Jeffrey Heer.
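Why Age "depends" can be sketched with two aggregations over the same rows. The rows below are toy values invented for illustration, not real census figures, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy rows in the shape of the census table. The counts are made up
# for illustration and are NOT real census figures.
rows = [
    {"year": 2000, "age": 0, "sex": "Male", "people": 10},
    {"year": 2000, "age": 0, "sex": "Female", "people": 10},
    {"year": 2000, "age": 50, "sex": "Male", "people": 10},
    {"year": 2000, "age": 50, "sex": "Female", "people": 30},
]


def population_by_age(rows):
    """Age as a dimension: group by age, sum the People Count measure."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["age"]] += r["people"]
    return dict(totals)


def average_age_by_sex(rows):
    """Age as a measure: group by sex, take the people-weighted mean age."""
    weighted = defaultdict(int)
    people = defaultdict(int)
    for r in rows:
        weighted[r["sex"]] += r["age"] * r["people"]
        people[r["sex"]] += r["people"]
    return {sex: weighted[sex] / people[sex] for sex in people}


print(population_by_age(rows))
print(average_age_by_sex(rows))
```

In the first function Age labels the groups; in the second it is the number being aggregated.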
02
Example: NYC Daily Temperature
NYC Daily Temperatures, 2017
Data Model: 21, 28, 47, 55, … (integers)
Conceptual Model: Temperature (°F)
Data Type
Freezing vs. Not-Freezing (C)
Hot, Warm, Cold, Freezing (O)
Temperature Value (Q)
NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
1D Categorical
1D Ordinal
1D Quantitative
2D: Categorical x Quantitative (COUNT)
Freezing = under 32 degrees
Tableau won’t let you make a line chart here.
2D: Ordinal x Quantitative (COUNT)
Freezing <32
Cold 32–54
Cool 55–74
Warm 75–84
Hot 85+
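The binning above turns a quantitative temperature into an ordinal band, which can then be counted. A minimal sketch, assuming that exactly 85 °F falls in "Hot"; the sample values are the four temperatures shown earlier, and `temp_band` is a name invented here.

```python
from collections import Counter

# Bands in ascending order, as (label, inclusive lower bound in °F).
# Treating exactly 85 °F as "Hot" is an assumption of this sketch.
BANDS = [("Freezing", None), ("Cold", 32), ("Cool", 55), ("Warm", 75), ("Hot", 85)]


def temp_band(temp_f):
    """Return the ordinal band label for a temperature in °F."""
    label = BANDS[0][0]
    for name, lower in BANDS[1:]:
        if temp_f >= lower:
            label = name
    return label


sample = [21, 28, 47, 55]  # the values shown on the earlier slide
counts = Counter(temp_band(t) for t in sample)

# Report counts in band order, since the axis is ordinal.
ordered_counts = [(name, counts.get(name, 0)) for name, _ in BANDS]
print(ordered_counts)
```

Keeping the bands in a fixed order, rather than sorting alphabetically or by count, is what preserves the ordinal reading on the axis.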
2D: Quantitative x Quantitative (day)
Scatterplot: doesn’t make sense for a timeseries.
What could make sense?
2D: Quantitative x Quantitative (day)
2D: Ordinal x Ordinal (month)
Color: count of days (Q)
2D: Ordinal x Quantitative (day)
Gantt chart
X-axis: date (Q)
Y-axis: temperature (O)
Color: temperature (O), a redundant encoding of the y-axis
03
Which Viz Type for Which Data Type
Q, O, C?
Q
Q
Q, O, C?
O
Q
Cf. Bar Charts
Taxonomy of vis types
Stolte et al., Polaris, ACM 2008.
Tableau UI chart types
What do you see?
Berinato, Good Charts.
Categorical data should not be connected by a line: it misleadingly suggests an ordering.
Berinato, Good Charts.
1D Categorical
Graphics by Jeffrey Heer.
1D Quantitative
The relationship you do want to highlight is obscured.
(Encoding lecture: quantitative values best mapped to position to be expressive)
Abela, Advanced Presentations by Design, 2013 redrawn by Berinato in Good Charts
04
Deconstruct
William Playfair (1786)
Image via Wikipedia
Inventor of line charts, bar charts, and pie charts.
X-axis: year (Q)
Y-axis: currency (Q)
Color: imports/exports (C)
Map of the Market (Wattenberg 2000)
Rectangle Area: market cap (Q)
Rectangle Position: market sector (C), market cap (Q)
Color Hue: loss vs. gain (C)
Color Value: magnitude of loss or gain (Q)
Dorling Cartogram
Circle Area: state population (Q)
Color: % obese, binned (O)
X-axis: ~longitude of state centroid (Q)
Y-axis: ~latitude of state centroid (Q)
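The deconstructions above can be written down as small encoding specifications, one record per channel. The `Encoding` class and `channels_by_type` helper are hypothetical names for this sketch; the channel, field, and type values are taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    channel: str    # e.g. "x", "y", "color", "area"
    field: str      # the data attribute shown
    data_type: str  # "C", "O", or "Q"


playfair = [
    Encoding("x", "year", "Q"),
    Encoding("y", "currency", "Q"),
    Encoding("color", "imports/exports", "C"),
]

dorling = [
    Encoding("area", "state population", "Q"),
    Encoding("color", "% obese, binned", "O"),
    Encoding("x", "longitude of state centroid", "Q"),
    Encoding("y", "latitude of state centroid", "Q"),
]


def channels_by_type(spec):
    """Group channel names by the data type they encode."""
    grouped = {"C": [], "O": [], "Q": []}
    for enc in spec:
        grouped[enc.data_type].append(enc.channel)
    return grouped


print(channels_by_type(playfair))
print(channels_by_type(dorling))
```

Writing a chart down this way makes comparison easier: two visualizations differ exactly where their channel-to-field mappings differ.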
Compare & Contrast
Activity: Analyze and Re-design visualization
Analyze and Re-design #1: California Wildfires
Peter Aldhous, BuzzFeed
Analyze and Re-design #2: Basketball
Nathan Yau, FlowingData
Analyze and Re-design #3: News Lifespan
Schema, Google Trends, Axios
Analyze and Re-design #4: Global Middle Class
Washington Post
Last week’s budget visualizations
https://designsprintkit.withgoogle.com/methods/sketch/crazy-8-sharing-and-voting/
Dataset: Someone’s Monthly Budget
Date      | Recreation | Bars/Restaurants | Groceries | Transport/Travel | Housing |
August    | 400        | 400              | 0         | 0                | 0       |
September | 100        | 200              | 100       | 300              | 1200    |
October   | 100        | 200              | 100       | 0                | 600     |
November  | 0          | 200              | 100       | 0                | 600     |
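Before sketching a chart of this table, it can help to reshape it so that each row holds one month (O), one spending category (C), and one amount (Q). The sketch below uses the values from the table; the variable names are chosen for illustration.

```python
CATEGORIES = [
    "Recreation",
    "Bars/Restaurants",
    "Groceries",
    "Transport/Travel",
    "Housing",
]

# Wide form, as in the table: one row per month, one column per category.
budget = {
    "August": [400, 400, 0, 0, 0],
    "September": [100, 200, 100, 300, 1200],
    "October": [100, 200, 100, 0, 600],
    "November": [0, 200, 100, 0, 600],
}

# Long form: one (month, category, amount) row per cell of the table.
long_rows = [
    (month, category, amount)
    for month, amounts in budget.items()
    for category, amount in zip(CATEGORIES, amounts)
]

# Aggregating the measure over each dimension in turn.
monthly_totals = {month: sum(amounts) for month, amounts in budget.items()}
category_totals = {
    category: sum(amounts[i] for amounts in budget.values())
    for i, category in enumerate(CATEGORIES)
}

print(monthly_totals)
print(category_totals)
```

In the long form, month and category are the dimensions and amount is the single measure, which maps directly onto encodings such as x = month, color = category, y = amount.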
Someone’s Budget
Fall 2018 class
Last exercise: data type of zipcode?
Ben Fry, Zipdecode (2004)
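One way to probe the zipcode question is to test which operations survive. The sketch below, with ZIP codes chosen only as examples, suggests that a ZIP code behaves like a label with hierarchical structure rather than a quantity: arithmetic damages it, while grouping by leading digit (which corresponds to broad regions) stays meaningful.

```python
from collections import defaultdict

# Example ZIP codes, stored as strings. These values are illustrative.
zipcodes = ["10027", "10025", "02139", "94110"]

# Treating a ZIP code as a number silently changes it.
as_number = int("02139")     # the leading zero is lost
round_trip = str(as_number)  # no longer a five-digit code

# Equality and grouping by prefix are meaningful; averages are not.
by_first_digit = defaultdict(list)
for z in zipcodes:
    by_first_digit[z[0]].append(z)

print(round_trip)
print(dict(by_first_digit))
```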
Summary: Data Models
Questions?
Announcements & Next Week…
Topics Next Week
Checklist For Next Week