Week 3: Data Models
Introduction to Data Visualization
W4995.003 Spring 2024
00 Quiz
01 Sparks
02 Categorical, Ordinal, Quantitative
03 Which Viz Type for Which Data Type
04 Deconstruct Examples
01
Sparks Presentation
Starting next week
02
Categorical, Ordinal, Quantitative
Visual Encoding
Data Types
Perceptual Properties
Marks & Channels
Marks
= Basic graphical element in image
(a.k.a. geom)
Graphics from Munzner. Visualization Analysis and Design (2015)
Point
Line
Area
Channels
= Ways to control appearance of marks
Graphics from Munzner. Visualization Analysis and Design (2015)
Tilt
Color
Shape
Position
Size
Data Types
Visual Encoding
Perceptual Properties
Marks & Channels
Lecture 5
Lecture 7
Data Types
What does this mean?
14, 2.6, 30, 30, 15, 100001
Data Types: depend on semantic models of data
Many aspects of vis design are driven by the kind of data you have.
Example from Munzner. Visualization Analysis and Design (2015)
Data types answer the question:
How inherently numerical is the data?
Categorical: not at all
Ordinal: a little
Quantitative: totally
Categorical (a.k.a. Nominal or Qualitative): Fruit (apple, pear, kiwi…); Cities (NYC, SF, LA…)
Ordinal: Months (Jan, Feb, Mar…); Sizes (S, M, L, XL…)
Quantitative: Lengths (1”, 2.5”, 5.14”…); Population
Graphics from Munzner. Visualization Analysis and Design (2015)
Data types are formal descriptions
Math: sets with operations on them
Conceptual models are mental constructions
Include semantics and support reasoning
Example: data vs. conceptual
1D floats vs. temperatures
3D vector of floats vs. spatial location
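A minimal Python sketch of the data vs. conceptual distinction, assuming a hypothetical `Temperature` class: the bare floats accept any arithmetic, while the wrapper only exposes operations that make sense for temperature readings in °F.

```python
from dataclasses import dataclass

# Data type: two bare floats. Python will happily divide them,
# even though "twice as hot" is not meaningful for °F readings.
raw_ratio = 50.0 / 25.0


@dataclass(frozen=True)
class Temperature:
    """Hypothetical conceptual model: a temperature reading in °F."""

    fahrenheit: float

    def difference(self, other: "Temperature") -> float:
        # Differences between °F readings are meaningful (interval scale).
        return self.fahrenheit - other.fahrenheit

    def is_freezing(self) -> bool:
        # Domain knowledge the bare float lacks: water freezes at 32 °F.
        return self.fahrenheit < 32

    # No __truediv__ is defined, so Temperature / Temperature raises
    # TypeError: ratios of °F readings are deliberately unsupported.


warm, cold = Temperature(50.0), Temperature(25.0)
print(warm.difference(cold))  # 25.0
print(cold.is_freezing())     # True
```

The semantics (what a value means, which operations are sensible) live in the conceptual model, not in the float itself.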
Data Models
C: Categorical (labels or categories, a.k.a. nominal)
Operations: =, ≠
Categories are of equal importance, or “equidistant”
e.g. Eye color: blue, green, dark brown, light brown
O: Ordered
Operations: =, ≠, <, >
Items have a meaningful order, but the distances between them are not defined
e.g. Quality of meat: Grade A, AA, AAA
Q-Interval (location of zero arbitrary)
Operations: =, ≠, <, >, -
Can measure distances or spans; only differences (i.e., intervals) may be compared
e.g. Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)
Q-Ratio (zero fixed)
Operations: =, ≠, <, >, -, ÷
Can measure ratios or proportions
e.g. Length, Mass, Temperature (in Kelvin), counts and amounts
Slide via Jeff Heer
Data Types are nested subsets
C ⊂ O ⊂ Q
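The operation sets of the data models can be sketched as a lookup in which each level inherits every operation of the level below it, writing the ratio operation as ÷ (division). The `Level` enum and the function names are hypothetical; this is an illustration, not a standard API.

```python
from enum import IntEnum


class Level(IntEnum):
    """Data model levels; a higher level supports every lower-level operation."""

    CATEGORICAL = 0
    ORDINAL = 1
    INTERVAL = 2
    RATIO = 3


# Operations each level adds on top of the level below it.
ADDED_OPS = {
    Level.CATEGORICAL: {"=", "≠"},
    Level.ORDINAL: {"<", ">"},
    Level.INTERVAL: {"-"},  # differences
    Level.RATIO: {"÷"},     # ratios
}


def allowed_ops(level: Level) -> set:
    """All operations that are meaningful for data at `level`."""
    ops = set()
    for lvl in Level:
        if lvl <= level:
            ops |= ADDED_OPS[lvl]
    return ops


def is_meaningful(op: str, level: Level) -> bool:
    return op in allowed_ops(level)


print(is_meaningful("<", Level.CATEGORICAL))  # False: eye colors have no order
print(is_meaningful("-", Level.INTERVAL))     # True: dates can be subtracted
print(is_meaningful("÷", Level.INTERVAL))     # False: zero is arbitrary
```

Because each level only adds operations, the operation sets nest: allowed_ops(Level.CATEGORICAL) is a proper subset of allowed_ops(Level.ORDINAL), and so on up to RATIO.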
NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
Example: NYC Daily Temperatures, 2017
Data Type: 21, 28, 47, 55, … (integers)
Conceptual Model: Temperature (°F)
Data Model
Temperature Value (Q)
Hot, Warm, Cold, Freezing (O)
Freezing vs. Not-Freezing (C)
NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
Exercise
Quantitative
Ordinal
Categorical
xkcd.com/388 via Marti Hearst
?
?
Quantitative
Categorical
Quantitative (uneven intervals)
02
Dimensions vs. Measures
Dimensions (~ independent variables)
Often discrete, can be used to segment data (C, O)
Categories, dates, binned quantities; row/col labels
Measures (~ dependent variables)
Numbers to be analyzed (Q)
Can be aggregated as sum, count, avg, std. dev...
Not a strict distinction: the same variable may be treated either way depending on the task.
Example: U.S. Census Data
People Count: # of people in group
Year: 1850–2000 (every decade)
Age: 0–90+
Sex: Male, Female
Example via Jeffrey Heer.
Example: U.S. Census Data
People Count: Measure
Year: Dimension
Age: Depends!
Sex: Dimension
Population by age (age as a dimension)
vs.
Avg. age by sex (age as a measure)
Example via Jeffrey Heer.
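A minimal sketch of how the same variable (age) can act as a dimension or as a measure. It assumes hypothetical person-level toy rows rather than the census's grouped counts, and a hypothetical `aggregate` helper: the dimension column segments the rows, and the measure column is aggregated within each segment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical person-level toy rows (not real census data): (sex, age)
people = [("Female", 34), ("Male", 51), ("Female", 34), ("Male", 20)]


def aggregate(rows, dim, measure, agg):
    """Segment rows by column `dim` (the dimension) and aggregate
    column `measure` within each segment using `agg`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dim]].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}


# Age as a dimension: population (row count) by age.
population_by_age = aggregate(people, dim=1, measure=1, agg=len)
# Age as a measure: average age by sex.
avg_age_by_sex = aggregate(people, dim=0, measure=1, agg=mean)

print(population_by_age)  # {34: 2, 51: 1, 20: 1}
```

Only the task changes between the two calls; the age column itself is the same.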
Tableau example: sales as a measure vs. sales as a dimension
Note: In Tableau, pill color shows whether a field is continuous (green) or discrete (blue)
02
Example: NYC Daily Temperature
NYC Daily Temperatures, 2017
1D Categorical
1D Ordinal
1D Quantitative
2D: Categorical x Quantitative (COUNT)
Freezing = under 32 °F
Tableau won’t let you make a line chart here: a line would imply an order the categories don’t have.
2D: Ordinal x Quantitative (COUNT)
Freezing <32
Cold 32–54
Cool 55–74
Warm 75–84
Hot 85+
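A minimal sketch of binning the quantitative temperatures into the ordinal and categorical models and counting days per bin, the values behind the bar charts above. The function names are hypothetical, and because the Warm and Hot ranges as originally listed both include 85, the sketch assumes 85 °F counts as Hot.

```python
from collections import Counter

# Ordinal: keep this order on the axis.
ORDER = ["Freezing", "Cold", "Cool", "Warm", "Hot"]


def to_ordinal(temp_f: float) -> str:
    """Q -> O: bin a daily temperature (°F) using the thresholds above.
    Assumes 85 °F counts as Hot."""
    if temp_f < 32:
        return "Freezing"
    if temp_f < 55:
        return "Cold"
    if temp_f < 75:
        return "Cool"
    if temp_f < 85:
        return "Warm"
    return "Hot"


def to_categorical(temp_f: float) -> str:
    """Q -> C: collapse to Freezing vs. Not-Freezing."""
    return "Freezing" if temp_f < 32 else "Not-Freezing"


temps = [21, 28, 47, 55]  # the sample values listed on the data type line
counts = Counter(to_ordinal(t) for t in temps)
bars = [(label, counts[label]) for label in ORDER]
print(bars)  # [('Freezing', 2), ('Cold', 1), ('Cool', 1), ('Warm', 0), ('Hot', 0)]
```

Each step down (Q to O to C) throws away information, which is why the data types nest.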
2D: Quantitative x Quantitative (day)
Scatterplot: a poor fit for a time series, since unconnected points hide the day-to-day sequence.
What could make sense?
2D: Quantitative x Quantitative (day)
2D: Ordinal x Ordinal (month)
3D: Ordinal x Ordinal (month) x Quant (count)
Color: count of days (Q binned into O)
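The cell values for the month × temperature-bin view can be sketched as a count of days per (month, bin) pair. This version locates the bin with `bisect_right` on the bin edges; the readings are hypothetical, not NOAA data, and the sketch assumes 85 °F counts as Hot.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Bin edges (°F) for Freezing <32, Cold 32-54, Cool 55-74, Warm 75-84, Hot 85+.
# Assumption: 85 °F counts as Hot.
EDGES = [32, 55, 75, 85]
LABELS = ["Freezing", "Cold", "Cool", "Warm", "Hot"]


def month_by_bin_counts(readings):
    """Count days per (month, temperature bin): one number per heatmap cell."""
    counts = Counter()
    for day, temp_f in readings:
        # bisect_right returns how many edges are <= temp_f, i.e. the bin index.
        label = LABELS[bisect_right(EDGES, temp_f)]
        counts[(day.month, label)] += 1
    return counts


# Hypothetical readings for illustration only, not NOAA data.
readings = [
    (date(2017, 1, 1), 25),
    (date(2017, 1, 2), 30),
    (date(2017, 1, 3), 40),
    (date(2017, 7, 1), 88),
]
cells = month_by_bin_counts(readings)
print(cells[(1, "Freezing")], cells[(1, "Cold")], cells[(7, "Hot")])  # 2 1 1
```

Both axes are ordinal (month, bin), and the count becomes the color channel.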
2D: Ordinal x Quantitative (day)
Gantt chart
X-axis: date (Q)
Y-axis: temperature (O)
Color: temperature (O), redundantly encoding the y-axis
Revisit: C ⊂ O ⊂ Q
Data Type: 21, 28, 47, 55, … (integers)
Conceptual Model: Daily Avg. Temperature (°F)
Data Model
Temperature Value (Q)
Hot, Warm, Cold, Freezing (O)
Freezing vs. Not-Freezing (C)
NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
QxQ
OxO
OxQ
03
Which Viz Type for Which Data Type
Taxonomy of vis types
Stolte et al. Polaris, ACM 2008.
Tableau UI chart types
Inappropriate chart types are disabled
What do you see?
Berinato, Good Charts.
Categorical data should not be connected by a line: it misleadingly suggests an ordering.
Berinato, Good Charts.
Example data: “mpg”
Model | Origin | Miles per gallon
"ford maverick" | USA | 21.0
"datsun pl510" | Japan | 27.0
"volkswagen 1131 deluxe sedan" | Germany | 26.0
...
1D Quantitative
The relationship you do want to highlight is obscured.
(Encoding lecture: quantitative values are most effectively encoded with position)
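A minimal sketch of a linear position scale for 1D quantitative data, using the three `mpg` rows shown above. The `linear_scale` helper, the 0–50 mpg domain, and the 500-pixel axis are hypothetical choices for illustration.

```python
def linear_scale(domain, output_range):
    """Map a quantitative value to a position so that equal differences
    in the data become equal distances on screen."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


# Hypothetical axis: 0-50 mpg drawn across 500 pixels.
x = linear_scale((0.0, 50.0), (0.0, 500.0))
cars = {
    "ford maverick": 21.0,
    "datsun pl510": 27.0,
    "volkswagen 1131 deluxe sedan": 26.0,
}
positions = {model: x(mpg) for model, mpg in cars.items()}
print(positions)
```

Because the mapping is linear, the gap between 26 and 27 mpg is drawn at the same length wherever it falls on the axis, which is what lets position carry quantitative comparisons.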
04
Deconstruct
William Playfair (1786)
Image via Wikipedia
Inventor of line charts, bar charts, and pie charts.
X-axis: year (Q)
Y-axis: currency in British pounds (Q)
Color: imports/exports (C)
Dorling Cartogram
Circle Area: state population (Q)
Color: % obese, binned (O)
X-axis: ~longitude of state centroid (Q)
Y-axis: ~latitude of state centroid (Q)
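In a Dorling cartogram the circle's area, not its radius, encodes the value. A minimal sketch of the sizing rule, with a hypothetical `circle_radius` helper and hypothetical populations: since area = π·r², the radius grows with the square root of the value.

```python
import math


def circle_radius(value: float, max_value: float, max_radius: float) -> float:
    """Radius for a circle whose AREA is proportional to `value`.

    The largest value gets `max_radius`; area = pi * r**2, so r grows
    with the square root of the value rather than linearly.
    """
    return max_radius * math.sqrt(value / max_value)


# Hypothetical populations, for illustration only.
big, small = 400.0, 100.0
r_big = circle_radius(big, max_value=big, max_radius=20.0)      # 20.0
r_small = circle_radius(small, max_value=big, max_radius=20.0)  # 10.0
```

Scaling the radius linearly instead would make a state with 4× the population look 16× larger.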
Compare & Contrast
Left: Johns Hopkins dashboard, right: Bloomberg
Activity: Analyze and Re-design visualization
Analyze and Re-design #1: California Wildfires
BuzzFeed, Peter Aldhous
Analyze and Re-design #2: Basketball
FlowingData, Nathan Yau
Analyze and Re-design #3: Global Middle Class
Washington Post
Analyze and Re-design #4: U.S. Total Tax Rate
New York Times Opinion
Analyze and Re-design #5: American Job Incomes
Nathan Yau
Last exercise: data type of zip code?
Ben Fry, Zipdecode (1999)
Summary: Data Models
Questions?
Announcements & Next Week…
Topics Next Week
How to Read an Academic Paper
Adapted from Stanford EE384m How to Read A Paper
Checklist For Next Week