1 of 74

Week 3: Data Models

Introduction to Data Visualization

W4995.003 Spring 2024

2 of 74

00 Quiz

01 Sparks

02 Categorical, Ordinal, Quantitative

03 Which Viz Type for Which Data Type

04 Deconstruct Examples

3 of 74

01

Sparks Presentation

Starting next week

4 of 74

02

Categorical, Ordinal, Quantitative

5 of 74

Visual Encoding

Data Types

Perceptual Properties

Marks & Channels

6 of 74

Marks

= Basic graphical element in image

(a.k.a. geom)

Graphics from Munzner. Visualization Analysis and Design (2015)

Point

Line

Area

7 of 74

Channels

= Ways to control appearance of marks

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

8 of 74

Data Types

Visual Encoding

Perceptual Properties

Marks & Channels

Lecture 5

Lecture 7

9 of 74

Visual Encoding

Data Types

Perceptual Properties

Marks & Channels

10 of 74

Data Types

What does this mean?

14, 2.6, 30, 30, 15, 100001

11 of 74

Data Types: depend on semantic models of data

Many aspects of vis design are driven by the kind of data you have.

14, 2.6, 30, 30, 15, 100001

Example from Munzner. Visualization Analysis and Design (2015)

12 of 74

Data types answer the question:

How inherently numerical is the data?

Categorical: not at all

Ordinal: a little

Quantitative: totally

Example from Munzner. Visualization Analysis and Design (2015)

13 of 74

Categorical

(a.k.a. Nominal or Qualitative)

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

14 of 74

Months (Jan, Feb, Mar…)

Sizes (S, M, L, XL…)

Categorical

(a.k.a. Nominal or Qualitative)

Ordinal

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

Graphics from Munzner. Visualization Analysis and Design (2015)

15 of 74

Months (Jan, Feb, Mar…)

Sizes (S, M, L, XL…)

Categorical

(a.k.a. Nominal or Qualitative)

Ordinal

Quantitative

Lengths (1”, 2.5”, 5.14”...)

Population

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

Graphics from Munzner. Visualization Analysis and Design (2015)

16 of 74

Data types are formal descriptions

Math: sets with operations on them

Conceptual models are mental constructions

Include semantics and support reasoning

Example: data vs. conceptual

1D floats vs. temperatures

3D vector of floats vs. spatial location

17 of 74

Data Models

C: Categorical (labels or categories, a.k.a. nominal)

Operations: =, ≠

Categories are of equal importance, or “equidistant”

e.g. Eye color: blue, green, dark brown, light brown

Slide via Jeff Heer

18 of 74

Data Models

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

e.g. Quality of meat: Grade A, AA, AAA

Slide via Jeff Heer

19 of 74

Data Models

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

Q-Interval (location of zero arbitrary)

Operations: =, ≠, <, >, -

Can measure distances or spans, only delta (i.e. intervals) may be compared

e.g. Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)

Slide via Jeff Heer

20 of 74

Data Models

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

Q-Interval (location of zero arbitrary)

Operations: =, ≠, <, >, -

Can measure distances or spans, only delta (i.e. intervals) may be compared

Q-Ratio (zero fixed)

Operations: =, ≠, <, >, -, %

Can measure ratios or proportions

e.g. Length, Mass, Temp (on an absolute scale such as Kelvin), counts and amounts

Slide via Jeff Heer
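
One way to read the operation lists above is as a lookup of which comparisons are meaningful for each data model. A minimal Python sketch; the `ALLOWED_OPS` table and the `can_apply` helper are hypothetical names introduced only for illustration, and ≠ is assumed to accompany = for categorical data:

```python
# Operations that are meaningful for each data model, following the lists above.
# "%" is kept as written on the slide, standing for ratios and proportions.
ALLOWED_OPS = {
    "C": {"=", "≠"},
    "O": {"=", "≠", "<", ">"},
    "Q-interval": {"=", "≠", "<", ">", "-"},
    "Q-ratio": {"=", "≠", "<", ">", "-", "%"},
}


def can_apply(model: str, op: str) -> bool:
    """Return True if `op` is meaningful for values of the given data model."""
    return op in ALLOWED_OPS[model]


# Eye color is categorical: equality is meaningful, ordering is not.
print(can_apply("C", "="))           # True
print(can_apply("C", "<"))           # False
# Dates are Q-interval: differences are meaningful, ratios are not.
print(can_apply("Q-interval", "-"))  # True
print(can_apply("Q-interval", "%"))  # False
```

Each model permits everything the previous one does plus something more, which is one way to see why the types nest.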

21 of 74

Data Types are nested subsets

C ⊂ O ⊂ Q

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search

22 of 74

Example: NYC Daily Temperatures, 2017

Data Type: 21, 28, 47, 55, … (integers)

Conceptual Model: Temperature (°F)

Data Model

Temperature Value (Q)

Hot, Warm, Cold, Freezing (O)

Freezing vs. Not-Freezing (C)

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search

23 of 74

Exercise

24 of 74

Quantitative

Ordinal

Categorical

25 of 74

xkcd.com/388 via Marti Hearst

?

?

26 of 74

xkcd.com/388 via Marti Hearst

Quantitative

Categorical

Quantitative (uneven intervals)

27 of 74

02

Dimensions vs. Measures

28 of 74

Dimensions (~ independent variables)

Measures (~ dependent variables)

29 of 74

Dimensions (~ independent variables)

Often discrete, can be used to segment data (C, O)

Categories, dates, binned quantities; row/col labels

Measures (~ dependent variables)

Numbers to be analyzed (Q)

Can be aggregated as sum, count, avg, std. dev...

Not a strict distinction: the same variable may be treated either way depending on the task.

30 of 74

Example: U.S. Census Data

People Count: # of people in group

Year: 1850–2000 (every decade)

Age: 0–90+

Sex: Male, Female

Example via Jeffrey Heer.

31 of 74

Example: U.S. Census Data

People Count: Measure

Year: Dimension

Age: Depends!

Sex: Dimension

Population by age

vs.

Avg. age by sex

Example via Jeffrey Heer.
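
The two readings of Age above can be sketched in a few lines of Python. The rows below are made up for illustration and are not real census figures; the variable names are assumptions as well.

```python
from collections import defaultdict

# Hypothetical rows shaped like the census example: (year, age, sex, people_count).
rows = [
    (2000, 0, "Male", 100),
    (2000, 0, "Female", 90),
    (2000, 30, "Male", 60),
    (2000, 30, "Female", 70),
]

# Age as a dimension: segment the data by age, then sum the measure (people count).
population_by_age = defaultdict(int)
for year, age, sex, count in rows:
    population_by_age[age] += count

# Age as a measure: segment by sex, then average age weighted by people count.
weighted_age = defaultdict(int)
people = defaultdict(int)
for year, age, sex, count in rows:
    weighted_age[sex] += age * count
    people[sex] += count
avg_age_by_sex = {sex: weighted_age[sex] / people[sex] for sex in people}

print(dict(population_by_age))  # {0: 190, 30: 130}
print(avg_age_by_sex)           # {'Male': 11.25, 'Female': 13.125}
```

In the first aggregation Age labels the groups; in the second it is the number being aggregated.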

32 of 74

Dimensions and measures in Tableau

Sales as a measure →

vs. sales as a dimension →

33 of 74

Note: Pill color in Tableau = continuous vs. discrete

34 of 74

02

Example: NYC Daily Temperature

35 of 74

36 of 74

NYC Daily Temperatures, 2017

Data Type: 21, 28, 47, 55, … (integers)

Conceptual Model: Temperature (°F)

Data Model

Freezing vs. Not-Freezing (C)

Hot, Warm, Cold, Freezing (O)

Temperature Value (Q)

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search

37 of 74

1D Categorical

1D Ordinal

1D Quantitative

38 of 74

2D: Categorical x Quantitative (COUNT)

Freezing = under 32 degrees

Tableau won’t let you make a line chart here.

39 of 74

2D: Ordinal x Quantitative (COUNT)

Freezing <32

Cold 32–54

Cool 55–74

Warm 75–84

Hot 85+
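
The bins above turn a quantitative temperature into an ordinal label, and counting days per label gives the quantitative axis of the chart. A minimal Python sketch using the standard-library `bisect` module; the sample readings are hypothetical, and placing exactly 85°F in "Hot" is an assumption about the bin boundary.

```python
from bisect import bisect_right
from collections import Counter

# Ordered bin labels and the lower bound (°F) of every bin after the first.
# A reading of exactly 85 lands in "Hot" here; that boundary choice is an assumption.
LABELS = ["Freezing", "Cold", "Cool", "Warm", "Hot"]
LOWER_BOUNDS = [32, 55, 75, 85]


def temperature_bin(temp_f: float) -> str:
    """Map a quantitative temperature (Q) to an ordinal label (O)."""
    return LABELS[bisect_right(LOWER_BOUNDS, temp_f)]


def is_freezing(temp_f: float) -> bool:
    """Collapse further to a two-way category (C): freezing vs. not freezing."""
    return temp_f < 32


# Hypothetical daily readings, not the actual NOAA data.
temps = [21, 28, 47, 55, 74, 75, 85, 90]

counts = Counter(temperature_bin(t) for t in temps)
for label in LABELS:  # iterate in bin order, not in order of count
    print(label, counts[label])
```

Printing in `LABELS` order rather than by count keeps the ordinal structure visible, which is the point of treating the bins as O rather than C.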

40 of 74

2D: Quantitative x Quantitative (day)

Scatterplot: doesn’t make sense for a time series.

What could make sense?

41 of 74

2D: Quantitative x Quantitative (day)

42 of 74

2D: Ordinal x Ordinal (month)

43 of 74

3D: Ordinal x Ordinal (month) x Quant (count)

Color: count of days (Q binned into O)

44 of 74

2D: Ordinal x Quantitative (day)

Gantt chart

X-axis: date (Q)

Y-axis: temperature (O)

Color: temperature (O), repeating the Y-axis encoding

45 of 74

Revisit: C ⊂ O ⊂ Q

Data Type: 21, 28, 47, 55, … (integers)

Conceptual Model: Daily Avg. Temperature (°F)

Data Model

Temperature Value (Q)

Hot, Warm, Cold, Freezing (O)

Freezing vs. Not-Freezing (C)

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search

46 of 74

QxQ

OxO

OxQ

47 of 74

03

Which Viz Type for Which Data Type

48 of 74

Taxonomy of vis types

Stolte et al., Polaris, ACM 2008.

49 of 74

Tableau UI chart types

Inappropriate chart types are disabled

50 of 74

What do you see?

Berinato, Good Charts.

51 of 74

Categorical data should not be connected by a line: it misleadingly suggests an ordering.

Berinato, Good Charts.
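
The rule above can be written as a small check on the data type of the x-axis field. A minimal Python sketch; the function names and the list of marks are illustrative assumptions, not how Tableau or any other tool actually decides which chart types to enable.

```python
# Data types: "C" categorical, "O" ordinal, "Q" quantitative.

def line_chart_makes_sense(x_type: str) -> bool:
    """A connecting line implies an order along x, so x must be O or Q."""
    return x_type in {"O", "Q"}


def suggest_marks(x_type: str) -> list[str]:
    """Suggest marks for plotting a quantitative y against a field of type x_type."""
    marks = ["bar", "point"]
    if line_chart_makes_sense(x_type):
        marks.append("line")
    return marks


print(suggest_marks("C"))  # ['bar', 'point']
print(suggest_marks("O"))  # ['bar', 'point', 'line']
```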

52 of 74

Example data: “mpg”

Model

Origin

Miles per gallon

"ford maverick"

USA

21.0

"datsun pl510"

Japan

27.0

"volkswagen 1131 deluxe sedan"

Germany

26.0

...

53 of 74

1D Quantitative

54 of 74

1D Quantitative

The relationship you do want to highlight is obscured.

(Encoding lecture: quantitative values best mapped to position to be expressive)

55 of 74

04

Deconstruct

56 of 74

William Playfair (1786)

Image via Wikipedia

Inventor of line charts, bar charts, and pie charts.

British pounds

57 of 74

X-axis: year (Q)

Y-axis: currency (Q)

Color: imports/exports (C)

58 of 74

Dorling Cartogram

59 of 74

Circle Area: state population (Q)

Color: % obese, binned (O)

X-axis: ~longitude of state centroid (Q)

Y-axis: ~latitude of state centroid (Q)
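
A deconstruction like the one above can be recorded as a list of (channel, field, data type) triples, which makes it easy to compare two charts side by side. A minimal Python sketch; the `Encoding` class is a hypothetical helper for note-taking, not part of any visualization library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    channel: str    # visual channel, e.g. position, size, color
    field: str      # data variable mapped to the channel
    data_type: str  # "C", "O", or "Q"


# The Dorling cartogram deconstruction from the list above.
dorling = [
    Encoding("size (circle area)", "state population", "Q"),
    Encoding("color", "% obese, binned", "O"),
    Encoding("x position", "~longitude of state centroid", "Q"),
    Encoding("y position", "~latitude of state centroid", "Q"),
]

for enc in dorling:
    print(f"{enc.channel:<20} <- {enc.field} ({enc.data_type})")
```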

60 of 74

Compare & Contrast

61 of 74

Compare & Contrast

Left: Johns Hopkins dashboard; right: Bloomberg

62 of 74

Activity: Analyze and Re-design visualization

  • As a group, choose 1 of the following 5 slides.
  • Paste the image into a drawing app, then:
    • Identify data variables (C,O,Q) and encodings.
    • Redesign another way to visualize the data. What different message does your redesign prioritize? (Subset of data is OK.)
    • Post your sketch to #participation on Slack with UNIs for 2 pts.

63 of 74

Analyze and Re-design #1: California Wildfires

BuzzFeed, Peter Aldhous

64 of 74

Analyze and Re-design #2: Basketball

FlowingData, Nathan Yau

65 of 74

Analyze and Re-design #3: Global Middle Class

Washington Post

66 of 74

Analyze and Re-design #4: U.S. Total Tax Rate

NYTimes Opinion

67 of 74

Analyze and Re-design #5: American Job Incomes

Nathan Yau

68 of 74

Last exercise: data type of zip code?

69 of 74

Ben Fry, Zipdecode (2004)
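
One possible way to explore the question is to try both readings in code. A minimal Python sketch with a few example ZIP codes chosen for illustration; it only shows what breaks under a numeric reading and what a prefix grouping looks like, and does not settle the exercise.

```python
from collections import defaultdict

# Example ZIP codes, stored as strings so that leading zeros survive.
zips = ["10027", "10003", "02139", "94305"]

# Read as integers: the leading zero is lost and arithmetic has no clear meaning.
as_ints = [int(z) for z in zips]
print(as_ints[2])                   # 2139, no longer a five-digit code
print(sum(as_ints) / len(as_ints))  # an "average ZIP code" identifies no place

# Read as labels with a prefix structure: group by first digit.
by_first_digit = defaultdict(list)
for z in zips:
    by_first_digit[z[0]].append(z)
print(dict(by_first_digit))
```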

70 of 74

Summary: Data Models

  • The same measurement in a dataset can be interpreted as several different semantic data types (C, O, or Q).

  • The data type determines the appropriate (set of) visual encodings.
    • Visualization software UIs leverage this to help you.

  • Different visual encodings highlight different underlying data relationships.

71 of 74

Questions?

Announcements & Next Week…

72 of 74

Topics Next Week

  • EDA example walk-through

  • Tableau data/database concepts

73 of 74

How to Read an Academic Paper

  1. First pass: title, abstract, conclusion
    • Answer: What’s the main contribution? E.g. “XXX built a { faster / more precise / more user-friendly / etc. } system to do ZZZ.”

  2. Second pass: abstract, introduction, prior work
    • Answer: What problem did they solve? How did they solve it “better” than everyone else (according to them)? E.g. “XXX built a more YYY system to do ZZZ by using WWW and TTT.”

  3. Third pass: implementation, conclusion
    • Answer: How did they build it? How did they verify that their solution was “better”?

74 of 74

Checklist For Next Week

  • Readings (make sure you can access Canvas)
    • Stephen Few Analytical Patterns
    • Polaris IEEE paper
    • Bertin Postmortem of an Example
  • Lab 2 on Observable: allocate more time than for Lab 1!
  • Assignment 3.1 in Tableau
    • Form groups and find a dataset. Optional signup for random groups by Sunday.
    • Connect to Tableau, define 3 hypotheses, create 1 chart, and post to Slack.