1 of 80

Week 3: Data Models

Introduction to Data Visualization

W4995.004 Spring 2019

2 of 80

00 Quiz

01 Sparks

02 Categorical, Ordinal, Quantitative

03 Which Viz Type for Which Data Type

04 Deconstruct Examples

3 of 80

01

Sparks Presentations

MIT Senseable City Lab

4 of 80

02

Categorical, Ordinal, Quantitative

5 of 80

Visual Encoding

Data Types

Perceptual Properties

Marks & Channels

7 of 80

Data Types

Visual Encoding

Perceptual Properties

Marks & Channels

Lecture 5

Lecture 7

8 of 80

Marks

= Basic graphical element in image

(a.k.a. geom)

Graphics from Munzner. Visualization Analysis and Design (2015)

Point

Line

Area

9 of 80

Channels

= Ways to control appearance of marks

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

10 of 80

Data Types

What does this mean?

14, 2.6, 30, 30, 15, 100001

11 of 80

Data Types

Data types refer to semantic models of data.

Many aspects of vis design are driven by the kind of data you have.

14, 2.6, 30, 30, 15, 100001

Example from Munzner. Visualization Analysis and Design (2015)

12 of 80

Data models are formal descriptions

Math: sets with operations on them

Conceptual models are mental constructions

Include semantics and support reasoning

Example: data vs. conceptual

1D floats vs. temperatures

3D vector of floats vs. spatial location
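
The data vs. conceptual distinction above can be sketched in code. This is only an illustration: the class names `Temperature` and `Location`, the field names, and the sample values are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass

# Data model: plain floats, with ordinary arithmetic defined on them.
raw_1d = [21.0, 28.0, 47.0]
raw_3d = (40.8, -73.96, 10.0)


# Conceptual model: the same numbers plus semantics (hypothetical types).
@dataclass(frozen=True)
class Temperature:
    degrees_f: float  # assumed unit: degrees Fahrenheit

    def is_freezing(self) -> bool:
        # Reasoning that only makes sense once we know this is a temperature.
        return self.degrees_f < 32.0


@dataclass(frozen=True)
class Location:
    lat: float
    lon: float
    elevation_m: float


temps = [Temperature(v) for v in raw_1d]
place = Location(*raw_3d)

print([t.is_freezing() for t in temps])  # [True, True, False]
print(place.lat)                         # 40.8
```

The floats never change; only the meaning attached to them does.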

13 of 80

Example: NYC Daily Temperatures, 2017

Data Model: 21, 28, 47, 55, … (integers)

Conceptual Model: Temperature (°F)

Data Type

Freezing vs. Not-Freezing (C)

Hot, Warm, Cold, Freezing (O)

Temperature Value (Q)

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search
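
A minimal sketch of how the same temperature values can be read as C, O, or Q. The four values are the ones shown on the slide; the `Band` enum, the `band` function, and the cut points 60 and 80 are assumptions made for illustration only.

```python
from enum import IntEnum

temps_f = [21, 28, 47, 55]  # Q: the raw values from the slide

# C: two labels; only equality checks are meaningful.
freezing_label = ["Freezing" if t < 32 else "Not-Freezing" for t in temps_f]


# O: ordered labels. The cut points 60 and 80 are assumed, not from the slide.
class Band(IntEnum):
    FREEZING = 0
    COLD = 1
    WARM = 2
    HOT = 3


def band(t: int) -> Band:
    if t < 32:
        return Band.FREEZING
    if t < 60:
        return Band.COLD
    if t < 80:
        return Band.WARM
    return Band.HOT


bands = [band(t) for t in temps_f]

print(freezing_label)
print([b.name for b in bands])
print(max(bands).name)               # comparing is meaningful for O
print(sum(temps_f) / len(temps_f))   # averaging is meaningful for Q
```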

14 of 80

Categorical

(a.k.a. Nominal or Qualitative)

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

15 of 80

Months (Jan, Feb, Mar…)

Sizes (S, M, L, XL…)

Categorical

(a.k.a. Nominal or Qualitative)

Ordinal

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

Graphics from Munzner. Visualization Analysis and Design (2015)

16 of 80

Months (Jan, Feb, Mar…)

Sizes (S, M, L, XL…)

Categorical

(a.k.a. Nominal or Qualitative)

Ordinal

Quantitative

Lengths (1”, 2.5”, 5.14”...)

Population

Fruit (apple, pear, kiwi…)

Cities (NYC, SF, LA…)

Graphics from Munzner. Visualization Analysis and Design (2015)

17 of 80

Data Types

C: Categorical (labels or categories)

Operations: =, ≠

Categories are of equal importance, or “equidistant”

e.g. Eye color: blue, green, dark brown, light brown

Slide via Jeff Heer

18 of 80

Data Types

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

e.g. Quality of meat: Grade A, AA, AAA

Slide via Jeff Heer

19 of 80

Data Types

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

Q-Interval (location of zero arbitrary)

Operations: =, ≠, <, >, -

Can measure distances or spans, only delta (i.e. intervals) may be compared

e.g. Dates: Jan 19, 2006; Location: (LAT 33.98, LONG -118.45)

Slide via Jeff Heer

20 of 80

Data Types

C: Categorical

Operations: =, ≠

Categories are of equal importance, or “equidistant”

O: Ordered

Operations: =, ≠, <, >

Items of equal importance, or “equidistant”

Q-Interval (location of zero arbitrary)

Operations: =, ≠, <, >, -

Can measure distances or spans, only delta (i.e. intervals) may be compared

Q-Ratio (zero fixed)

Operations: =, ≠, <, >, -, %

Can measure ratios or proportions

e.g. Length, Mass, Temp (on an absolute scale such as Kelvin), counts and amounts

Slide via Jeff Heer
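
The operations per data type can be written down as a small lookup table. This is a sketch, not a standard API: the names `OPERATIONS` and `supports` are made up for illustration, and the type labels follow the slide.

```python
from datetime import date

# Operations each data type supports, following the slide.
OPERATIONS = {
    "C": {"=", "≠"},
    "O": {"=", "≠", "<", ">"},
    "Q-Interval": {"=", "≠", "<", ">", "-"},
    "Q-Ratio": {"=", "≠", "<", ">", "-", "%"},
}


def supports(data_type: str, op: str) -> bool:
    """Return True if `op` is meaningful for `data_type`."""
    return op in OPERATIONS[data_type]


# Interval example: two dates can be subtracted to get a span...
span = date(2006, 1, 19) - date(2006, 1, 1)
print(span.days)  # 18
# ...but "twice Jan 19" has no meaning, because the zero point is arbitrary.

# Ratio example: lengths have a fixed zero, so proportions are meaningful.
print(5.0 / 2.5)  # 2.0

print(supports("O", "-"))        # False
print(supports("Q-Ratio", "%"))  # True
```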

21 of 80

Exercise

22 of 80

Quantitative

Ordinal

Categorical

23 of 80

xkcd.com/388 via Marti Hearst

?

?

24 of 80

xkcd.com/388 via Marti Hearst

Quantitative

Categorical

Quantitative (uneven intervals)

25 of 80

Dimensions (~ independent variables)

Measures (~ dependent variables)

26 of 80

Dimensions (~ independent variables)

Often discrete variables describing data (C, O)

Categories, dates, binned quantities; row/col labels

Measures (~ dependent variables)

Data values that can be aggregated (Q)

Numbers to be analyzed; mapped to markers

Aggregate as sum, count, avg, std. dev...

Not a strict distinction: the same variable may be treated either way depending on the task.

27 of 80

Example: U.S. Census Data

People Count: # of people in group

Year: 1850–2000 (every decade)

Age: 0–90+

Sex: Male, Female

Example via Jeffrey Heer.

28 of 80

Example: U.S. Census Data

People Count: Measure

Year: Dimension

Age: Depends!

Sex: Dimension

Population by age

vs.

Avg. age by gender

Example via Jeffrey Heer.
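
A rough sketch of why Age "depends": it can be the grouping key (dimension) or the value being aggregated (measure). The rows below are hypothetical numbers in the census layout, not real census figures, and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical rows: (year, age, sex, people).
rows = [
    (2000, 0, "Male", 100),
    (2000, 0, "Female", 90),
    (2000, 50, "Male", 40),
    (2000, 50, "Female", 60),
]

# Age as a dimension: group by age, aggregate the measure (people).
population_by_age = defaultdict(int)
for _, age, _, people in rows:
    population_by_age[age] += people

# Age as a measure: group by sex, average age weighted by people.
weighted_age = defaultdict(float)
people_by_sex = defaultdict(int)
for _, age, sex, people in rows:
    weighted_age[sex] += age * people
    people_by_sex[sex] += people

avg_age_by_sex = {s: weighted_age[s] / people_by_sex[s] for s in people_by_sex}

print(dict(population_by_age))  # {0: 190, 50: 100}
print(avg_age_by_sex["Female"])  # 20.0
```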

29 of 80

02

Example: NYC Daily Temperature

30 of 80

NYC Daily Temperatures, 2017

Data Model: 21, 28, 47, 55, … (integers)

Conceptual Model: Temperature (°F)

Data Type

Freezing vs. Not-Freezing (C)

Hot, Warm, Cold, Freezing (O)

Temperature Value (Q)

NYC Daily Temperature from https://www.ncdc.noaa.gov/cdo-web/search

31 of 80

32 of 80

1D Categorical

1D Ordinal

1D Quantitative

33 of 80

2D: Categorical x Quantitative (COUNT)

Freezing = under 32 degrees

Tableau won’t let you make a line chart here.

34 of 80

2D: Ordinal x Quantitative (COUNT)

Freezing <32

Cold 32–54

Cool 55–74

Warm 75–84

Hot 85+
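
The binning behind this chart can be sketched as follows. The cut points follow the labels on this slide; a reading of exactly 85 is assigned to Hot here, which is an assumption about the boundary. The sample `daily_temps` list is hypothetical.

```python
from bisect import bisect_right
from collections import Counter

LABELS = ["Freezing", "Cold", "Cool", "Warm", "Hot"]
CUTS = [32, 55, 75, 85]  # lower bound of each bin after "Freezing"


def temp_band(t: float) -> str:
    """Map a temperature (Q) to its ordinal band (O)."""
    return LABELS[bisect_right(CUTS, t)]


daily_temps = [21, 28, 47, 55, 74, 85, 90]  # hypothetical sample

# COUNT of days per band: the quantitative axis of the chart.
counts = Counter(temp_band(t) for t in daily_temps)

for label in LABELS:  # iterate in ordinal order, not alphabetical
    print(label, counts[label])
```

Iterating over `LABELS` rather than over the counter keeps the bars in their ordinal order, which is what distinguishes this chart from the categorical one.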

35 of 80

2D: Quantitative x Quantitative (day)

Scatterplot: doesn’t make sense for a time series.

What could make sense?

36 of 80

2D: Quantitative x Quantitative (day)

37 of 80

2D: Ordinal x Ordinal (month)

38 of 80

2D: Ordinal x Ordinal (month)

Color: count of days (Q)

39 of 80

2D: Ordinal x Quantitative (day)

Gantt chart

X-axis: date (Q)

Y-axis: temperature (O)

Color: temperature (O), repeating the Y-axis encoding

40 of 80

03

Which Viz Type for Which Data Type

41 of 80

Q, O, C?

42 of 80

Q

Q

43 of 80

Q, O, C?

44 of 80

O

Q

Cf. Bar Charts

45 of 80

Taxonomy of vis types

Stolte et al., Polaris, ACM 2008.

46 of 80

Tableau UI chart types

47 of 80

What do you see?

Berinato, Good Charts.

48 of 80

Categorical data should not be connected by a line: it misleadingly suggests an ordering.

Berinato, Good Charts.
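
The rule about lines can be expressed as a small check. This is a rule-of-thumb sketch only: the function names and the suggested chart types are assumptions for illustration, not how Tableau or any other tool decides.

```python
TYPES = {"C", "O", "Q"}


def line_is_appropriate(x_type: str) -> bool:
    """A connecting line implies an order along x, so x must be O or Q."""
    if x_type not in TYPES:
        raise ValueError(f"unknown data type: {x_type}")
    return x_type in {"O", "Q"}


def suggest_chart(x_type: str, y_type: str) -> str:
    """Return a rough default chart for a pair of data types (assumed rules)."""
    if x_type not in TYPES or y_type not in TYPES:
        raise ValueError("data types must be one of C, O, Q")
    if y_type == "Q":
        if x_type == "Q":
            return "scatterplot"
        return "bar chart"  # C or O on x, quantitative on y
    return "table or heatmap"  # no quantitative axis


print(line_is_appropriate("C"))  # False
print(suggest_chart("C", "Q"))   # bar chart
print(suggest_chart("Q", "Q"))   # scatterplot
```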

49 of 80

1D Categorical

Graphics by Jeffrey Heer.

50 of 80

1D Categorical

Graphics by Jeffrey Heer.

51 of 80

1D Quantitative

52 of 80

1D Quantitative

The relationship you do want to highlight is obscured.

(Encoding lecture: quantitative values best mapped to position to be expressive)

53 of 80

Abela, Advanced Presentations by Design, 2013 redrawn by Berinato in Good Charts

54 of 80

04

Deconstruct

55 of 80

William Playfair (1786)

Image via Wikipedia

Inventor of line charts, bar charts, and pie charts.

56 of 80

X-axis: year (Q)

Y-axis: currency (Q)

Color: imports/exports (C)

57 of 80

Map of the Market (Wattenberg 2000)

58 of 80

Rectangle Area: market cap (Q)

Rectangle Position: market sector (C), market cap (Q)

Color Hue: loss vs. gain (C)

Color Value: magnitude of loss or gain (Q)

59 of 80

Dorling Cartogram

60 of 80

Circle Area: state population (Q)

Color: % obese, binned (O)

X-axis: ~longitude of state centroid (Q)

Y-axis: ~latitude of state centroid (Q)
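
A deconstruction like this one can be recorded as a list of (channel, field, data type) triples. The `Encoding` class and its field names are hypothetical, a sketch for note-taking rather than any tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    channel: str    # visual channel, e.g. position, size, color
    field: str      # data variable being shown
    data_type: str  # "C", "O", or "Q"


# The Dorling cartogram encodings listed on this slide.
dorling = [
    Encoding("circle area", "state population", "Q"),
    Encoding("color", "% obese, binned", "O"),
    Encoding("x position", "longitude of state centroid", "Q"),
    Encoding("y position", "latitude of state centroid", "Q"),
]

# Group channels by data type to see which types the chart relies on.
by_type = {}
for enc in dorling:
    by_type.setdefault(enc.data_type, []).append(enc.channel)

print(by_type)
```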

61 of 80

Compare & Contrast

62 of 80

Activity: Analyze and Re-design visualization

  • Break into groups of 3–4 with neighbors and choose 1 of the 4 following slides

  • Identify the data variables (C/O/Q) and their encodings
  • Design another way to visualize the data (a subset is OK). What different message does your redesign prioritize?
  • Sketch your redesign and post it to Slack with your UNIs

63 of 80

Analyze and Re-design #1: California Wildfires

BuzzFeed Peter Aldhous

64 of 80

Analyze and Re-design #2: Basketball

Flowing Data Nathan Yau

65 of 80

Analyze and Re-design #3: News Lifespan

Schema, Google Trends, Axios

66 of 80

Analyze and Re-design #4: Global Middle Class

Washington Post

67 of 80

Last week’s budget visualizations

https://designsprintkit.withgoogle.com/methods/sketch/crazy-8-sharing-and-voting/

68 of 80

Dataset: Someone’s Monthly Budget

Date       Recreation  Bars/Restaurants  Groceries  Transport/Travel  Housing
August            400               400          0                 0        0
September         100               200        100               300     1200
October           100               200        100                 0      600
November            0               200        100                 0      600
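
As a sketch, the budget dataset can be loaded and reshaped so that each variable has one data type: month (O), category (C), amount (Q). The column order is assumed to match the order of the headers and values on the slide, and the variable names are illustrative.

```python
CATEGORIES = ["Recreation", "Bars/Restaurants", "Groceries",
              "Transport/Travel", "Housing"]

# Wide form: one row per month, values in the slide's column order (assumed).
budget = {
    "August":    [400, 400, 0, 0, 0],
    "September": [100, 200, 100, 300, 1200],
    "October":   [100, 200, 100, 0, 600],
    "November":  [0, 200, 100, 0, 600],
}

# Long form: one record per (month, category, amount).
records = [
    (month, category, amount)
    for month, amounts in budget.items()
    for category, amount in zip(CATEGORIES, amounts)
]

# Aggregate the measure (amount) along each dimension.
total_by_month = {m: sum(a) for m, a in budget.items()}
total_by_category = {
    c: sum(amount for _, cat, amount in records if cat == c)
    for c in CATEGORIES
}

print(total_by_month)
print(total_by_category)
```

Which total you compute first is already a design decision: by month emphasizes change over time, by category emphasizes where the money goes.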

69 of 80

Someone’s Budget

70 of 80

Someone’s Budget

71 of 80

Someone’s Budget

72 of 80

Someone’s Budget

73 of 80

Someone’s Budget

74 of 80

Someone’s Budget

Fall 2018 class

75 of 80

Last exercise: what is the data type of a ZIP code?

76 of 80

Ben Fry, Zipdecode (2004)
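
One possible reading of the exercise, sketched in code: a ZIP code looks like a number but behaves like a label whose leading digits carry grouping information. The sample codes and variable names below are illustrative assumptions.

```python
zip_codes = ["10027", "02139", "94305"]  # stored as text, not integers

# Treating a ZIP code as a number silently drops the leading zero.
print(int("02139"))  # 2139

# Equality is meaningful (C); an average of ZIP codes is not.
print(zip_codes[0] == "10027")  # True

# The leading digits group codes into broader postal regions,
# so a prefix behaves like a coarser categorical variable.
by_first_digit = {}
for z in zip_codes:
    by_first_digit.setdefault(z[0], []).append(z)

print(by_first_digit)
```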

77 of 80

Summary: Data Models

  • The same measurement in a dataset can be interpreted as several different semantic data types.

  • The data type determines the appropriate (set of) visual encodings.
    • Viz tool UIs leverage this to help you.

  • Different visual encodings highlight different underlying data relationships.

78 of 80

Questions?

Announcements & Next Week…

79 of 80

Topics Next Week

  • EDA example walk-through

  • Tableau data/database concepts

  • Grammar of Graphics

80 of 80

Checklist For Next Week

  • Readings (now in Canvas)
    • Stephen Few Analytical Patterns
    • Polaris IEEE paper
    • Bertin Postmortem of an Example

  • Assignment 3.1
    • Form NEW* groups (e.g. Slack “I want to work on Yelp dataset”)
    • Find a dataset, connect to Tableau, come up with 3 hypotheses

  • Office Hours by appointment
    • Thursday 10:30–12:30 Jeevan, Joe’s NoCo
    • Monday 3–5p Conder, DSI in Mudd