1 of 115

Week 5: Visual Encoding

Introduction to Data Visualization

W4995.003 Spring 2025

2 of 115

00 Quiz

01 Sparks

02 Design/Redesign Exercise Reprise

03 Readings

04 Marks & Channels

05 Expressiveness & Effectiveness

3 of 115

Quiz

5 min

Closed book

4 of 115

01

Sparks

FiveThirtyEight

5 of 115

02

Redesigns from Two Weeks Ago

6 of 115

Analyze and Re-design #1: California Wildfires

BuzzFeed Peter Aldhous

7 of 115

8 of 115

Analyze and Re-design #2: Basketball

FlowingData, Nathan Yau

9 of 115

Analyze and Re-design #3: Global Middle Class

Washington Post

10 of 115

11 of 115

Analyze and Re-design #4: U.S. Total Tax Rate

New York Times Opinion

12 of 115

13 of 115

Analyze and Re-design #5: American Job Incomes

Nathan Yau

14 of 115

15 of 115

16 of 115

03

Readings

17 of 115

18 of 115

Visible change: +783%

Raw: +9.5 mpg

Proportional: +52%

Size of effect in data

Size of effect in graphic

19 of 115

18

27.5

Visible change: +52%

Visible change: +783%

Raw: +9.5 mpg

Proportional: +52%

Size of effect in data

Size of effect in graphic
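The two "size of effect" labels are the ingredients of Tufte's lie factor: the size of the effect shown in the graphic divided by the size of the effect in the data. A minimal Python sketch of that arithmetic with the slide's numbers follows; the function names are illustrative, not from the readings.

```python
def relative_change(start: float, end: float) -> float:
    """Proportional change from start to end (0.5 means +50%)."""
    return (end - start) / start


def lie_factor(effect_in_graphic: float, effect_in_data: float) -> float:
    """Tufte's lie factor: size of effect in graphic / size of effect in data."""
    return effect_in_graphic / effect_in_data


# Numbers from the slide: the data go from 18 to 27.5 mpg (raw change +9.5),
# while the drawn element grows by 783%.
data_effect = relative_change(18.0, 27.5)
graphic_effect = 7.83

print(f"effect in data:    {data_effect:+.1%}")
print(f"effect in graphic: {graphic_effect:+.1%}")
print(f"lie factor:        {lie_factor(graphic_effect, data_effect):.1f}")
```

A lie factor near 1 means the graphic changes in proportion to the data; the redrawn version, whose visible change matches the proportional change, is the case where the ratio is about 1.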

27 of 115

Visual language: Nouns

28 of 115

Visual language: Adjectives

29 of 115

Visual language: Adjectives

30 of 115

The Creative Process™

31 of 115

Color: A brief history

32 of 115

4535 Time Magazine Covers, 1923-2009

33 of 115

The Top Grossing Film of All Time, 1 × 1

by Jason Salavon (2000)

34 of 115

04

Marks & Channels

35 of 115

Data Types

Visual Encoding

Perceptual Properties

Marks & Channels

Today!

Lecture 7

36 of 115

Marks

Basic graphical elements in image

(a.k.a. forms & geometry)

Graphics from Munzner. Visualization Analysis and Design (2015)

Point

Line

Area

37 of 115

Marks

Basic graphical elements in image

(a.k.a. forms & geometry)

Graphics from Munzner. Visualization Analysis and Design (2015)

Point

Line

Area

0D

1D

2D

38 of 115

Channels

Ways to vary the appearance of marks

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

39 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

?

?

Mark: ?

40 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

41 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

?

?

Mark: ?

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

42 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

43 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

?

?

Mark: ?

Channels: ?

Q

Q

Mark: point

Channels:

Color: C

Size: Q

44 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

45 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels: ?

46 of 115

Example

Munzner. Visualization Analysis and Design (2015)

Q

C

Mark: line

Q

Q

Mark: point

Q

Q

Mark: point

Channels:

Color: C

Q

Q

Mark: point

Channels:

Color: C

Size: Q

47 of 115

Note: Area as Mark vs. Area as Channel

# House Representatives

# House Representatives

State GDP

48 of 115

Area marks should not be size- or shape-encoded

“States” as marks (by shape) already have an implied and accepted size, so it’s difficult to tell if they’re being enlarged or shrunk (cf. dots).

# House Representatives

# House Representatives

State GDP

49 of 115

Area marks should not be size- or shape-encoded

Be wary of scaled pictograms for similar reasons.

Lie factor:

Are we comparing height or area here?

50 of 115

Area marks should not be size- or shape-encoded

Tufte: Don’t use two visual dimensions to represent a single data dimension
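The height-versus-area question comes down to simple geometry, sketched here in Python. The helper names and the example values are assumptions for illustration: scaling a pictogram uniformly so its height tracks the data makes its area grow with the square, while a size channel that should read as area needs a square-root scale.

```python
import math


def pictogram_area_ratio(height_ratio: float) -> float:
    """Area multiplier when a pictogram is scaled uniformly by height_ratio."""
    return height_ratio ** 2


def radius_for_value(value: float, max_value: float, max_radius: float) -> float:
    """Radius that makes circle AREA (not radius) proportional to value."""
    return max_radius * math.sqrt(value / max_value)


# The data value doubles, so the designer doubles the pictogram's height...
print("height ratio:", 2.0)
# ...but width doubles too, so the ink on the page quadruples.
print("area ratio:  ", pictogram_area_ratio(2.0))

# Area-true sizing: a value one quarter of the maximum gets half the radius.
print("radius:      ", radius_for_value(25.0, 100.0, 10.0))
```

A reader who compares heights sees a 2x change and a reader who compares areas sees a 4x change, which is the ambiguity the slide warns about.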

51 of 115

Treemap: Area is already used as a channel (market cap), so size/shape cannot be additionally encoded

52 of 115

Exception: value-by-area maps

“Country” as a mark has an accepted size, especially in a map context.

Encoding size to another value highlights the difference from your expectation.

Via NYT 2013

53 of 115

Mark: Area

Channels:

Country ~ Shape

GDP (Q) ~ Size

GDP Growth (O) ~ Color

Center Longitude ~ Pos. X

Center Latitude ~ Pos. Y

Via NYT 2013

54 of 115

Deconstruct: Ebb and Flow of... Box Office Receipts

Via NYT 2008

55 of 115

Ebb and Flow of... Box Office Receipts

Mark: Area

Channels:

Time (Week) ~ Position X

Weekly Revenue ~ Height/Length along Y

Gross Box Office ~ Color

Gross Box Office ~ Area

56 of 115

Visual Encoding = Mapping data to visual variables

Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)

But the combinatorial space is so large, how do you choose?

57 of 115

Visual Encoding = Mapping data to visual variables

Assign data fields (Q, O, C) to visual channels (x, y, color, size, etc.) for a graphical mark (point, bar, line, etc.)

To maximize expressiveness and effectiveness.
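One way to make the mapping concrete is to write an encoding down as data: one mark plus a table from channels to typed fields. The Python sketch below is only an illustration of that idea. The `Encoding` class and the field names (`horsepower`, `mpg`, `origin`, `weight`) are hypothetical and do not come from any particular library or from the slides.

```python
from dataclasses import dataclass, field

# Data-type abbreviations used on the slides.
DATA_TYPES = {"Q": "quantitative", "O": "ordinal", "C": "categorical"}


@dataclass
class Encoding:
    """One graphical mark plus a mapping from visual channels to data fields."""
    mark: str
    # channel name -> (field name, data-type code)
    channels: dict[str, tuple[str, str]] = field(default_factory=dict)

    def assign(self, channel: str, field_name: str, dtype: str) -> "Encoding":
        if dtype not in DATA_TYPES:
            raise ValueError(f"unknown data type: {dtype!r}")
        if channel in self.channels:
            raise ValueError(f"channel already in use: {channel!r}")
        self.channels[channel] = (field_name, dtype)
        return self

    def describe(self) -> list[str]:
        lines = [f"Mark: {self.mark}"]
        for channel, (field_name, dtype) in self.channels.items():
            lines.append(f"{field_name} ({dtype}) ~ {channel}")
        return lines


# Hypothetical scatterplot: point marks, two Q positions, C color, Q size.
scatter = (
    Encoding(mark="point")
    .assign("x", "horsepower", "Q")
    .assign("y", "mpg", "Q")
    .assign("color", "origin", "C")
    .assign("size", "weight", "Q")
)
print("\n".join(scatter.describe()))
```

Rejecting a second assignment to the same channel mirrors the rule that one channel carries one field per mark; which assignment is best is the expressiveness and effectiveness question.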

58 of 115

05

Expressiveness & Effectiveness

59 of 115

Expressiveness (Mackinlay 1986)

A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.

60 of 115

Example: Iris database

61 of 115

Fails to express all the facts

Mackinlay 1986, Automating the Design of Graphical Presentations of Relational Information

62 of 115

Expresses facts not in the data

Mackinlay 1986, Automating the Design of Graphical Presentations of Relational Information

67 of 115

Effectiveness (Mackinlay 1986)

One visualization is more effective than another if the information it conveys is more readily perceived than the information in the other visualization.

68 of 115

In other words:

Expressiveness

Tell the truth, the whole truth, and nothing but the truth (i.e., don’t lie, and don’t lie by omission)

Effectiveness

Use encodings that people can decode more quickly, accurately, and easily

Via Jeffrey Heer

69 of 115

Remember this from Week 1?

70 of 115

Compare length of bars

Via Jeffrey Heer

71 of 115

We perceive length much more precisely than area

Via Jeffrey Heer

72 of 115

Which is larger, A or C?

74 of 115

Cleveland & McGill: Ranking Accuracy, 1984

  1. Which is larger, A or C?
  2. What percentage is C of A?

Cleveland and McGill, 1984 Journal of the American Statistical Association
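The second question is what makes accuracy measurable: each answer is a judged percentage that can be compared with the true one. The Python sketch below assumes the log absolute error commonly attributed to Cleveland and McGill's paper, log2(|judged percent − true percent| + 1/8). The lengths and the judged values are hypothetical.

```python
import math


def true_percent(smaller: float, larger: float) -> float:
    """What percentage the smaller value is of the larger one."""
    return 100.0 * smaller / larger


def log_abs_error(judged_percent: float, actual_percent: float) -> float:
    """Assumed scoring rule: log2(|judged - true| + 1/8), in percentage points."""
    return math.log2(abs(judged_percent - actual_percent) + 1 / 8)


# Hypothetical stimulus: A is 80 units long, C is 40 units long.
a_length, c_length = 80.0, 40.0
actual = true_percent(c_length, a_length)

# Hypothetical answers from three participants.
for judged in (50.0, 45.0, 30.0):
    print(f"judged {judged:>4.0f}%  error {log_abs_error(judged, actual):+.2f}")
```

The 1/8 term keeps the logarithm defined when a judgment is exactly right; lower scores mean more accurate decoding, which is how encodings can be ranked against each other.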

75 of 115

Cleveland & McGill: Ranking Accuracy, 1984

Graphics from Munzner, redrawn from Cleveland and McGill, 1984.

Unframed

Unaligned

Framed

Unaligned

Framed

Aligned

78 of 115

History: Bertin “Retinal Variables”, 1967...

79 of 115

History: Bertin “Retinal Variables”, 1967...

Redrawn by Mike Bostock https://medium.com/@mbostock/introducing-d3-scale-61980c51545f

80 of 115

...Cleveland & McGill: Accuracy for Q. Perception, 1984

Cleveland and McGill, 1984 Journal of the American Statistical Association

81 of 115

Heer & Bostock’s replication in 2010

Healy, Data Visualization: A practical introduction

82 of 115

Today: via Munzner

Munzner. Visualization Analysis and Design (2015)

83 of 115

But: Channels × Data Type combos...

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Categorical

Quantitative

×

84 of 115

Exercise: Color × Quantitative?

Categorical

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Quantitative

×

85 of 115

86 of 115

Exercise: Color × Categorical?

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Categorical

Quantitative

×

87 of 115

88 of 115

Exercise: Tilt × Categorical?

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Categorical

Quantitative

×

89 of 115

Exercise: Tilt × Categorical?

90 of 115

Not all Channels work (equally) for all Data Types

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Categorical

Quantitative

×

91 of 115

Not all Channels work (equally) for all Data Types

Quantitative

Graphics from Munzner. Visualization Analysis and Design (2015)

Tilt

Color

Shape

Position

Size

Ordinal

Categorical

×

92 of 115

Moritz Stefaner. Project Ukko

93 of 115

Mackinlay Effectiveness Rankings, 1986

Quantitative       Ordinal            Categorical
Position           Position           Position
Length             Density (Value)    Color Hue
Angle              Color Sat          Texture
Slope              Color Hue          Connection
Area (Size)        Texture            Containment
Volume             Connection         Density (Value)
Density (Value)    Containment        Color Sat
Color Sat          Length             Shape
Color Hue          Angle              Length
Texture            Slope              Angle
Connection         Area (Size)        Slope
Containment        Volume             Area
Shape              Shape              Volume

Mackinlay 1986

94 of 115

Munzner. Visualization Analysis and Design (2015)

95 of 115

Prioritize: What’s the most important thing you want to say? Map that to the most effective encoding.
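The prioritization rule can be sketched as a greedy pass over the 1986 rankings: take fields in order of importance and give each the highest-ranked channel for its data type that is still free. The Python below is a simplified illustration, not Mackinlay's actual algorithm. It assumes Position can be used twice (x and y) and every other channel once, and the field names are hypothetical.

```python
from collections import Counter

# Channel rankings per data type, most effective first (from the 1986 table).
# "Area" in the categorical column is written "Area (Size)" here so that one
# name refers to one channel.
RANKINGS = {
    "Q": ["Position", "Length", "Angle", "Slope", "Area (Size)", "Volume",
          "Density (Value)", "Color Sat", "Color Hue", "Texture",
          "Connection", "Containment", "Shape"],
    "O": ["Position", "Density (Value)", "Color Sat", "Color Hue", "Texture",
          "Connection", "Containment", "Length", "Angle", "Slope",
          "Area (Size)", "Volume", "Shape"],
    "C": ["Position", "Color Hue", "Texture", "Connection", "Containment",
          "Density (Value)", "Color Sat", "Shape", "Length", "Angle",
          "Slope", "Area (Size)", "Volume"],
}

# Assumption: two position axes (x, y); every other channel is used once.
CAPACITY = {"Position": 2}


def assign_channels(fields: list[tuple[str, str]]) -> dict[str, str]:
    """Greedily map (field, type) pairs, most important first, to channels."""
    used = Counter()
    assignment = {}
    for name, dtype in fields:
        for channel in RANKINGS[dtype]:
            if used[channel] < CAPACITY.get(channel, 1):
                used[channel] += 1
                assignment[name] = channel
                break
    return assignment


# Hypothetical fields, listed from most to least important.
fields = [("revenue", "Q"), ("week", "O"), ("studio", "C"), ("rating", "Q")]
for name, channel in assign_channels(fields).items():
    print(f"{name} ~ {channel}")
```

Reordering `fields` changes who gets the position channels, which is the point of the slide: the most important field should claim the most effective encoding first.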

96 of 115

Summary

Choose the mark and channels that maximize effectiveness of your data story.

Munzner. Visualization Analysis and Design (2015)

97 of 115

03.1

Deconstruct: Napoleon’s March

Via Roger Peng, Johns Hopkins Biostatistics

98 of 115

Minard, 1869.

99 of 115

100 of 115

What are the data components?

101 of 115

What are the data components?

Longitude (Q) ~ Position X

Latitude (Q) ~ Position Y

Army size (Q) ~ Width of Line

Direction (C) ~ Color

Landmarks (C) ~ Labels (cities, rivers)

102 of 115

What are the data components?

Longitude (Q) / Time Reversed (O) ~ Position X

Temperature (Q) ~ Position Y

103 of 115

Dataset: Temperature

lon     temp    date
37.6    0       18 Oct 1812
36.0    0       24 Oct 1812
33.2    -9      09 Nov 1812
32.0    -21     14 Nov 1812
29.2    -11     24 Nov 1812
28.5    -20     28 Nov 1812
27.2    -24     01 Dec 1812
26.7    -30     06 Dec 1812
25.3    -26     07 Dec 1812

104 of 115

Dataset: Army

lon      lat      size      dir    grp
24.0     54.9     340000    1      1
24.5     55.0     340000    1      1
25.5     54.6     340000    1      1
26.0     54.7     320000    1      1
37.65    55.65    100000    -1     1
37.45    55.62    98000     -1     1
37.0     55.0     97000     -1     1
36.8     55.0     96000     -1     1
24.0     55.1     60000     1      2
24.5     55.2     60000     1      2
25.5     54.7     60000     1      2
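The army rows map directly onto the channels identified earlier: longitude and latitude to position, size to line width, direction to color. The Python sketch below uses only the rows shown on this slide. The 20-unit maximum width and the reading of `dir` as 1 = advance, -1 = retreat are assumptions, and the helper names are illustrative.

```python
from collections import defaultdict

# The rows shown on the slide: lon lat size dir grp
ARMY_ROWS = """\
24.0 54.9 340000 1 1
24.5 55.0 340000 1 1
25.5 54.6 340000 1 1
26.0 54.7 320000 1 1
37.65 55.65 100000 -1 1
37.45 55.62 98000 -1 1
37.0 55.0 97000 -1 1
36.8 55.0 96000 -1 1
24.0 55.1 60000 1 2
24.5 55.2 60000 1 2
25.5 54.7 60000 1 2
"""


def parse_army(text: str) -> list[dict]:
    """Parse whitespace-separated rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        lon, lat, size, direction, group = line.split()
        rows.append({"lon": float(lon), "lat": float(lat), "size": int(size),
                     "dir": int(direction), "grp": int(group)})
    return rows


def stroke_width(size: int, max_size: int = 340_000, max_width: float = 20.0) -> float:
    """Army size (Q) ~ width of line, as a linear scale (max_width is assumed)."""
    return max_width * size / max_size


def build_paths(rows: list[dict]) -> dict:
    """Group rows into one path per (dir, grp); each vertex is (x, y, width)."""
    paths = defaultdict(list)
    for r in rows:
        paths[(r["dir"], r["grp"])].append(
            (r["lon"], r["lat"], stroke_width(r["size"])))
    return dict(paths)


paths = build_paths(parse_army(ARMY_ROWS))
for (direction, group), vertices in paths.items():
    label = "advance" if direction == 1 else "retreat"  # assumed meaning of dir
    print(f"group {group}, {label}: {len(vertices)} vertices, "
          f"first width {vertices[0][2]:.1f}")
```

Grouping by both `dir` and `grp` keeps each line continuous: direction decides the color, and the group column separates the detached columns of the army that would otherwise be joined into one zigzag.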

105 of 115

Dataset: Cities

lon     lat     city
24.0    55.0    Kowno
25.3    54.7    Wilna
26.4    54.4    Smorgoni
26.8    54.3    Molodexno
27.7    55.2    Gloubokoe
27.6    53.9    Minsk
28.5    54.3    Studienska
28.7    55.5    Polotzk
29.2    54.4    Bobr
30.2    55.3    Witebsk
30.4    54.5    Orscha
30.4    53.9    Mohilow
32.0    54.8    Smolensk

106 of 115

Single-axis composition along longitude

Longitude (Q)

Latitude (Q)

Army size (Q)

Direction (C)

Landmarks (C)

Temperature (Q)

Longitude (Q) / Time (O)

107 of 115

03.2

Deconstruct: FiveThirtyEight

2018 Midterm Election Forecast

108 of 115

109 of 115

2018 Midterm Election Forecast

district    state    D       R       C        "lean"    incumb    "flip?"
1st         NY       11.2    88.8    0.001    (calc)    R         (calc)

Mark: point

Channels:

lean (O) ~ color

flip? (C) ~ texture (i.e. stripes)

geoXY ~ positionXY (prioritizes adjacency)
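The two "(calc)" columns are derived fields: "lean" is an ordinal bin computed from the D and R numbers, and "flip?" compares the leading party with the incumbent. The Python sketch below shows one way such fields could be derived. The bin thresholds and level names are hypothetical, not FiveThirtyEight's actual categories, and the C column is ignored.

```python
# Ordered levels for the derived "lean" field (hypothetical labels).
LEAN_LEVELS = ["Solid D", "Likely D", "Lean D", "Toss-up",
               "Lean R", "Likely R", "Solid R"]


def lean(d_pct: float, r_pct: float) -> str:
    """Bin the D-minus-R gap into an ordinal level (thresholds are assumed)."""
    margin = d_pct - r_pct
    party = "D" if margin > 0 else "R"
    gap = abs(margin)
    if gap < 20:
        return "Toss-up"
    if gap < 50:
        return f"Lean {party}"
    if gap < 90:
        return f"Likely {party}"
    return f"Solid {party}"


def flips(lean_label: str, incumbent: str) -> bool:
    """True when the favored party differs from the incumbent party."""
    if lean_label == "Toss-up":
        return False
    return lean_label.split()[-1] != incumbent


# The example row from the slide: D = 11.2, R = 88.8, incumbent R.
row_lean = lean(11.2, 88.8)
print("lean: ", row_lean, "(level", LEAN_LEVELS.index(row_lean), "of 6)")
print("flip?:", flips(row_lean, "R"))
```

Because "lean" is ordinal, its position in `LEAN_LEVELS` can drive a diverging color scale, while the boolean "flip?" suits a categorical channel such as the striped texture.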

110 of 115

Questions?

Next Week…

111 of 115

Topics Next Week

  • Grammar of Graphics

  • User-interface design principles

  • Interactivity in viz
    • Selection
    • Brushing, linking, highlighting
    • Navigate, pan, zoom

112 of 115

Checklist For Next Week

  • Assignment 4.1
    • Sign up for a group
    • Choose dataset (can be the same, can be new)
    • Explore (Tableau, Plot, R, Python, etc.) > Sketch > Build in Observable
    • Each team member individually prepared to discuss in class
  • Readings
    • Heer & Shneiderman, Interactive Dynamics for Visual Analysis
    • Bret Victor, Up and Down the Ladder of Abstraction
    • Aisch, In Defense of Interactive Graphics
    • <Optional> Mastering Hued Color Scales, Gregor Aisch, 2013.
  • Lab 3

113 of 115

Tips for Teamwork in Observable

  • You should have received invites to our class team: this gives you free collaboration features
  • Read Team Features in the getting started guide
    • Real-time multiplayer editing
    • Fork, share, merge and comment or suggest
    • Handy components like views for interactivity
  • Observable documentation page

114 of 115

Tips for EDA in Observable

  • Use a Data Table cell to examine your imported data, filter it, create derived columns, etc.
  • Use Chart cells to make quick-and-dirty exploratory visualizations
  • Use the Plot API to fine-tune your initial ideas

💡 You can generate Plot code from your chart cells

115 of 115

A4: Explore > Sketch > Build in Observable

AirBnB Quantified by Kelli Anderson, Via Steve Heller, Infographic Sketchbooks.