1 of 103

Data Vis Intro

2 of 103

Intro to Data Vis

by Shay Palachy Affek

3 of 103

Agenda

4 of 103

Agenda

  • Motivation
  • Shay introduces himself
  • A (partial) intro to Data Vis
  • (some) chart types
  • Learning (a bit) of Tableau together

5 of 103

Motivation

6 of 103

Why bother with this lecture?

7 of 103

Data Age

New tools for old professions

New professions

8 of 103

Data is eating the world

9 of 103

Everybody’s working with data

(bankers, accountants, lawyers, mechanical eng., managers everywhere, etc.)

10 of 103

Communicating data insights is hard

11 of 103

Either you need this new tool in an old profession

12 of 103

Or you can use it to define a new one

13 of 103

About me

14 of 103

Education

Work

Non-profit

Passion projects

15 of 103

B.Sc. & M.Sc. CS @ HebrewU

MBA @ TAU

16 of 103

Led DS @ a couple of startups

Consult DS teams @ startups

VP DS @ another startup

Back to Consulting

Lecture & teach

17 of 103

DataHack, started as large hackathons

Runs the DataNights course series

DataTalks meetup series

DataCoach @ Technion

(working on more programs…)

18 of 103

Pet projects:

Small open source Python projects

DS Team Mgmt @ DataNights

Talks & blog posts on DS Mgmt

19 of 103

My relation to visualization

  • Producer, as a data scientist
  • Consumer, as a manager
  • Worked on several products centered around vis
    • Social network data
    • IT Ops data
    • Insurance data
  • Teacher, here at TAU

20 of 103

Note: If you want to follow the slides…

21 of 103

https://www.shaypalachy.com/talks.html

22 of 103

Motivational GIF

23 of 103

24 of 103

Intro to Data Vis

25 of 103

“The world's most valuable resource is no longer oil, but data.”

The Economist, 2017

why?

26 of 103

27 of 103

By providing knowledge and delivering insights, data visualization enables planning and strategizing.

28 of 103

Which of these forms better utilizes human visual processing for the purpose of providing information and insights about the data?

29 of 103

What are some properties we visually perceive and notice?

[Figure: rows of the digits 12345, one of them with the 3 set apart from the other digits]

30 of 103

Color

Size

Orientation

Texture

[Figure: the rows of digits 12345 from the previous slide, now labeled with these properties]

31 of 103

Preattentive Processing

Effective data vis uses the brain’s (and eyes’) preattentive visual processing.

Because our eyes detect a limited set of visual characteristics (e.g. shape, contrast), we combine various features of an object and unconsciously perceive them as comprising an image.

Preattentive processing refers to the cognitive operations that can be performed prior to focusing attention on any particular region of an image. In other words, it’s what you notice right away.

(contrast, orientation, edges, boundaries & surfaces, object recognition, foreground, …)

32 of 103

Preattentive Processing - An Empirical Demonstration

  • Researchers showed people groups of dots.
  • Groups were of either 18 or 24 dots.
  • Dots were either connected or not.
  • Known Illusion: When connected into dumbbell shapes, the overall count seems smaller.
  • Overall number of white pixels was identical across images.
  • People were asked to watch passively, not count, etc.

33 of 103

Preattentive Processing - An Empirical Demonstration

  • The participants' pupils dilated, or expanded, when they perceived a greater number of dots.
  • Pupils constricted when they perceived fewer dots.

“The finding suggests that the pupil is equipped with some mechanism that can sense quantity… This result shows that numerical information is intrinsically related to perception.” (from a statement given by the researchers)

34 of 103

But why bother learning it?

Is there a wrong way to vis?

35 of 103

Let’s take a look at some terrible examples! <3

36 of 103

37 of 103

38 of 103

39 of 103

40 of 103

41 of 103

Let’s recall the basic motivation:

The Age of Information Overload

42 of 103

43 of 103

The Age of Information Overload

  • Large amounts of data cannot be easily summarized and displayed.
  • For instance, which summary statistics should we use?
  • Let’s look at an example…

44 of 103

45 of 103

46 of 103

47 of 103

Plotting all four data sets on a 2D plane immediately exposed the vast differences in the underlying dynamics!
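
The slide text does not name the four data sets. Assuming they resemble Anscombe's quartet (an assumption, not something the slides confirm), a minimal Python sketch shows why summary statistics alone can hide such differences: the means and correlations come out nearly identical for all four sets.

```python
from statistics import mean

# Anscombe's quartet (assumed here to be the kind of example the slides refer to).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# One (name, mean of x, mean of y, correlation) tuple per data set.
summary = [(name, mean(xs), mean(ys), pearson(xs, ys)) for name, (xs, ys) in QUARTET.items()]

for name, mx, my, r in summary:
    print(f"{name:>3}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The printed rows are almost indistinguishable, which is the point: only a plot reveals how different the four data sets are.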

50 of 103

But it’s not just about discovery of data properties!

It’s also about convincing and driving action. About making these insights understood intuitively.

And it’s also about enabling, improving and augmenting human task performance: Analysis, research, discovery, investigation, inference, etc.

51 of 103

52 of 103

53 of 103

Doctors were asked to supervise a clinical trial.

Participants were shown four types of data visualizations containing hypothetical data from the trial…

They were then asked to decide whether to continue the trial or stop for an unplanned statistical analysis.

There is a single objectively correct answer.

54 of 103

55 of 103

56 of 103

Munzner’s definition for data vis

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

Who?

This leads us to questions about the target audience: Scientists? Managers? The public?

57 of 103

Motivational GIF

58 of 103

59 of 103

Chart Types

60 of 103

The Histogram

61 of 103

The Histogram

Data

1 quantitative attribute (no keys)

Marks

Line marks

Channels

  • Horizontal position — (bucketed) value of quant.
  • Line length — Frequency of values in bucket (derived quant.)
  • Color — Optional re-encoding of quant. value

Task

Distribution of a quantitative value

Scalability

Dozens of buckets for quant. value
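
The derived "frequency per bucket" attribute described above can be sketched in a few lines of Python. The sample values and the bucket width are made up for illustration; this is one simple fixed-width bucketing scheme, not the only possible one.

```python
from collections import Counter


def histogram(values, bucket_width):
    """Count how many values fall into each fixed-width bucket.

    Returns a dict mapping each bucket's left edge to its frequency.
    """
    counts = Counter()
    for v in values:
        left_edge = (v // bucket_width) * bucket_width
        counts[left_edge] += 1
    return dict(sorted(counts.items()))


ages = [3, 7, 12, 15, 18, 21, 22, 29, 34, 35]  # hypothetical sample data
buckets = histogram(ages, 10)

# Text rendering: horizontal position = bucket, line length = frequency.
for left_edge, frequency in buckets.items():
    print(f"[{left_edge:>2}, {left_edge + 10:>2}) {'#' * frequency}")
```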

62 of 103

The Scatter Plot

  • Design Choice: Express values (magnitudes)
    • Relevant to quantitative attributes only
  • Data: 2 quantitative attributes (no keys!)
  • Marks: Points
  • Channels:
    1. Horizontal position
    2. Vertical position

63 of 103

The Scatter Plot

  • Tasks: Find trends, outliers, distribution, correlation; show clusters
  • Scalability: Hundreds of items
  • Additional channels are viable since it uses point marks
    • Color
    • Size (a quant. attribute used to control 2D area)
    • Shape

(Note: Since area grows quadratically with radius, mapping the value directly to the radius misleads; map the square root of the value to the radius instead.)
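
The note about area and radius can be made concrete with a small Python sketch. The function name, the sample values and the maximum radius are hypothetical, chosen only for illustration.

```python
import math


def radius_for_value(value, max_value, max_radius=20.0):
    """Return a radius such that the circle's AREA is proportional to value.

    Taking the square root compensates for area growing with radius squared.
    """
    return max_radius * math.sqrt(value / max_value)


values = [25, 50, 100]  # hypothetical quantitative attribute
radii = [radius_for_value(v, max(values)) for v in values]
areas = [math.pi * r ** 2 for r in radii]

for v, r, a in zip(values, radii, areas):
    print(f"value={v:>3}  radius={r:6.2f}  area={a:8.1f}")
```

With this scaling, doubling the value doubles the drawn area. Mapping the value straight to the radius would instead quadruple it.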

64 of 103

The Scatter Plot

Encoding a 3rd (quant.) attribute w/ area

Encoding a 3rd (categorical) attribute w/ color & shape

65 of 103

Scatter Plot Tasks: Correlation

66 of 103

Scatter Plot Tasks: Clusters & Clusters vs Classes

67 of 103

The Bar Chart

Different types of bar ordering…

68 of 103

The Bar Chart

Data

1 categorical, 1 quantitative

Marks

Lines

Channels

Length expresses the quantitative attribute

Spatial regions: One per mark

  • Separated horizontally, aligned vertically
  • Ordered either by label (alphabetical) or by the length attribute (data-driven)

Task

Compare, Lookup values

Scalability

Dozens to hundreds of levels for key attrib [bars], hundreds for values
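
The two bar orderings mentioned above (by label vs data-driven) amount to two different sort keys. A minimal Python sketch, using made-up category names and values:

```python
# Hypothetical data: one categorical key (region), one quantitative value (revenue).
revenue = {"West": 40, "East": 75, "North": 20, "South": 55}

# Alphabetical ordering: easy lookup of a known category.
by_label = sorted(revenue)

# Data-driven ordering: longest bar first, which makes ranking and comparison easier.
by_length = sorted(revenue, key=revenue.get, reverse=True)

print("by label: ", by_label)
print("by length:", by_length)
```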

69 of 103

The Diverging Bar Chart

Encodes data using height/length of bar diverging from a midpoint to show categorical comparisons.

70 of 103

The Tornado Bar Chart

  • Data categories are listed vertically
  • Categories are ordered so that the largest bar appears at the top of the chart, the second largest below it, and so on
  • Usually right vs left bars encode a 2nd categorical

71 of 103

The Stacked Bar Chart

72 of 103

The Stacked Bar Chart

Data

2 categoricals, 1 quantitative

Marks

Vertical stack of line marks

Channels

  • Length and color hue
  • Spatial regions: one per glyph
    • Aligned: Full glyph, lowest bar component
    • Unaligned: Other bar components

Task

Compare, Lookup values + part-to-whole relationship

Scalability

For stacked key attribute, 10-12 levels [segments]

For main key attrib, dozens to hundreds of levels [bars]

73 of 103

The Normalized Stacked Bar Chart

74 of 103

The Normalized Stacked Bar Chart

Like a stacked bar chart, but

  • Normalized to full vertical height
  • A single stacked bar is equivalent to a full pie chart
    • High information density: Requires narrow rectangles

More suitable for part-to-whole judgements with no need to compare magnitudes; better for comparing ratios.
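
The normalization step can be sketched in Python as follows. The data and names are hypothetical, and the sketch assumes every bar has a positive total.

```python
def normalize_stacks(stacks):
    """Rescale each bar's segments so they sum to 1.0 (full bar height).

    Assumes each bar's total is positive; a zero total would divide by zero.
    """
    normalized = {}
    for bar, segments in stacks.items():
        total = sum(segments.values())
        normalized[bar] = {name: value / total for name, value in segments.items()}
    return normalized


# Hypothetical data: sales per product (stacked key) for each year (main key).
sales = {
    "2022": {"A": 30, "B": 70},
    "2023": {"A": 120, "B": 80},
}
shares = normalize_stacks(sales)

for year, segments in shares.items():
    print(year, {name: f"{share:.0%}" for name, share in segments.items()})
```

After normalizing, the absolute growth from 2022 to 2023 is no longer visible, but the shift in product A's share (30% to 60%) is easy to read.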

75 of 103

ordered key attrib (time)

quant value attrib. (gross)

categ key attrib (movies)

The Streamgraph

76 of 103

The Streamgraph

Data

1 categorical, 1 ordered, 1 quantitative

Marks

Composite regions

Channels

  • Ordered (time) - Horizontal location
  • Category - Color
  • Quantity - Size

Task

Compare, part-to-whole relationship over time

Scalability

Hundreds of time keys

Dozens to hundreds of category keys

(more than stacked bars, since most layers don’t extend across the whole chart)

77 of 103

The Streamgraph

a smoothing effect

78 of 103

The Dot/Line Chart

79 of 103

The Dot/Line Chart

Data

2 quantitative attributes: 1 as key, 1 as value

Marks

Points and line connection marks between them

Channels

  • Aligned lengths to express quant. value
  • Separated and ordered by key attribute into horizontal regions

Task

Find trends

Scalability

Hundreds of key levels

Hundreds of value levels

80 of 103

Choosing bar vs line charts

  • Depends on the type of the key attribute:
    • Bar charts if categorical
    • Line charts if ordered

82 of 103

Choosing bar vs line charts

Using line charts for categorical keys violates the expressiveness principle.

The implication of trend is so strong that it overrides semantics!

“The more male a person is, the taller he/she is”

83 of 103

The Heatmap

84 of 103

The Heatmap

Data

2 categoricals (2 keys!), 1 quant. attribute (value)

Marks

Fixed square region

Channels

Color — Quantitative attribute value

Horz./Vert. Location — By the chosen ordering of the categoricals

Task

find clusters, outliers, relations between values

Scalability

1M items, 100s of categ levels, ~10 (bucketed) quant. attribute levels
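
Bucketing the quantitative attribute into roughly 10 color levels, as the scalability note suggests, might look like the following Python sketch. The equal-width bucketing, the function name and the sample grid are assumptions for illustration.

```python
def color_level(value, lo, hi, levels=10):
    """Map a value in [lo, hi] to an integer color level in 0..levels-1."""
    if hi == lo:
        return 0  # degenerate range: everything gets the same color
    fraction = (value - lo) / (hi - lo)
    # The maximum value would land on `levels`, so clamp it into the top bucket.
    return min(int(fraction * levels), levels - 1)


# Hypothetical grid: rows and columns are the two categorical keys.
grid = [
    [0, 25, 50],
    [75, 99, 100],
]
lo = min(min(row) for row in grid)
hi = max(max(row) for row in grid)
levels = [[color_level(v, lo, hi) for v in row] for row in grid]

for row in levels:
    print(row)
```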

85 of 103

The Highlight Table

86 of 103

The Highlight Table

Data

2 categoricals (2 keys!), 1 quant. attribute (value)

Marks

Fixed square region

Channels

Color — Quantitative attribute value

Horz./Vert. Location — By the chosen ordering of the categoricals

Task

find clusters, outliers, relations between values

Scalability

100s of categ levels, ~10 (bucketed) quant. attribute levels

87 of 103

The Box Plot

88 of 103

The Box Plot

Data

1 categorical (1 key), 1 quant. attribute (value)

Marks

Closed region

Channels

Horizontal location — Categorical values

Vertical location — Median of quant. for this category

Box boundaries — 1st & 3rd quartiles of quant. for this category

Line length (whiskers) — Min & max of quant. for this category

Color — Usually re-encodes categorical; can encode another categ.

Task

Compare distributions

Scalability

Low
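
The five values a box plot draws for each category can be computed with Python's standard library. This sketch follows the slide's convention of whiskers at the minimum and maximum; the "inclusive" quartile method and the sample data are choices made here for illustration, and other quartile conventions give slightly different box boundaries.

```python
import statistics


def box_plot_summary(values):
    """Return the min, quartiles and max that a box plot encodes."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Hypothetical measurements for a single category.
summary = box_plot_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)
```

Running the same function once per category gives the side-by-side boxes used to compare distributions.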

89 of 103

The Pie Chart

90 of 103

The Pie Chart

Data

1 categorical, 1 quantitative

Marks

Interlocking area marks

Channels

  • Color (usually) - Encodes category
  • Angle channel - Re-encodes category + dictates area, which encodes the quantity
  • Height (length from center) - Encodes nothing; uniform
  • Order - Encodes category order

Task

Part-to-whole judgements

Scalability

Very poor: Not more than a few categories

91 of 103

The Pie Chart: Best Practices

  • No more than 6 segments
  • Order segments by size
  • No shading, 3d, etc.
  • Make sure segment values sum up to 100%
  • Label segments with the quantitative value
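
Several of these practices (ordering by size, values summing to 100%, labeling with the quantitative value) can be sketched in Python. The budget data and the function name are hypothetical.

```python
def pie_slices(values):
    """Return (label, angle in degrees, percent) tuples, largest slice first."""
    total = sum(values.values())
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    return [(label, 360.0 * v / total, 100.0 * v / total) for label, v in ordered]


# Hypothetical data: 1 categorical attribute, 1 quantitative attribute.
budget = {"food": 30, "rent": 50, "other": 20}
slices = pie_slices(budget)

for label, angle, percent in slices:
    print(f"{label:<6} {percent:5.1f}%  ({angle:5.1f} degrees)")
```

Because every percentage is derived from the same total, the slices sum to 100% by construction.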

92 of 103

The Coxcomb Chart

93 of 103

The Coxcomb Chart

Data

1 categorical, 1 quantitative

Marks

Interlocking area marks

Channels

  • Color (usually) - Encodes category
  • Angle channel - Fixed
  • Height (length from center) - Encodes quant.
  • Order - Encodes category order

Task

Part-to-whole judgements

Scalability

Very poor: Not more than a few categories

94 of 103

The Coxcomb Chart

95 of 103

Motivational GIF

96 of 103

97 of 103

Tableau Workout

98 of 103

Tableau Workout - Creating Charts

  • Download the dataset file from here:

US Store Dataset OR https://shorturl.at/jwPUW

  • And let’s follow the slides here:

Slides OR https://shorturl.at/ciNO1 (not zero)

99 of 103

Motivational GIF

100 of 103

101 of 103

Another Tableau Workout

102 of 103

Tableau Workout - Creating Dashboards

  • Download the base workbook file from here:

Base_workbook.twbx OR https://shorturl.at/aciY2

  • And let’s follow the slides here:

Tutorial: Tableau Dashboards OR https://shorturl.at/CEFP4

103 of 103

Thank you for listening