Intro to Data Vis
by Shay Palachy Affek
Agenda
Motivation
Why bother with this lecture?
Data Age
New tools for old professions
New professions
Data is eating the world
Everybody’s working with data
(bankers, accountants, lawyers, mechanical eng., managers everywhere, etc.)
Communicating data insights is hard
Either you need this new tool in an old profession
Or you can use it to define a new one
About me
Education
Work
Non-profit
Passion projects
B.Sc. & M.Sc. CS @ HebrewU
MBA @ TAU
Led the DS @ a couple of startups
Consult DS teams @ startups
VP DS @ another startup
Back to Consulting
Lecture & teach
DataHack, started as large hackathons
Runs the DataNights course series
DataTalks meetup series
DataCoach @ Technion
(working on more programs…)
Pet projects:
Small open source Python projects
DS Team Mgmt @ DataNights
Talks & blog posts on DS Mgmt
My relation to visualization
Note: If you want to follow the slides…
https://www.shaypalachy.com/talks.html
Motivational GIF
Intro to Data Vis
“The world's most valuable resource is no longer oil, but data.”
The Economist, 2017
why?
By providing knowledge and delivering insights, data visualization enables planning and strategizing.
Which of these forms better utilizes human visual processing for the purpose of providing information and insights about the data?
What are some properties we visually perceive and notice?
[Figure: rows of the digits 12345 in which a single 3 stands out from the rest]
Color
[Figure: rows of the digits 12345 in which a single 3 stands out from the rest]
Size
Orientation
Texture
Preattentive Processing
Effective data vis uses the brain’s (and eyes’) preattentive visual processing.
Because our eyes detect a limited set of visual characteristics (e.g. shape, contrast), we combine various features of an object and unconsciously perceive them as comprising an image.
Preattentive processing refers to the cognitive operations that can be performed prior to focusing attention on any particular region of an image. In other words, it’s what you notice right away.
(contrast, orientation, edges, boundaries & surfaces, object recognition, foreground, …)
Preattentive Processing - An Empirical Demonstration
“The finding suggests that the pupil is equipped with some mechanism that can sense quantity… This result shows that numerical information is intrinsically related to perception.” (from a statement given by the researchers)
But why bother learning it?
Is there a wrong way to vis?
Let’s take a look at some terrible examples! <3
Let’s recall the basic motivation:
The Age of Information Overload
Plotting all four data sets on a 2D plane immediately exposed the vast differences in the underlying dynamics!
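The slide does not name the four data sets. Assuming they behave like Anscombe's quartet (an assumption, not something the deck states), a minimal Python sketch shows why the plot matters: the four sets share nearly identical summary statistics, so the differences only become visible once they are drawn. The names `summarize`, `SUMMARIES` and `plot_quartet` are illustrative.

```python
from statistics import mean, variance

# Anscombe's quartet, used here as an assumed stand-in for the slide's data.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that look almost the same for all four sets."""
    return {
        "mean_x": round(mean(xs), 2),
        "mean_y": round(mean(ys), 2),
        "var_x": round(variance(xs), 2),
        "r": round(pearson_r(xs, ys), 2),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, stats)


def plot_quartet():
    """Draw the four sets as scatter plots (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
    for ax, (name, (xs, ys)) in zip(axes.flat, QUARTET.items()):
        ax.scatter(xs, ys)
        ax.set_title(name)
    fig.tight_layout()
    plt.show()
```

The printed summaries are close to indistinguishable; calling `plot_quartet()` is what exposes the different shapes.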
But it’s not just about discovery of data properties!
It’s also about convincing and driving action. About making these insights understood intuitively.
And it’s also about enabling, improving and augmenting human task performance: Analysis, research, discovery, investigation, inference, etc.
Doctors were asked to supervise a clinical trial.
Participants were shown four types of data visualizations containing hypothetical data from the trial…
And were asked to decide whether to continue the trial or stop for an unplanned statistical analysis.
There is a single objectively correct answer.
Munzner’s definition for data vis
Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.
Who?
Will drive us to questions about the target audience: Scientists? Managers? The public?
Motivational GIF
Chart Types
The Histogram
Data | 1 quantitative attribute (no keys) |
Marks | Line marks |
Channels | |
Task | Distribution of a quantitative value |
Scalability | Dozens of buckets for quant. value |
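To make the table above concrete, here is a minimal sketch of the bucketing step behind a histogram: one quantitative attribute goes in, a count per bucket comes out. Equal-width buckets are an assumption (the slide does not say how buckets are chosen), and `histogram_counts` and the sample values are hypothetical.

```python
def histogram_counts(values, n_buckets):
    """Count values in n_buckets equal-width buckets spanning [min, max]."""
    if not values or n_buckets < 1:
        raise ValueError("need at least one value and one bucket")
    lo, hi = min(values), max(values)
    # Guard against a zero-width range when all values are equal.
    width = (hi - lo) / n_buckets or 1.0
    counts = [0] * n_buckets
    for v in values:
        # The maximum value belongs to the last bucket, not a new one.
        index = min(int((v - lo) / width), n_buckets - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_buckets + 1)]
    return edges, counts


# Hypothetical sample of a single quantitative attribute.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(sample, n_buckets=4)

# Crude text rendering: bar length encodes the count in each bucket.
# (The last bucket also includes its right edge.)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * count}")
```

Changing `n_buckets` changes the apparent shape of the distribution, which is why the bucket count is worth trying at several settings.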
The Scatter Plot
(Note: Since area grows quadratically with radius, mapping the value directly to radius misleads; use the square root of the value as the radius.)
Encoding a 3rd quant. channel w/ area
Encoding a 3rd categorical channel w/ color & shape
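The point about encoding a third quantitative attribute with area can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's own code: `radius_for_value` and `circle_area` are hypothetical helper names, and the values 25 and 100 are made up.

```python
import math


def radius_for_value(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical data: one point is worth 100, another 25 (a 4:1 ratio).
big = radius_for_value(100, max_value=100, max_radius=10.0)
small = radius_for_value(25, max_value=100, max_radius=10.0)

# Square-root scaling keeps the perceived (area) ratio equal to the data ratio.
area_ratio = circle_area(small) / circle_area(big)

# Naive linear scaling of the radius shrinks the small circle far too much.
naive_ratio = circle_area(10.0 * 25 / 100) / circle_area(10.0)

print(f"sqrt-scaled area ratio: {area_ratio:.4f}")
print(f"linear-radius area ratio: {naive_ratio:.4f}")
```

Plotting libraries differ on whether their size argument means radius or area. In matplotlib, for instance, the `s` argument of `scatter` is an area in points squared, so the value can be passed to it proportionally without taking a square root.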
Scatter Plot Tasks: Correlation
Scatter Plot Tasks: Clusters & Clusters vs Classes
The Bar Chart
Different types of bar ordering…
Data | 1 categorical, 1 quantitative |
Marks | Lines |
Channels | Length expresses the quantitative attribute; spatial regions: one per mark |
Task | Compare, Lookup values |
Scalability | Dozens to hundreds of levels for key attrib [bars], hundreds for values |
The Diverging Bar Chart
Encodes data using height/length of bar diverging from a midpoint to show categorical comparisons.
The Tornado Bar Chart
The Stacked Bar Chart
Data | 2 categoricals, 1 quantitative |
Marks | Vertical stack of line marks |
Channels | |
Task | Compare, Lookup values + part-to-whole relationship |
Scalability | For stacked key attribute, 10-12 levels [segments]; for main key attrib, dozens to hundreds of levels [bars] |
The Normalized Stacked Bar Chart
Like a stacked bar chart, but with every bar normalized to the same total height (100%).
More suitable for part-to-whole judgements where there is no need to compare magnitudes; ratios are easier to compare.
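The normalization itself is simple arithmetic: each segment is divided by its bar's total, so every bar sums to 1 (100%). A minimal sketch, where `normalize_stacks` and the quarterly figures are hypothetical:

```python
def normalize_stacks(stacks):
    """Turn {bar: {segment: value}} into per-bar shares that sum to 1."""
    normalized = {}
    for bar, segments in stacks.items():
        total = sum(segments.values())
        if total == 0:
            raise ValueError(f"bar {bar!r} has a zero total")
        normalized[bar] = {seg: value / total for seg, value in segments.items()}
    return normalized


# Hypothetical data: two bars with very different magnitudes.
sales = {
    "Q1": {"A": 30, "B": 70},
    "Q2": {"A": 150, "B": 50},
}
shares = normalize_stacks(sales)

for bar, segments in shares.items():
    line = ", ".join(f"{seg}: {share:.0%}" for seg, share in segments.items())
    print(f"{bar}  {line}")
```

After normalization the fact that Q2 is twice as large as Q1 is no longer visible, which is the trade-off the slide describes: ratios become easy to compare while magnitudes are dropped.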
ordered key attrib (time)
quant value attrib. (gross)
categ key attrib (movies)
The Streamgraph
Data | 1 categorical, 1 ordered, 1 quantitative |
Marks | Composite regions |
Channels | |
Task | Compare, part-to-whole relationship over time |
Scalability | Hundreds of time keys; dozens to hundreds of category keys (more than stacked bars, since most layers don’t extend across the whole chart) |
a smoothing effect
The Dot/Line Chart
Data | 2 quantitative attributes: 1 as key, 1 as value |
Marks | Points and line connection marks between them |
Channels | |
Task | Find trends |
Scalability | Hundreds of key levels; hundreds of value levels |
Choosing bar vs line charts
Using line charts for categorical keys violates the expressiveness principle.
The implication of trend is so strong that it overrides semantics!
“The more male a person is, the taller he/she is”
The Heatmap
Data | 2 categorical attributes (2 keys!), 1 quant. attribute (value) |
Marks | Fixed square region |
Channels | Color: quantitative attribute value; horizontal/vertical location: the chosen ordering of the categoricals |
Task | Find clusters, outliers, relations between values |
Scalability | 1M items, 100s of categ. levels, ~10 (bucketed) quant. attribute levels |
The Highlight Table
Data | 2 categorical attributes (2 keys!), 1 quant. attribute (value) |
Marks | Fixed square region |
Channels | Color: quantitative attribute value; horizontal/vertical location: the chosen ordering of the categoricals |
Task | Find clusters, outliers, relations between values |
Scalability | 100s of categ. levels, ~10 (bucketed) quant. attribute levels |
The Box Plot
Data | 1 categorical (1 key), 1 quant. attribute (value) |
Marks | Closed region |
Channels | Horizontal location: categorical values; vertical location: median of quant. for this category; box boundaries: 1st & 3rd quartiles of quant. for this category; line length (whiskers): min & max of quant. for this category; color: usually re-encodes the categorical, can encode another categ. |
Task | Compare distributions |
Scalability | Low |
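The channels in the table above map onto five numbers per category. A minimal sketch of computing them with Python's standard library follows; `box_stats` and the two sample groups are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since several quartile conventions exist.

```python
from statistics import median, quantiles


def box_stats(values):
    """Five-number summary matching the box plot channels in the table."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),        # lower whisker end
        "q1": q1,                  # lower box boundary
        "median": median(values),  # line inside the box
        "q3": q3,                  # upper box boundary
        "max": max(values),        # upper whisker end
    }


# Hypothetical quantitative values for two categories.
groups = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "B": [2, 4, 4, 5, 10],
}
summary = {name: box_stats(values) for name, values in groups.items()}

for name, stats in summary.items():
    print(name, stats)
```

This follows the slide in running the whiskers to the minimum and maximum. Many tools instead cap whiskers at 1.5 times the interquartile range and draw the remaining points as outliers.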
The Pie Chart
Data | 1 categorical, 1 quantitative |
Marks | Interlocking area marks |
Channels | |
Task | Part-to-whole judgements |
Scalability | Very poor: Not more than a few categories |
The Pie Chart: Best Practices
The Coxcomb Chart
Data | 1 categorical, 1 quantitative |
Marks | Interlocking area marks |
Channels | |
Task | Part-to-whole judgements |
Scalability | Very poor: Not more than a few categories |
Motivational GIF
Tableau Workout
Tableau Workout - Creating Charts
US Store Dataset OR https://shorturl.at/jwPUW
Slides OR https://shorturl.at/ciNO1 (the letter O, not zero)
Motivational GIF
Another Tableau Workout
Tableau Workout - Creating Dashboards
Base_workbook.twbx OR https://shorturl.at/aciY2
Tutorial: Tableau Dashboards OR https://shorturl.at/CEFP4
Thank you for listening