Data Visualization
CMSC 320
Spring 2024
Fardina F Alam
Data Visualization in the Data Science Workflow
Visualization appears in multiple stages:
It is both an analytical and communication tool.
It tells a story!
The Bad:
Sometimes, visualizations can be misleading
Think about it: is it the same data or different data?
Think about it: what's wrong with this graph?
A reader of Business Insider, P.A. Fedewa, was kind enough to revise this graph, using all of the same numbers, with the y-axis starting at the normalized bottom left, 0.
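To see why the starting point of the y-axis matters, a minimal sketch can compare how tall two bars appear relative to each other under different baselines. The values 35 and 39.6 and the helper name `visual_ratio` are hypothetical, chosen only for illustration; they are not taken from the graph discussed here.

```python
def visual_ratio(small, large, baseline=0.0):
    """Return how many times taller the larger bar is drawn than the
    smaller one when the y-axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Hypothetical values, for illustration only.
small, large = 35.0, 39.6

honest = visual_ratio(small, large, baseline=0.0)      # y-axis starts at 0
truncated = visual_ratio(small, large, baseline=34.0)  # y-axis starts at 34

print(f"Axis from 0:  larger bar looks {honest:.2f}x as tall")
print(f"Axis from 34: larger bar looks {truncated:.2f}x as tall")
```

The underlying numbers are identical in both cases; only the baseline changes how large the gap appears.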
What's wrong with this graph?
Visualizations should never prioritize style over substance.
Images Source: Tumblr
What's wrong with this graph?
A good visualization needs to be arranged carefully and in a logical manner
Images Source: Tumblr
Actual difference – 0.5 feet. Visual Difference – 10 feet.
TAKEAWAY:
Be careful when designing visualizations, and be extra careful when interpreting graphs created by others.
Four Elements:
The Good:
Choose the visualization that tells the story
Image Source: Oregon State University Extension Service
How the US uses its land to create wealth
Importance of Data Visualizations
Introduced by Francis Anscombe (1973).
Key Insight: Numerical summaries alone can mislead; visualization reveals the true structure of data
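This point can be checked directly on Anscombe's quartet. The sketch below hard-codes the four datasets as they are commonly published and computes their summary statistics using only the standard library; the helper name `pearson` is an assumption introduced here for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Collect (mean x, mean y, var x, var y, correlation) per dataset.
summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (mean(xs), mean(ys), variance(xs), variance(ys), pearson(xs, ys))
    mx, my, vx, vy, r = summary[name]
    print(f"{name:>3}: mean_x={mx:.2f} mean_y={my:.2f} "
          f"var_x={vx:.2f} var_y={vy:.2f} r={r:.3f}")
```

The printed rows are nearly identical, yet a scatter plot of each dataset (for example with `plt.scatter`) shows four clearly different shapes.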
"Anscombe's quartet" (Francis Anscombe,1973): four datasets that share the same descriptive statistics, including mean, variance, and correlation.
Why do visualizations?
For Ourselves (Exploration)
For Others (Communication)
Why Visualization Helps
Key Idea: Visualization turns complex data into insight you can see and understand quickly.
RECAP
Principles of data visualization
Humans naturally perceive visual elements as organized patterns and whole objects (the law of Prägnanz in Gestalt psychology)
Gestalt Laws in Data Visualization
Key Idea: Good visualizations use Gestalt principles to guide attention, show relationships, and make patterns instantly clear.
Gestalt principles explain how people naturally perceive visual information as organized patterns, not isolated parts
Key parts of a data visualization
Components of a Visualization
Source: Abrigo, L.A. & Schneider, G.S. (2018). Data visualization guidelines. The Cato Institute. https://github.com/glosophy/CatoDataVizGuidelines
Storytelling
Why Tell Stories with Data?
We’ve been using stories to teach since there was language. We are much more likely to remember facts and internalize an idea when presented as a story.
Psychologists argue that it's especially difficult for us to make sense of statistics without narrative.
Let the data tell the story
The data cannot tell you what is important.
When creating a narrative structure, the most basic question is, “what do you want the audience to know, and when?” ... this is also the basic question of data storytelling.
What’s in a Story?
Example: A Story
A Data Visualization Story
Data Visualization Story
Attention theory predicts that preattentive features guide the sequence of the narrative (instead of the text “label”).
Slowly Reveal:
Gradually introduces information to build understanding and maintain interest, allowing viewers to absorb key insights step-by-step.
In simpler terms, your eyes are drawn to visual cues like changes in price before you read the actual words on the chart.
But Why Data Visualization Stories?
Adding Context and Narrative Elements
In this final round of tweaks, we've added several important elements to make the story stand out:
Focusing Attention
Visual Perception
Visual processing happens in its own part of the brain, based on its own rules!
Presenters can leverage research that helps us understand these rules:
● Visual Attention Theory
● Visual Salience
● Preattentive features
Visual Attention Theory
In writing, attention goes in a fixed order.
With visualizations, our eyes follow the visuals!
Track where your attention goes!
The horizontal red bar was salient in the first image, but visual properties alone do not determine salience; context is crucial.
Orientation and color are examples of preattentive features.
Preattentive processing is performed automatically and instantly* on the entire visual field, but only for a limited set of object features.
Hard to parse both at once!
Preattentive Processing
The brain’s ability to quickly and automatically process basic visual features before conscious attention is directed.
Count the 4s
Count the 4s: Color
Preattentive Features
Specific visual attributes, like color, size, or tilt, that the brain processes quickly and automatically without focused attention.
List of the preattentive attributes that are of particular use in visual displays of data
Stephen Few, “Tapping the Power of Visual Perception”
Pre-attentive Processing: Color
Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127
Color differences are noticed instantly
Use color to highlight key points
Keep a consistent, limited palette
Pre-attentive Processing: Shape
Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127
Shapes are recognized instantly by the brain
Use different shapes to distinguish categories or data types
Combine with color or size for clearer visualization
Pre-attentive Processing: Combined
Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127
Combine color, shape, size, and orientation to guide attention
Helps viewers quickly identify patterns, trends, and outliers
Use sparingly to avoid clutter and confusion
Using Color
Design Principles
Visualization using Programming
Python
– matplotlib
– seaborn
– plotly
– pylab
R
– graphics
– ggplot2
Streamlit → Not a plotting library. It’s a web app framework that lets you build interactive dashboards and data apps. It can use matplotlib, seaborn, and plotly plots inside it, but by itself it doesn’t replace them.
Common Tools for Data Visualization
Creating Visualizations in Python
Two widely used libraries:
import matplotlib.pyplot as plt
x = [1,2,3,4,5]
y = [2,4,6,8,10]
plt.plot(x, y)
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Simple Line Chart")
plt.show()
Different Styles of Same Plot (can use Matplotlib)
Creating Visualizations in Python
import matplotlib.pyplot as plt
# simple data
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
# create 2x2 subplot layout
plt.figure(figsize=(8,6))
# ------------------
# 1. Line chart
# ------------------
plt.subplot(2,2,1)
plt.plot(x, y)
plt.title("Line")
# ------------------
# 2. Bar chart
# ------------------
plt.subplot(2,2,2)
plt.bar(x, y)
plt.title("Bar")
# ------------------
# 3. Scatter plot
# ------------------
plt.subplot(2,2,3)
plt.scatter(x, y)
plt.title("Scatter")
# ------------------
# 4. Histogram
# ------------------
plt.subplot(2,2,4)
plt.hist(y)
plt.title("Histogram")
plt.tight_layout()
plt.show()
What Type of Visualization to Use?
SELF STUDY (INCLUDED IN THE EXAM)
(Data Visualization Cheat Sheet)
Why are we spending the lectures on data visualization?
Exploration
Types of Visualization (And when to use them)
Main Types
Line charts - display trends
A line graph reveals trends or progress over time.
Example: Line charts
Observe: the biggest customers are 34-45-year-old buyers of PDAs, followed by 19-24-year-old buyers of cell phones.
Ex: Sales figures by age group for three different product lines
Best practices for line charts
Line color:
Limited number of colors and line styles. If many lines, use grey for most.
Numerical axis:
X-axis variable is a number (e.g., years) and not a set of categories (e.g., countries).
Complete axis:
X-axis displays the numbers correctly in a full range, instead of having gaps
Legend:
Lines labeled directly when possible
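The line-color guideline above (few colors, grey for most lines) can be sketched as a small helper that assigns an accent color to the highlighted series and a muted grey to the rest. The function name `line_colors`, the series labels, and the default colors are assumptions made for this illustration.

```python
def line_colors(labels, highlight, accent="tab:red", muted="lightgrey"):
    """Map each series label to a color: accent for highlighted series,
    muted grey for every other series."""
    return {label: (accent if label in highlight else muted) for label in labels}


# Hypothetical series labels; only "C" is the line the story is about.
labels = ["A", "B", "C", "D", "E"]
colors = line_colors(labels, highlight={"C"})

for label in labels:
    print(label, colors[label])
```

Each resulting color string can be passed as the `color=` argument of `plt.plot` when drawing the corresponding line.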
Bar charts - break things down, simply
Bar graphs can help compare data between different groups or track changes over time.
Example: Bar charts
Revisualize previous chart as a bar chart
Column charts - compare values side-by-side
Use for side-by-side comparisons of different values.
Total website page views vs. sessions on various dates.
Example: Column charts
Plot: Total website page views vs. sessions on various dates.
Scatter charts - relationships
Scatter plot
Best practices for scatter plots
Overplotting:
Transparency shows where points overlap
Bubble size:
Data maps to bubble area, not radius or diameter
Point color:
Limited number of colors
Selective labeling:
Points labeled directly, but only points of interest
Explanatory elements:
Additional elements (reference lines, annotations) help explain trends
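The bubble-size guideline above can be illustrated with a short calculation: if a data value should be proportional to the bubble's area, the radius must grow with the square root of the value. The helper name `radius_for_value`, the `area_per_unit` parameter, and the sample values are assumptions for this sketch.

```python
from math import pi, sqrt


def radius_for_value(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return sqrt(value * area_per_unit / pi)


# Hypothetical data values, for illustration only.
values = [1, 4, 9]
radii = [radius_for_value(v) for v in values]
areas = [pi * r ** 2 for r in radii]

for v, r, a in zip(values, radii, areas):
    print(f"value={v}  radius={r:.3f}  area={a:.3f}")
```

A value four times larger gets a radius only twice as large, so the drawn area stays proportional to the data. In matplotlib, the `s` argument of `plt.scatter` is already specified as a marker area (in points squared), so passing scaled data values to `s` follows this rule.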
Histogram
A histogram is a graph that shows the frequency of numerical data using rectangles.
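What a histogram computes can be sketched without any plotting library: the value range is split into equal-width bins, and each rectangle's height is the number of values falling in its bin. The function name `histogram_counts` and the sample scores are hypothetical, and the last bin is assumed to include its right edge.

```python
def histogram_counts(data, num_bins):
    """Split [min, max] into equal-width bins and count values per bin.
    Returns (edges, counts); the last bin includes its right edge."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for value in data:
        # Clamp so the maximum value lands in the last bin.
        index = min(int((value - lo) / width), num_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical exam scores, for illustration only.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 78, 80, 85, 92]
edges, counts = histogram_counts(scores, num_bins=4)

# Print a text version of the histogram, one row per bin.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'#' * count}")
```

`plt.hist(scores, bins=4)` draws the same kind of counts as rectangles.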
Pie charts clearly show proportions
Pie charts easily show the share each value contributes to the whole.
Example: This pie chart illustrates the effectiveness of different marketing campaigns in generating leads
Example: Clearly shows that AdWords is the most effective, followed by social media and webinar signups.
Additional Reading Slides