1 of 87

Data Visualization

CMSC 320

Spring 2024

Fardina F Alam

2 of 87

3 of 87

Data Visualization in the Data Science Workflow

Visualization appears in multiple stages:

  1. Data exploration → understand patterns
  2. Data cleaning → detect errors
  3. Model evaluation → compare results
  4. Communication → present insights

It is both an analytical and communication tool.

It tells a story!

4 of 87

The Bad:

Sometimes, visualizations can be misleading

5 of 87

Think about it: is this the same data or different data?

6 of 87

Think about it: what's wrong with this graph?

7 of 87

Think about it: what's wrong with this graph?

8 of 87

A reader of Business Insider, P.A. Fedewa, was kind enough to revise this graph, using all of the same numbers, with the y-axis starting at the normalized bottom left, 0.
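A rough way to quantify what a truncated y-axis does: compare the ratio of the drawn bar heights with the ratio of the actual values. The sketch below is only an illustration; the two values and the helper name `visual_ratio` are hypothetical and are not taken from the graph on the slide.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at `baseline` instead of 0."""
    return (b - baseline) / (a - baseline)

# Hypothetical values, chosen only for illustration.
before, after = 35.0, 39.6

honest = visual_ratio(before, after, baseline=0.0)      # y-axis starts at 0
truncated = visual_ratio(before, after, baseline=34.0)  # y-axis starts at 34

print(f"axis from 0:  second bar looks {honest:.2f}x as tall")
print(f"axis from 34: second bar looks {truncated:.2f}x as tall")
```

With the axis starting at 0, the drawn ratio equals the true ratio of the values; with the truncated axis, the same two numbers look several times further apart.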

9 of 87

Think about it: what's wrong with this graph?

10 of 87

Think about it: what's wrong with this graph?

11 of 87

What's wrong with this graph?

Visualizations should never prioritize style over substance.

Images Source: Tumblr

12 of 87

What's wrong with this graph?

A good visualization needs to be arranged carefully and in a logical manner

Images Source: Tumblr

Actual difference – 0.5 feet. Visual Difference – 10 feet.

TAKEAWAY:

Be careful when designing visualizations, and be extra careful when interpreting graphs created by others.

13 of 87

Four Elements:

  1. Information (Data): collect the “correct and required” data/information.
  2. Story (Concept): provide context, explaining why the data matters and suggesting necessary actions. Stories bridge the gap between the data and the audience.
  3. Goal (Function): define the purpose and approach of your visualization upfront, and choose tools, charts, and techniques aligned with that goal for effective analysis and decision-making.
  4. Visual Form (Metaphor): the presentation of your visualization.

14 of 87

The Good:

15 of 87

Choose the visualization that tells the story

16 of 87

Image Source: Oregon State University Extension Service

How the US uses its land to create wealth

17 of 87

18 of 87

19 of 87

Importance of Data Visualizations

"Anscombe's quartet", introduced by Francis Anscombe (1973):

  • Four datasets share nearly identical descriptive statistics:
    • Mean
    • Variance
    • Correlation
    • Regression line
  • But when plotted, they look completely different.

Key Insight: Numerical summaries alone can mislead; visualization reveals the true structure of the data.

  • Visual displays offload mental effort from cognition to perception.
  • Patterns, trends, and differences become immediately visible when graphed.
  • We understand complex information faster and with less effort.
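The claim about Anscombe's quartet can be checked directly. The sketch below uses the values as published by Anscombe (1973) and only the Python standard library; `pearson_r` is a small helper written for this example, not a library function.

```python
from statistics import mean, variance

# Anscombe's quartet (values as published by Anscombe, 1973).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (mean(xs), variance(xs), mean(ys), variance(ys), pearson_r(xs, ys))
    mx, vx, my, vy, r = summary[name]
    print(f"{name:>3}: mean x={mx:.2f} var x={vx:.2f} "
          f"mean y={my:.2f} var y={vy:.2f} r={r:.3f}")
```

The four printed rows agree to about two decimal places. Passing each pair to `plt.scatter` shows four very different shapes.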

20 of 87

Why do visualizations?

For Ourselves (Exploration)

  • Understand and explore our data
  • Investigate specific questions or hunches

For Others (Communication)

  • Clearly communicate insights and findings

Why Visualization Helps

  • Raw data (Excel, CSV, text) is hard to interpret
  • Visuals make complex data clear and intuitive
  • Helps quickly identify patterns, trends, and anomalies

Key Idea: Visualization turns complex data into insight you can see and understand quickly.

21 of 87

22 of 87

23 of 87

RECAP

24 of 87

Principles of data visualization

25 of 87

Humans naturally perceive visual elements as organized patterns and whole objects (Prägnanz in Gestalt psychology)

26 of 87

Gestalt Laws in Data Visualization

  • Law of Proximity → Objects close together are seen as related
  • Law of Similarity → Similar color, shape, or size implies grouping
  • Law of Enclosure (Common Region) → Items inside a boundary are grouped
  • Law of Continuity → The eye follows smooth, continuous paths
  • Law of Closure → The mind fills in missing parts to see complete shapes
  • Law of Connection → Elements that are visually connected (lines, arrows, links) are perceived as belonging together or being related

Key Idea: Good visualizations use Gestalt principles to guide attention, show relationships, and make patterns instantly clear.

Gestalt principles explain how people naturally perceive visual information as organized patterns, not isolated parts

27 of 87

Key parts of a data visualization

28 of 87

Components of a Visualization

Source: Abrigo, L.A. & Schneider, G.S. (2018). Data visualization guidelines. The Cato Institute. https://github.com/glosophy/CatoDataVizGuidelines

29 of 87

Storytelling

30 of 87

Why Tell Stories with Data?

We’ve been using stories to teach since there was language. We are much more likely to remember facts and internalize an idea when presented as a story.

Psychologists argue that it's especially difficult for us to make sense of statistics without narrative.

31 of 87

Let the data tell the story

The data cannot tell you what is important.

  • If you don’t give the audience a story, they will have to do the work of creating one (or worse, they won’t bother to engage at all).
  • With this understanding, let the data speak for itself — but present it in a way that makes it easy for the audience to build the correct story.

When creating a narrative structure, the most basic question is, “what do you want the audience to know, and when?” ... this is also the basic question of data storytelling.

32 of 87

What’s in a Story?

  • A sequence of events, change over time
  • Context
  • Characters and setting
  • Conflict and resolution

33 of 87

Example: A Story

34 of 87

A Data Visualization Story

35 of 87

Data Visualization Story

Attention theory predicts that

  • you will first notice the spike in price,
  • then look for events related to it,
  • then read the title of the chart.

Preattentive features guide the sequence of the narrative (instead of the text “label”).

Slowly Reveal:

Gradually introduces information to build understanding and maintain interest, allowing viewers to absorb key insights step-by-step.

In simpler terms, your eyes are drawn to visual cues like changes in price before you read the actual words on the chart.

36 of 87

But Why Data Visualization Stories?

  • Transfer of information is quick and effortless with images compared to text.
  • We need narratives to make sense of statistics, and these narratives emerge most quickly when they are visual.

37 of 87

38 of 87

Adding Context and Narrative Elements

  1. We increased the thickness of the lines in each panel to draw attention to the key periods.
  2. We added textual labels to highlight significant values in the dataset, such as death tolls in March and December.
  3. We labeled each panel with its respective time period, using bold text to emphasize the transitions between different phases of the pandemic.
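The three steps above (thicker lines for key periods, text labels on significant values, bold labels) can be sketched in Matplotlib as follows. The monthly values here are hypothetical placeholders, not the data shown on the slide, and the choice of the last three months as the "key period" is an assumption made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly values, for illustration only.
months = list(range(1, 13))
values = [3, 4, 9, 15, 11, 7, 5, 6, 8, 12, 18, 25]
key = slice(9, 12)  # assumed key period: the last three months

fig, ax = plt.subplots()
ax.plot(months, values, color="grey", linewidth=1)               # context line
ax.plot(months[key], values[key], color="tab:red", linewidth=3)  # 1. thicker line for the key period

# 2. Text label on a significant value (here, the maximum).
peak = values.index(max(values))
ax.annotate(
    f"Peak: {values[peak]}",
    xy=(months[peak], values[peak]),
    xytext=(months[peak] - 3, values[peak]),
    arrowprops={"arrowstyle": "->"},
    fontweight="bold",
)

# 3. Bold title naming the period being emphasised.
ax.set_title("Hypothetical monthly counts, final quarter emphasised", fontweight="bold")
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```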

39 of 87

In this final round of tweaks, we've added several important elements to make the story stand out:

  • Headline and subtitle:
  • Horizontal bars: Each horizontal bar represents the death toll for a specific time period. The lighter background bar helps provide context, while the darker bar shows the actual number of deaths for each section of the year.
  • Death toll annotations: We've added bold, red text at the end of each section to highlight the exact number of deaths during that period.

40 of 87

Focusing Attention

41 of 87

Visual Perception

Visual processing happens in its own part of the brain, based on its own rules!

Presenters can leverage research that helps us understand these rules ...

● Visual Attention Theory

● Visual Salience

● Preattentive features

42 of 87

Visual Attention Theory

In writing, attention goes in a fixed order.

With visualizations, our eyes follow the visuals!

43 of 87

Visual Attention Theory

Track where your attention goes!

44 of 87

Visual Attention Theory

The horizontal red bar was salient in the first image ... ... but visual properties alone do not determine salience; context is crucial.

Orientation and color are examples of preattentive features.

Preattentive processing is performed automatically and instantly* on the entire visual field, but only for a limited set of object features.

45 of 87

Visual Attention Theory

Hard to parse both at once!

46 of 87

Preattentive Processing

Brain’s ability to quickly and automatically process basic visual features before conscious attention is directed

47 of 87

Count the 4s

48 of 87

Count the 4s: Color

49 of 87

Preattentive Features

Specific visual attributes, like color, size, or tilt, that the brain processes quickly and automatically without focused attention.

List of the preattentive attributes that are of particular use in visual displays of data

50 of 87

Pre-attentive Processing: Color

Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127

Color differences are noticed instantly

Use color to highlight key points

Keep a consistent, limited palette
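One possible way to apply these color guidelines in Matplotlib: draw everything in a neutral grey and reserve a single accent color for the point you want noticed. The category labels and values below are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical category totals, for illustration only.
labels = ["A", "B", "C", "D", "E"]
values = [12, 18, 9, 30, 15]

# Limited palette: grey for context, one accent color for the key bar.
highlight = values.index(max(values))
colors = ["tab:red" if i == highlight else "lightgrey" for i in range(len(values))]

fig, ax = plt.subplots()
bars = ax.bar(labels, values, color=colors)
ax.set_title("One accent color draws the eye to category D")
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```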

51 of 87

Pre-attentive Processing: Shape

Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127

Shapes are recognized instantly by the brain

Use different shapes to distinguish categories or data types

Combine with color or size for clearer visualization

52 of 87

Pre-attentive Processing: Combined

Healey, C. G., & Enns, J. T. (2012). Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. http://dx.doi.org.proxy.lib.duke.edu/10.1109/TVCG.2011.127

Combine color, shape, size, and orientation to guide attention

Helps viewers quickly identify patterns, trends, and outliers

Use sparingly to avoid clutter and confusion

53 of 87

Using Color

54 of 87

Design Principles

  • Each visualization should be designed to communicate a specific insight or two.
    • Don't ever include a graph just because you feel like you should!
    • Know what each graph is telling your viewers
  • Maximize information and minimize ink
    • Avoid anything extraneous
  • Organize hierarchically
    • Start broad and zoom in
  • Dazzle
    • A good visualization is worth a lot
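"Maximize information and minimize ink" can be approximated in Matplotlib by removing elements that carry no data. This is a minimal sketch of one such cleanup, not the only way to do it; the data is the same toy series used elsewhere in these slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
ax.plot(x, y, color="black")

# Remove the box edges that carry no information.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# No grid, and only one tick per data point on the x-axis.
ax.grid(False)
ax.set_xticks(x)

ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.set_title("Same data, less ink")
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```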

55 of 87

Visualization using Programming

Python

– matplotlib

– seaborn

– plotly

– pylab (legacy Matplotlib interface; its use is now discouraged)

R

– graphics

– ggplot2

Streamlit → Not a plotting library. It’s a web app framework that lets you build interactive dashboards and data apps. It can use matplotlib, seaborn, and plotly plots inside it, but by itself it doesn’t replace them.

56 of 87

Common Tools for Data Visualization

  • Tableau
  • Microsoft Power BI
  • Infogram
  • ChartBlocks
  • D3.js
  • Google Charts
  • Fusion Charts
  • Chart.js

57 of 87

Creating Visualizations in Python

Two widely used libraries:

  • Matplotlib → foundational plotting library
  • Seaborn → statistical visualization built on Matplotlib

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Simple Line Chart")
plt.show()

Different Styles of Same Plot (can use Matplotlib)
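One way to get different styles of the same plot is Matplotlib's built-in style sheets. The sketch below draws the same data three times, once per style; the three style names are ones that ship with Matplotlib, and `plt.style.available` lists the others.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

figures = {}
for style in ["default", "ggplot", "grayscale"]:
    # The style applies only inside the `with` block.
    with plt.style.context(style):
        fig, ax = plt.subplots()
        ax.plot(x, y, marker="o")
        ax.set_title(f"Style: {style}")
        figures[style] = fig
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```

Using `plt.style.context` rather than `plt.style.use` keeps the change local, so later plots are not affected.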

58 of 87

Creating Visualizations in Python

import matplotlib.pyplot as plt

# simple data
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

# create 2x2 subplot layout
plt.figure(figsize=(8, 6))

# 1. Line chart
plt.subplot(2, 2, 1)
plt.plot(x, y)
plt.title("Line")

# 2. Bar chart
plt.subplot(2, 2, 2)
plt.bar(x, y)
plt.title("Bar")

# 3. Scatter plot
plt.subplot(2, 2, 3)
plt.scatter(x, y)
plt.title("Scatter")

# 4. Histogram
plt.subplot(2, 2, 4)
plt.hist(y)
plt.title("Histogram")

plt.tight_layout()
plt.show()

59 of 87

What Type of Visualization to Use?

SELF STUDY (INCLUDED IN THE EXAM)

60 of 87

(Data Visualization Cheat Sheet)

61 of 87

Why are we spending the lectures on data visualization?

  • When visualizations are good, they're very good
  • When they're bad, they are VERY bad

62 of 87

Exploration

  • What sort of things might we want to know about our data?

63 of 87

Types of Visualization (And when to use them)

64 of 87

Main Types

  • Data Over Time
  • Comparison
  • Correlation
  • Part-To-Whole
  • Distribution

65 of 87

66 of 87

Line charts - display trends

A line graph reveals trends or progress over time.

  • Data points, called 'markers', are connected by straight line segments.
  • Show trends across categories for easy comparison.
  • Use a line chart when plotting a continuous data set.

67 of 87

Example: Line charts

Observe: biggest customers are 34-45 year old buyers of PDAs, followed by 19-24 year old buyers of cell phones.

Ex: Sales figures by age group for three different product lines

68 of 87

Best practices for line charts

Line color:

Limited number of colors and line styles. If many lines, use grey for most.

Numerical axis:

X-axis variable is a number (e.g., years) and not a set of categories (e.g., countries).

Complete axis:

X-axis displays the numbers correctly in a full range, instead of having gaps

Legend:

Lines labeled directly when possible
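These practices can be combined in a short Matplotlib sketch: grey for the context lines, one color for the line of interest, a numerical x-axis with no gaps, and labels placed at the end of each line instead of a legend. The region names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical yearly values per region, for illustration only.
years = [2019, 2020, 2021, 2022, 2023]
series = {
    "North": [10, 11, 12, 12, 13],
    "South": [8, 9, 9, 10, 11],
    "East":  [7, 10, 14, 19, 25],
    "West":  [12, 12, 11, 11, 10],
}
focus = "East"  # the one line the viewer should notice

fig, ax = plt.subplots()
for name, vals in series.items():
    is_focus = name == focus
    color = "tab:blue" if is_focus else "lightgrey"
    ax.plot(years, vals, color=color, linewidth=2.5 if is_focus else 1)
    # Direct label at the right end of the line, instead of a legend.
    ax.text(years[-1] + 0.1, vals[-1], name, va="center", color=color)

ax.set_xticks(years)  # numerical axis, every year shown, no gaps
ax.set_xlabel("Year")
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```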

69 of 87

70 of 87

Bar charts - break things down, simply

Bar graphs help compare data between different groups or track changes over time.

  • Presents categorical (nominal or ordinal) data with rectangular bars with heights proportional to the values that they represent.
  • Most useful when there are big changes or when comparing one group against another.

71 of 87

Bar charts - break things down, simply

  • Great for comparing several different values, especially when some of these are broken into color-coded categories.

72 of 87

Example: Bar charts

Revisualize previous chart as a bar chart

  • Products are grouped by age
  • Explore detailed sales differences within each age category.
  • Quickly identify the most valuable age groups for business
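A grouped bar chart like this can be sketched in Matplotlib by shifting each product's bars by a fraction of the group width. The age groups, product names, and sales figures below are hypothetical stand-ins, not the numbers from the slide.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical sales per age group for three product lines.
age_groups = ["Group 1", "Group 2", "Group 3"]
sales = {
    "Product A": [20, 35, 30],
    "Product B": [25, 32, 34],
    "Product C": [15, 20, 40],
}

width = 0.25                        # width of one bar
positions = range(len(age_groups))  # centre of each age group on the x-axis

fig, ax = plt.subplots()
for i, (product, vals) in enumerate(sales.items()):
    # Shift each product left or right of the group centre.
    offsets = [p + (i - 1) * width for p in positions]
    ax.bar(offsets, vals, width=width, label=product)

ax.set_xticks(list(positions))
ax.set_xticklabels(age_groups)
ax.set_ylabel("Sales")
ax.legend()
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```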

73 of 87

74 of 87

Column charts - compare values side-by-side

Use for side-by-side comparisons of different values.

  • Show changes over time
  • Ideal for scenarios where daily changes are minimal.

Total website page views vs. sessions on various dates.

75 of 87

Example: Column charts

  • Highlights concrete numbers, such as the daily number of website visitors.

Plot: Total website page views vs. sessions on various dates.

76 of 87

77 of 87

Scatter charts - relationships

  • Displays the relationship between two continuous variables.
  • It involves plotting individual data points as dots on a graph, with one variable on the x-axis and the other variable on the y-axis.
    • Each dot represents an observation in the dataset, and the pattern formed by the dots can reveal the nature and strength of the relationship between the variables.

78 of 87

Scatter charts - relationships

  • The chart visualizes each product line by the number of units sold (x-axis) and the revenue this brings in (y-axis), representing the value in physical size.
  • It also breaks this down by gender (hovering over the circles would reveal the name of the product in the original).

79 of 87

Scatter plot

  • Bivariate plot of two continuous variables
  • Can be many y values for each x
  • Trend-lines can show a pattern but do not connect all the points
  • Variations: bubble chart

80 of 87

Best practices for scatter plots

Overplotting:

Transparency shows where points overlap

Bubble size:

Data maps to bubble area, not radius or diameter

Point color:

Limited number of colors

Selective labeling:

Points labeled directly, but only points of interest

Explanatory elements:

Additional elements (reference lines, annotations) help explain trends
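Two of these practices map directly onto arguments of Matplotlib's `scatter`: `alpha` sets transparency, and `s` is the marker size in points squared, that is, an area. Scaling `s` linearly with the data therefore maps data to bubble area. The values and the `max_area` constant below are hypothetical.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical observations: position plus a third value shown as bubble size.
x = [1.0, 2.0, 3.0]
y = [2.0, 3.5, 3.0]
values = [10, 40, 90]

# Map data to AREA: scale linearly so the largest value gets max_area.
max_area = 900  # in points^2, an arbitrary choice for this sketch
areas = [max_area * v / max(values) for v in values]

# The radius then grows with the square root of the value.
radii = [math.sqrt(a / math.pi) for a in areas]

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=areas, alpha=0.5, color="tab:blue")  # alpha reveals overlap
ax.axhline(3.0, color="grey", linestyle="--", linewidth=1)       # reference line
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```

Here a value 4 times larger gets 4 times the area but only twice the radius. Scaling the radius by the value instead would make large values look far bigger than they are.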

81 of 87

82 of 87

Histogram

A histogram shows the distribution of numerical data: values are grouped into bins, and the height of each rectangle gives the number of observations in that bin.
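To make the rectangles concrete, the sketch below counts how many values fall into each interval (bin) and prints a text histogram. The scores and bin edges are hypothetical, and `histogram_counts` is a helper written for this example. It treats every bin as half-open except the last, which also includes its right edge; `plt.hist` draws the same counts as bars.

```python
def histogram_counts(values, bin_edges):
    """Count how many values fall into each bin.

    Bins are [lo, hi) except the last one, which is [lo, hi].
    Values outside the edges are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts

# Hypothetical exam scores, for illustration only.
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 84, 88, 91, 100]
edges = [50, 60, 70, 80, 90, 100]

counts = histogram_counts(scores, edges)
for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:>3}-{hi:<3} {'#' * n}")
```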

83 of 87

84 of 87

Pie charts clearly show proportions

Pie charts easily show the share each value contributes to the whole.

  • A circle divided into slices to illustrate numerical proportion
    • Categorical data
  • More intuitive than simply listing percentages that add up to 100%.

Example: This pie chart illustrates the effectiveness of different marketing campaigns in generating leads
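A pie chart of this kind could be produced with Matplotlib's `pie`. The campaign names follow the example above, but the lead counts are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical number of leads generated per campaign.
leads = {"AdWords": 450, "Social media": 300, "Webinar signups": 150}

# Each slice's share of the whole, in percent.
total = sum(leads.values())
shares = {name: 100 * n / total for name, n in leads.items()}

fig, ax = plt.subplots()
ax.pie(
    list(leads.values()),
    labels=list(leads.keys()),
    autopct="%1.0f%%",  # print each slice's percentage on the chart
    startangle=90,
)
ax.set_title("Leads by campaign (hypothetical data)")
# plt.show()  # uncomment in an interactive session (and remove the Agg line)
```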

85 of 87

Pie charts clearly show proportions

Example: Clearly shows that AdWords is the most effective, followed by social media and webinar signups.

  • This instant insight shows the marketing team what’s working best, helping them quickly reallocate resources or refocus their efforts to maximize lead generation.

86 of 87

87 of 87

Additional Reading Slides