An Introduction to Data Visualisation
About us…
About you…
What data do you work with?
What tools do you use?
Who is your audience?
What would you like out of today?
What’s the most popular ONS output?
http://www.flickr.com/photos/8314403@N03/2620922750/ Happiness
Kevin?
Oliver?
Jack?
Harry?
Alfie?
Charlie?
Thomas?
William?
Joshua?
George?
James?
http://visual.ons.gov.uk/baby-names/
Introduction
Aims
Why visualise data?
Why visualise data?
Historical Perspective
Statistics in the 19th Century
William Playfair (1759-1823)
William Playfair (1759-1823)
Florence Nightingale (1820-1910)
Blue: preventible or mitigable diseases
Red: deaths from wounds
Black: deaths from other causes
Frank Anscombe (1918-2001)
And then something happened...
A call from history
Focus on best practice...
Stop being technology-led…
...be technology-enabled
Further study
Societal Perspective
How many people do you think access the ONS website from home or on a mobile phone?
?%
How many people do you think access the ONS website from home or on a mobile phone?
48%
Numeracy and Statistical Literacy
In 2003, 46.9% of working age adults in England lacked Level 1 numeracy skills.
In 2011, 49.1% of working age adults in England lacked Level 1 numeracy skills.
Level 1 numeracy skills: calculating simple percentages and converting units of measure.
For example, adults without Level 1 skills may not be able to understand their payslip.
Numeracy is a societal barrier
“I phoned Camelot and they fobbed me off with some story that -6 is higher, not lower, than -8, but I’m not having it”
Tina Farrel, 23, Manchester
Wrong Said Fred!
Bulletins take 9.5x longer to read than people actually spend on the page
53% of people who land on a bulletin page leave the site immediately
9 is the average UK reading age
15 is the average reading age for our content
Users are much more likely to search for terms we don’t use in our bulletin…
Further study
Symbology
Statistical
Relationships
Charts!
Perspectives on or stories in the data
A limited number of coding strategies…
“There are 1,900 people in Warnford…”
Context-dependent
Caution: Interpreting visual variables
Example: Length v Area
The most effective graphs make the most efficient use of visual encoding
x5.2
x2
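A minimal sketch of why the choice between length and area matters. It assumes a value is encoded either as a bar length or as a circle area; the function names and the two values are illustrative and do not come from the slides.

```python
import math

def bar_length(value, unit=1.0):
    """Length encoding: the bar length is directly proportional to the value."""
    return value * unit

def circle_radius(value, unit_area=1.0):
    """Area encoding: the circle AREA is proportional to the value,
    so the radius only grows with the square root of the value."""
    return math.sqrt(value * unit_area / math.pi)

# Hypothetical values, one four times the other.
small, large = 10, 40

length_ratio = bar_length(large) / bar_length(small)        # bars differ by the full ratio
radius_ratio = circle_radius(large) / circle_radius(small)  # radii differ by its square root

# A common mistake: scaling the radius by the value inflates the area by the ratio squared.
area_if_radius_scaled = (large / small) ** 2

print(f"Bar is {length_ratio:.1f}x longer; correctly sized circle is {radius_ratio:.1f}x wider")
print(f"Scaling the radius instead would make the area {area_if_radius_scaled:.0f}x bigger")
```

The same data ratio therefore looks very different depending on which visual variable carries it, which is why the encoding needs to be chosen and checked deliberately.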
Questions to ask yourself before producing a chart
Is your chart exploratory or explanatory?
Exploratory charts are what you might produce to get familiar with the data. You may not be starting out with a hypothesis, question or point to make, and may just be aiming to display all of the data in a graphic for a general picture. From this, you or your user could spot interesting points to delve into further.
Explanatory charts are what you produce when you have something specific to show an audience: a hypothesis, question or point to make. You should already have done the time-consuming work for the user and picked out the stories of relevance and interest.
In our charting and writing for statistical bulletins, the majority of the focus should be on explanatory analysis.
What is the specific purpose of your chart?
By producing this chart, what further understanding or insight are you hoping to communicate to your user? Is it crucial to your story? Are you adding context? Understanding this will help you choose the right chart and could also help you tell your story more efficiently.
What data needs to be on the chart to achieve your aims?
When talking about a specific variable or aspect of the data, we don’t always need to show every variable or observation, as long as we aren’t removing context. An example might be a line chart with more than five lines, where the story focuses on a single series. You might consider using colour for the line of importance and making the other lines grey to provide context.
Text, tables, graphs and maps represent a toolkit for statistical communication.
Being able to spot when and where to use each is an important component in producing quality content, tailored to your audience.
Deviation
Correlation
Ranking
Distribution
Change over time
Magnitude
Part-to-whole
Spatial
Flow
Choosing a graph type - ‘safety first’
So what basic statistical relationships in the data should we look for?
Let’s try and match some charts to the statistical relationships they describe
Deviation
Correlation
Ranking
Distribution
Change over time
Magnitude
Part-to-whole
Spatial
Flow
Magnitude?
Magnitude
does the y axis have to start at 0?
Magnitude
does the x axis have to start at 0?
Magnitude
bar alternative
A perfect graph?
anything wrong?
White space is important!
the story can be in the ‘no data’ space
the remaining distance
to equality
White space, title and annotations are important!
the story can be in the ‘no data’ space
Understand the implications
scaling is an editorial control
the remaining distance
to female domination
Magnitude
always vertical bars?
Magnitude
horizontal is fine
Magnitude
horizontal clustered bars
Magnitude
can I label the bars?
Magnitude
keep labels away from bar tops
table/graph hybrids that work
Change over time?
Change over time
line chart
Change over time
Estimate the value x?
X
Change over time
Problem?
X
Change over time
can I use a bar chart?
Change over time
Change over time
spot the problem
Change over time
focus on selected lines...
Change over time
small multiples
Change over time
small multiples
Change over time
small multiples
Life expectancy improvements have slowed in recent years
Slopegraphs
reduce the granularity
of the time component
Sparklines
“Intense, simple, word-sized graphics”
source: Edward Tufte
Change over time
does the y axis have to start at 0?
Change over time
use scaling like a lens for non-count data
Change over time
add context...
Change over time
add context...until the
message in the data is clear
Change over time
add context...until the
message in the data is clear
Change over time
add context...until the
message in the data is clear
Is this a lot?
What happened?
What happened?
Is this a lot?
Change over time
add context...until the
message in the data is clear
Distribution?
Single Distribution
a ‘histogram’
why do the bars in a histogram need to be so close together?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Multiple Distributions?
Part-to-whole?
Part-to-whole
When might a bar graph be better than a pie chart?
Part-to-whole
When might a pie chart be better than a bar graph?
Part-to-whole
the trade-off
when there are more than 5 categories
for showing small changes in the data accurately
when there is a dominant value in the data series
when there are fewer than 6 categories
Use a pie chart
Use a bar chart
Part-to-whole
other chart types
Part-to-whole
think about common horizons!
Part-to-whole
multiple charts
Part-to-whole
Correlation?
Correlation
scatterplot
Correlation
scatterplot
104
Correlation
adding extra variables
Deviation?
Deviation
bars
Ranking?
Ranking
ordered bars
Ranking
ordered bars
Ranking
ordered bars
Ranking
ordered bars
Step 1
Rank the categories from left to right if possible from the most dominant to least dominant category
Step 2
Now rank regions from the highest to lowest by the 1st category
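The two steps above can be sketched as follows. The regions, categories and figures are hypothetical, and "most dominant" is assumed here to mean the largest total across all regions; the slides do not define it.

```python
# Hypothetical data: share (%) of each category within each region.
data = {
    "Region A": {"Cat X": 20, "Cat Y": 50, "Cat Z": 30},
    "Region B": {"Cat X": 10, "Cat Y": 70, "Cat Z": 20},
    "Region C": {"Cat X": 25, "Cat Y": 60, "Cat Z": 15},
}

def rank_categories(data):
    """Step 1: order categories from most to least dominant.
    Assumption: dominance is measured by the total across regions."""
    totals = {}
    for shares in data.values():
        for category, value in shares.items():
            totals[category] = totals.get(category, 0) + value
    return sorted(totals, key=totals.get, reverse=True)

def rank_regions(data, first_category):
    """Step 2: order regions from highest to lowest by the first category."""
    return sorted(data, key=lambda region: data[region][first_category], reverse=True)

category_order = rank_categories(data)
region_order = rank_regions(data, category_order[0])

print("Category order:", category_order)
print("Region order:", region_order)
```

Sorting on the first category gives the chart a common baseline, so the reader can compare regions along one clean edge.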
Ranking
ordered bars
Ranking
ordered bars
Ranking
ordered bars
Further study
Exercise
The visual variables in action - discussion in groups
magnitude
deviation
change over time
correlation
distribution
spatial
ranking
part-to-whole
uncertainty
magnitude
change over time
part-to-whole
ranking
spatial
magnitude
change over time
part-to-whole
ranking
spatial
magnitude
change over time
part-to-whole
ranking
spatial
magnitude
deviation
change over time
correlation
distribution
spatial
ranking
part-to-whole
uncertainty
deviation
ranking
magnitude
deviation
ranking
Annotations
Highlighting patterns and stories in a data set
Visualising Uncertainty
These are estimates.
They are our best guess, but they have confidence intervals.
Is this OK?
Fruit eaten by Danny and Lisa this year
What about now?
Estimates of fruit eaten by Danny and Lisa this year
(including 95% confidence intervals)
Try to give simple explanations…
Estimates of fruit eaten by Danny and Lisa this year
We are 95% certain the number lies between these values
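For reference, a minimal sketch of how an interval like this might be calculated for a sample mean. It assumes a simple random sample and uses the normal approximation (the 1.96 multiplier); the sample values are hypothetical and are not the data behind the fruit charts.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (estimate, lower, upper) for the mean, using the normal approximation."""
    n = len(sample)
    estimate = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation over the square root of n.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = 1.96 * standard_error
    return estimate, estimate - margin, estimate + margin

# Hypothetical weekly counts of fruit eaten; not taken from the slides.
weekly_fruit = [12, 15, 9, 14, 11, 13, 10, 16]
estimate, lower, upper = confidence_interval_95(weekly_fruit)

print(f"Best estimate {estimate:.1f}, 95% interval {lower:.1f} to {upper:.1f}")
```

For a sample this small a t-based multiplier would usually be more appropriate; the sketch keeps to 1.96 for simplicity.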
Estimates of fruit eaten by Danny and Lisa this year
Can you do this?
Estimates of fruit eaten by Danny and Lisa this year
Can you do this?
We are 95% certain the number lies within this range
(Chart axes: years 2010 to 2025; values 50 to 80)
We are 95% certain the number lies within this range
Our best estimate
Historical data
Projected data
(Chart axes: years 2010 to 2025; values 50 to 80)
We are 95% certain the number lies within this range
Historical data
Projected data
Is this OK too?
So where did people look?
Human recognizable objects (pictures/icons) on charts can hurt visual understanding…
Use them sparingly and wisely – are they doing a job?
Tone down secondary elements
Maximise the ‘data-ink’ ratio
Dual-axis charts
Should we be using them?
What conclusions would you draw from this chart?
“Figure 4 shows upward trends for both female and male part-time employment. Female part-time employment had a slightly stronger upward trend than male part-time employment. The figure also shows that there were more female than male part-time workers.”
Like the author did?
“Figure 4 shows upward trends for both female and male part-time employment. Female part-time employment had a slightly stronger upward trend than male part-time employment. The figure also shows that there were more female than male part-time workers.”
| | Women | Men |
| Absolute change | 547 | 624 |
| Percentage change | 9.7% | 39.1% |
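A rough sketch of how the two rows of the table relate, assuming the percentage change is the absolute change divided by the starting level. The helper name is illustrative, the units are not stated in the source, and the results are approximate because the percentages are rounded.

```python
def implied_start(absolute_change, percentage_change):
    """Starting level implied by an absolute change and its matching percentage change."""
    return absolute_change / (percentage_change / 100)

# Figures from the table above; units are not stated in the source.
changes = {
    "Women": {"absolute": 547, "percentage": 9.7},
    "Men": {"absolute": 624, "percentage": 39.1},
}

baselines = {
    group: implied_start(change["absolute"], change["percentage"])
    for group, change in changes.items()
}

for group, start in baselines.items():
    print(f"{group}: implied starting level of about {start:.0f}")
```

The implied starting level for women is several times that for men, so a similar absolute rise is a far smaller relative one. A dual-axis chart hides exactly this difference in scale.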
Saying something with a chart is not so different to saying it in text - we’re just using a different vocabulary.
We would see it as a glaring mistake to say “10 is bigger than 100” in writing, yet with dual axis charts we often say this visually.
Employment and unemployment rates
UK, seasonally adjusted, January to March 2006 to September to November 2018
Source: Labour market economic commentary: January 2019, ONS
Not a great chart?
Employment and unemployment rates
UK, seasonally adjusted, January to March 2006 to September to November 2018
Source: Labour market economic commentary: January 2019, ONS
Let’s use dual axis
“The unemployment rate exceeded the employment rate around 2008.”
“Unemployment was higher than employment between 2008 and 2014.”
“The employment rate is now about three times higher than the unemployment rate.”
What conclusions would you draw from this chart?
We can use dual axis to deliberately deceive
What can we do instead?
What can we do instead?
What can we do instead?
What can we do instead?
Data in Tables
When should we use a table?
When our aim is….
When should we not use a table?
If these are your aims, a table on its own is the wrong choice!
When our aim is….
One simple table?
Many possible charts with many possible messages!
https://flowingdata.com/2018/10/17/ask-the-question-visualize-the-answer/
Focus on difference over time
Focus on difference & distribution over time
Get an overall feel
Just a projection
Help tell the story?
Just show the difference?
Show the difference approaching equality?
Show profiles and median?
Overlap profiles
Break down by age?
Colour
Roy Lichtenstein
“Colour is crucial in painting, but it is very hard to talk about. There is almost nothing you can say that holds up as a generalization, because it depends on too many factors”
Exhibit A
Exhibit A
Exhibit A
Exhibit C
Ishihara Colour Test
which colour is the least ambiguous?
The abuse of colour
standard graph (corporate colours)
red/green colour blind
grey print
Figure 2: Regional share of total GVA, 2015
Source: Office for National Statistics
Introducing colour models
What colour is this?
CMYK(74%, 71%, 64%, 87%)
RGB(255, 147, 0)
#FF9300
HTML traditionally uses hexadecimal conversions of the RGB value to define colour.
HSB(35°, 100%, 100%)
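A small sketch showing how the RGB, hexadecimal and HSB descriptions of a colour relate, using Python's standard colorsys module. The function name is illustrative; CMYK is left out because converting to it depends on the print profile in use.

```python
import colorsys

def describe_colour(red, green, blue):
    """Express one 8-bit RGB colour as a hex code and as rounded HSB values."""
    # Hex is just each channel written as two hexadecimal digits.
    hex_code = f"#{red:02X}{green:02X}{blue:02X}"
    # colorsys works on 0-1 floats; its "HSV" model is the same as HSB.
    hue, saturation, brightness = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    hsb = (round(hue * 360), round(saturation * 100), round(brightness * 100))
    return hex_code, hsb

hex_code, hsb = describe_colour(255, 147, 0)
print(hex_code, hsb)
```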
introducing HSL
Use the HSL cylindrical co-ordinate system for RGB colourspace to make accessible yet functional colour palettes
H = Hue (nameable colour)
S = Saturation
L = Luminance (lightness)
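One way to put HSL to work, sketched under assumptions: hold the hue and saturation fixed and step only the lightness, so that the classes should stay distinguishable in grey print. The function name, default values and the hue of 210° are illustrative choices, not an ONS palette.

```python
import colorsys

def sequential_palette(hue_degrees, steps, saturation=0.6, lightest=0.9, darkest=0.3):
    """Build a single-hue palette whose steps differ only in lightness.
    Requires steps >= 2."""
    palette = []
    for i in range(steps):
        # Spread lightness evenly from the lightest to the darkest step.
        lightness = lightest - (lightest - darkest) * i / (steps - 1)
        # Note the argument order: colorsys uses H, L, S (not H, S, L).
        r, g, b = colorsys.hls_to_rgb(hue_degrees / 360, lightness, saturation)
        palette.append(f"#{round(r * 255):02X}{round(g * 255):02X}{round(b * 255):02X}")
    return palette

blues = sequential_palette(hue_degrees=210, steps=5)
print(blues)
```

Because only lightness changes, the order of the classes does not rely on the reader telling hues apart.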
Colour is meaningful
Gender
Nationality
Politics
Religion
Morality
Nature
Temperature
Principles for use of Colour
What can you do?
What can you do?
Further study
Dashboards
Priorities will change, and so should the charts
What do stakeholders need to know? Are we displaying too much data?
Do we need interactivity? Can we take that burden from the user in some cases?
Are we leaving room for the “so what?”
Leaving a legacy of work. How long will the product take to update?
The ONS approach
Running off an API – very little burden of updating
Works well for broad topics – not too specific
Not for citizen users
Don’t stop there – push for more articles based on current priorities
Function & Aesthetics
A simple survey
Q. Are you happy with the customer service you received today?
A. YES/NO
RESULTS: 85% said YES 15% said NO
TOTAL pixels: 76,523
pixels ‘NO’: 11,579
=15%
TOTAL pixels: 61,621
pixels ‘NO’: 7,557
=12%
TOTAL pixels: 29,002
pixels ‘NO’: 2,154
=7%
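The percentages above follow directly from the pixel counts. A minimal sketch of the arithmetic, assuming the three pairs of counts describe three renderings of the same 85%/15% survey result; the function name is illustrative.

```python
def share_percent(part, total):
    """Percentage of the total area taken up by one segment."""
    return 100 * part / total

# (total pixels, 'NO' pixels) for the three charts, as listed above.
charts = [(76_523, 11_579), (61_621, 7_557), (29_002, 2_154)]
shares = [round(share_percent(no, total)) for total, no in charts]

for (total, no), share in zip(charts, shares):
    print(f"{no:,} of {total:,} pixels = {share}%")
```

Only the first rendering gives the 'NO' answer the 15% of the area that the data calls for.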
Function & Aesthetics
Bateman et al
“...people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better.”
“Although we are cautious about recommending that all charts be produced in this style, our results question some of the premises of the minimalist approach to chart design.”
Use regional maps when there is a clear spatial pattern
Spatial
What colours should I use?
When would you use
a sequential palette?
Spatial
When would you use
a diverging palette?
What colours should I use?
Spatial
Alternatives?
Spatial
What if Chart builder can’t create your chart?
What’s wrong with a PNG?
What’s the future for Chart builder?
A bit about my team
A bit about my team
Data Journalists
Usually with a journalism/newsroom background but have a familiarity with data.
They help us a great deal with good written communication and turning academic style bulletins into meaningful and engaging content accessible to all.
A bit about my team
Social media experts
Managing promotion of our content via Twitter, Facebook, GovDelivery etc. to maximise reach and feed back any user insight.
A bit about my team
Design experts
Usually with graphic design backgrounds.
They help our content look good by providing us with blueprints at the start of projects, have knowledge of interaction design and best practice, and produce graphic content for social media.
A bit about my team
Data vis experts / coders
This is the group I work in as standards lead.
We have statistical backgrounds, and some data journalism backgrounds, and we code any bespoke charts, maps and interactive content to showcase and communicate ONS statistics.