1 of 238

An Introduction to Data Visualisation

Frank Donnarumma @frankman1000 francis.donnarumma@ons.gov.uk

Peter Broad @BroadPeter peter.broad@ons.gov.uk

2 of 238

About us…

  • ONS Data Visualisation Standards Leads

  • Produce content for ONS outputs

About you…

What data do you work with?

What tools do you use?

Who is your audience?

What would you like out of today?

3 of 238

What’s the most popular ONS output?

4 of 238

http://www.flickr.com/photos/8314403@N03/2620922750/ Happiness

Kevin?

Oliver?

Jack?

Harry?

Alfie?

Charlie?

Thomas?

William?

Joshua?

George?

James?

5 of 238

6 of 238

http://visual.ons.gov.uk/baby-names/

7 of 238

8 of 238

Introduction

9 of 238

Aims

  • Think about and understand the information content of your graphs through the eyes of your reader
  • Provide an understanding of the basic design principles you need to convey your intended message in a static image
  • (The ability to critique a chart / infographic)
  • An overview of where interactive approaches can be useful

10 of 238

Why visualise data?

  • To give a fast overview or summary of a dataset
  • To communicate memorable or important stories in a dataset
  • To reveal insight (that would otherwise be hidden)
  • For quality assurance and error detection

11 of 238

Why visualise data?

12 of 238

Historical Perspective

13 of 238

Statistics in the 19th Century

14 of 238

William Playfair (1759-1823)

15 of 238

William Playfair (1759-1823)

16 of 238

Florence Nightingale (1820-1910)

Blue: preventible or mitigable diseases

Red: deaths from wounds

Black: deaths from other causes

17 of 238

Frank Anscombe (1918-2001)

18 of 238

And then something happened...

19 of 238

20 of 238

21 of 238

A call from history

Focus on best practice...

        • Principles NOT rules
        • Context NOT constraint
        • Consistency NOT regimentation

Stop being technology-led…

...be technology-enabled

22 of 238

Further study

23 of 238

Societal Perspective

24 of 238

How many people do you think access the ONS website from home or on a mobile phone?

?%

25 of 238

How many people do you think access the ONS website from home or on a mobile phone?

48%

26 of 238

Numeracy and Statistical Literacy

In 2003, 46.9% of working age adults in England lacked Level 1 numeracy skills.

In 2011, 49.1% of working age adults in England lacked Level 1 numeracy skills.

Level 1 numeracy skills: calculating simple percentages and converting units of measure.

For example, adults without Level 1 skills may not be able to understand their payslip.

27 of 238

Numeracy is a societal barrier

“I phoned Camelot and they fobbed me off with some story that -6 is higher, not lower, than -8, but I’m not having it”

Tina Farrel, 23, Manchester

28 of 238

Wrong Said Fred!

29 of 238

30 of 238

Bulletins take

9.5x

longer to read than people actually spend on the page

31 of 238

53%

of people who land on a bulletin page leave the site immediately

32 of 238

9

is the average UK reading age

15

is the average reading age for our content

33 of 238

Users are much more likely to search for terms we don’t use in our bulletins

34 of 238

Further study

35 of 238

Symbology

Statistical

Relationships

36 of 238

Charts!

Perspectives on or stories in the data

37 of 238

A limited number of coding strategies…

  • Text
  • Height/Width
  • Size/Area
  • Colour
  • Texture
  • Orientation
  • Curvature
  • Shape
  • Position (x,y)
  • Motion

“There are 1,900 people in Warnford…”

Context-dependent

38 of 238

Caution: Interpreting visual variables

39 of 238

Example: Length v Area

The most effective graphs make the most efficient use of visual encoding

x5.2

x2
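A minimal sketch of why area is harder to read than length, assuming circles are used as the area symbol. The function name and the sample ratios are illustrative and are not taken from the slide. To make one circle's area k times another's, its radius only grows by the square root of k, so a large difference in value shows up as a much smaller difference in width.

```python
import math

def radius_for_value(value, base_value=1.0, base_radius=1.0):
    """Radius of a circle whose AREA is proportional to value."""
    return base_radius * math.sqrt(value / base_value)

# Compare how a bar and a circle grow for the same change in value.
for ratio in (2, 4, 9):
    r = radius_for_value(ratio)
    print(f"value x{ratio}: bar length x{ratio}, circle radius x{r:.2f}")
```

A bar encodes the full ratio in its length, which is one reason length is usually the safer encoding when readers need to compare magnitudes.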

40 of 238

Questions to ask yourself before producing a chart

41 of 238

Is your chart exploratory or explanatory?

Exploratory charts are what you might produce to get familiar with the data. You may not be starting out with a hypothesis, question or point to make, and may just be aiming to display all of the data in a graphic for a general picture. From this, you or your user could spot interesting points to delve into further.

Explanatory charts are what you produce when you have something specific you want to show an audience: in this case, you have a hypothesis, question or point to make. You should already have done the time-consuming work for the user and picked out the stories of relevance and interest.

In our charting and writing for statistical bulletins, the majority of the focus should be on explanatory analysis.

42 of 238

What is the specific purpose of your chart?

By producing this chart, what further understanding or insight are you hoping to communicate to your user? Is it crucial to your story? Are you adding context? Understanding this will help you choose the right chart and could also help you tell your story more efficiently.

43 of 238

What data needs to be on the chart to achieve your aims?

When talking about a specific variable or aspect of the data, we don’t always need to show every variable or observation, as long as we aren’t removing context. An example is a line chart with more than five lines, where the story focuses on a single series. You might use colour for the line of importance and make the other lines grey to provide context.

44 of 238

Text, tables, graphs and maps represent a toolkit for statistical communication.

Being able to spot when and where to use each is an important component in producing quality content, tailored to your audience.

45 of 238

Deviation

Correlation

Ranking

Distribution

Change over time

Magnitude

Part-to-whole

Spatial

Flow

46 of 238

Choosing a graph type - ‘safety first’

  • Graphs display information about relationships in data
  • Getting to the right graph is a two-step process:
    1. identify and prioritise the statistical relationships
    2. choose the symbology that gives visual emphasis to the highest priority relationships

So what basic statistical relationships in the data should we look for?

47 of 238

Let’s try and match some charts to the statistical relationships they describe

Deviation

Correlation

Ranking

Distribution

Change over time

Magnitude

Part-to-whole

Spatial

Flow

48 of 238

49 of 238

Magnitude?

50 of 238

Magnitude

does the y axis have to start at 0?

51 of 238

Magnitude

does the x axis have to start at 0?

52 of 238

Magnitude

bar alternative

53 of 238

A perfect graph?

anything wrong?

54 of 238

White space is important!

the story can be in the ‘no data’ space

the remaining distance

to equality

55 of 238

White space, title and annotations are important!

the story can be in the ‘no data’ space

56 of 238

Understand the implications

scaling is an editorial control

the remaining distance

to female domination

57 of 238

Magnitude

always vertical bars?

58 of 238

Magnitude

horizontal is fine

59 of 238

Magnitude

horizontal clustered bars

60 of 238

Magnitude

can I label the bars?

61 of 238

Magnitude

keep labels away from bar tops

62 of 238

table/graph hybrids that work

63 of 238

Change over time?

64 of 238

Change over time

line chart

65 of 238

Change over time

Estimate the value x?

X

66 of 238

Change over time

Problem?

X

67 of 238

Change over time

can I use a bar chart?

68 of 238

Change over time

69 of 238

Change over time

spot the problem

70 of 238

Change over time

focus on selected lines...

71 of 238

Change over time

small multiples

72 of 238

Change over time

small multiples

73 of 238

Change over time

small multiples

Life expectancy improvements have slowed in recent years

74 of 238

Slopegraphs

reduce the granularity

of the time component

75 of 238

Sparklines

“Intense, simple, word-sized graphics”

source: Edward Tufte

76 of 238

Change over time

does the y axis have to start at 0?

77 of 238

Change over time

use scaling like a lens for non-count data

78 of 238

Change over time

add context...

79 of 238

Change over time

add context...until the

message in the data is clear

80 of 238

Change over time

add context...until the

message in the data is clear

81 of 238

Change over time

add context...until the

message in the data is clear

Is this a lot?

What happened?

What happened?

Is this a lot?

82 of 238

Change over time

add context...until the

message in the data is clear

83 of 238

Distribution?

84 of 238

Single Distribution

a ‘histogram’

why do the bars in a histogram need to be so close together?

85 of 238

Multiple Distributions?

86 of 238

Multiple Distributions?

87 of 238

Multiple Distributions?

88 of 238

Multiple Distributions?

89 of 238

Multiple Distributions?

90 of 238

Multiple Distributions?

91 of 238

Multiple Distributions?

92 of 238

Multiple Distributions?

93 of 238

Part-to-whole?

94 of 238

Part-to-whole

When might a bar graph be better than a pie chart?

95 of 238

Part-to-whole

When might a pie chart be better than a bar graph?

96 of 238

Part-to-whole

the trade-off

Use a bar chart

  • when there are more than 5 categories
  • for showing small changes in the data accurately

Use a pie chart

  • when there is a dominant value in the data series
  • when there are fewer than 6 categories

97 of 238

Part-to-whole

other chart types

98 of 238

Part-to-whole

think about common horizons!

99 of 238

Part-to-whole

multiple charts

100 of 238

Part-to-whole

101 of 238

Correlation?

102 of 238

Correlation

scatterplot

103 of 238

Correlation

scatterplot

104 of 238

105 of 238

Correlation

adding extra variables

106 of 238

Deviation?

107 of 238

Deviation

bars

108 of 238

Ranking?

109 of 238

Ranking

ordered bars

110 of 238

Ranking

ordered bars

111 of 238

Ranking

ordered bars

112 of 238

Ranking

ordered bars

Step 1

Rank the categories from left to right if possible from the most dominant to least dominant category

Step 2

Now rank regions from the highest to lowest by the 1st category
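The two steps above can be sketched in a few lines. This is only an illustration: the regions, categories and numbers are hypothetical, and "dominance" is assumed here to mean the largest total across all regions.

```python
# Hypothetical percentage shares by region and category (illustrative only).
data = {
    "North": {"Bus": 20, "Car": 65, "Rail": 15},
    "South": {"Bus": 10, "Car": 70, "Rail": 20},
    "East":  {"Bus": 20, "Car": 55, "Rail": 25},
}

def order_for_chart(data):
    # Step 1: rank categories from most to least dominant overall.
    categories = list(next(iter(data.values())))
    totals = {c: sum(row[c] for row in data.values()) for c in categories}
    category_order = sorted(totals, key=totals.get, reverse=True)
    # Step 2: rank regions from highest to lowest on the first category.
    first = category_order[0]
    region_order = sorted(data, key=lambda r: data[r][first], reverse=True)
    return category_order, region_order

category_order, region_order = order_for_chart(data)
print(category_order)
print(region_order)
```

Ordering the data before it reaches the charting tool keeps the ranking explicit, rather than relying on whatever default order the tool applies.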

113 of 238

Ranking

ordered bars

114 of 238

Ranking

ordered bars

115 of 238

Ranking

ordered bars

116 of 238

Further study

  • Few, S. (2012). Show Me the Numbers: Designing Tables and Graphs to Enlighten (2nd ed.). Analytics Press.
  • Tufte, E.R. (1983). The Visual Display of Quantitative Information. Graphics Press.
  • Tufte, E.R. (1997). Visual Explanations. Graphics Press.
  • Tufte, E.R. (2006). Beautiful Evidence. Graphics Press.
  • Bertin, J. (1967). Sémiologie Graphique: Les diagrammes, les réseaux, les cartes. With Marc Barbut et al. Paris: Gauthier-Villars. (English translation by William J. Berg, 1983: Semiology of Graphics.)
  • Wilkinson, L. (2005). The Grammar of Graphics. Springer.
  • Good essay on slopegraphs: http://charliepark.org/slopegraphs/

117 of 238

Exercise

The visual variables in action - discussion in groups

  1. What are the statistical relationships being symbolised?
  2. What is the overall message of the graph?
  3. Is it ‘successful’?
  4. Can you think of any other ways of visualising some or all of the data in the graph - and what impression would it give the reader?

118 of 238

119 of 238

120 of 238

121 of 238

magnitude

deviation

change over time

correlation

distribution

spatial

ranking

part-to-whole

uncertainty

magnitude

change over time

part-to-whole

ranking

spatial

122 of 238

magnitude

change over time

part-to-whole

ranking

spatial

123 of 238

124 of 238

magnitude

change over time

part-to-whole

ranking

spatial

125 of 238

126 of 238

127 of 238

128 of 238

magnitude

deviation

change over time

correlation

distribution

spatial

ranking

part-to-whole

uncertainty

deviation

ranking

magnitude

129 of 238

deviation

ranking

130 of 238

131 of 238

132 of 238

133 of 238

134 of 238

135 of 238

136 of 238

137 of 238

138 of 238

Annotations

139 of 238

Highlighting patterns and stories in a data set

140 of 238

141 of 238

142 of 238

143 of 238

Visualising Uncertainty

144 of 238

  • Visualising uncertainty is important and can help make better decisions.

  • Don’t just present uncertainty as a footnote. Present it up-front with simple language and labelling.

145 of 238

146 of 238

147 of 238

148 of 238

These are estimates.

They are our best guess, but they have confidence intervals.

Is this OK?

Fruit eaten by Danny and Lisa this year

149 of 238

What about now?

Estimates of fruit eaten by Danny and Lisa this year

(including 95% confidence intervals)

150 of 238

Try to give simple explanations…

Estimates of fruit eaten by Danny and Lisa this year

151 of 238

We are 95% certain the number lies between these values

Estimates of fruit eaten by Danny and Lisa this year

Can you do this?

152 of 238

Estimates of fruit eaten by Danny and Lisa this year

Can you do this?

We are 95% certain the number lies within this range
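As a rough sketch of where such a range comes from, the snippet below computes an approximate 95% confidence interval for a mean. It assumes a normal approximation (critical value 1.96), which is a simplification for small samples, and the weekly counts are made up for illustration; they are not the data behind the slide.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Approximate 95% CI for the mean, using the normal critical value 1.96."""
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * std_err
    return mean - margin, mean, mean + margin

# Hypothetical weekly counts of fruit eaten (illustrative only).
weekly_fruit = [12, 15, 9, 14, 11, 13, 10, 16]
low, estimate, high = confidence_interval_95(weekly_fruit)
print(f"Best estimate {estimate:.1f}; "
      f"we are 95% certain the value lies between {low:.1f} and {high:.1f}")
```

Printing the interval as a plain sentence mirrors the labelling approach on the slide: the range is stated up front rather than left to a footnote.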

153 of 238

154 of 238

155 of 238

[Chart: x-axis 2010 to 2025, y-axis 50 to 80]

We are 95% certain the number lies within this range

Our best estimate

Historical data

Projected data

156 of 238

[Chart: x-axis 2010 to 2025, y-axis 50 to 80]

We are 95% certain the number lies within this range

Historical data

Projected data

Is this OK too?

157 of 238

158 of 238

So where did people look?

159 of 238

Human recognizable objects (pictures/icons) on charts can hurt visual understanding…

Use them sparingly and wisely – are they doing a job?

160 of 238

Human recognizable objects (pictures/icons) on charts can hurt visual understanding…

Use them sparingly and wisely – are they doing a job?

161 of 238

Tone down secondary elements

Maximise the ‘data to ink’ ratio

  • No outlines around charts or keys
  • No visible line on y-axis
  • Position the key above charts so that we can maximise space on smaller screens
  • Ensure all text is left-aligned and dark grey, not black
  • Make gridlines a pale grey and try to have between 4 and 7
  • Where possible, rank bars in bar charts by value when no other natural order exists
  • No vertical text
  • Well considered colours (house style? Accessible?)
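The "between 4 and 7 gridlines" guideline can be turned into a small helper. This is one possible sketch, not an ONS tool: the function name and the choice of "round" steps (1, 2 or 5 times a power of ten) are assumptions, and it expects the upper bound to be greater than the lower bound.

```python
import math

def gridline_values(lo, hi, min_lines=4, max_lines=7):
    """Pick a round step (1, 2 or 5 times a power of ten) that gives
    between min_lines and max_lines gridlines covering [lo, hi]."""
    span = hi - lo
    exponent = math.floor(math.log10(span)) - 1
    # Try progressively larger steps until the gridline count fits.
    for exp in (exponent, exponent + 1, exponent + 2):
        for mult in (1, 2, 5):
            step = mult * 10 ** exp
            start = math.floor(lo / step) * step
            end = math.ceil(hi / step) * step
            count = round((end - start) / step) + 1
            if min_lines <= count <= max_lines:
                return [start + i * step for i in range(count)]
    return [lo, hi]  # fallback if no round step fits

print(gridline_values(50, 80))
print(gridline_values(0, 100))
```

The returned values can be passed to whatever charting tool is in use as the axis tick positions.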

162 of 238

Tone down secondary elements

Maximise the ‘data to ink’ ratio

163 of 238

Tone down secondary elements

Maximise the ‘data to ink’ ratio

164 of 238

165 of 238

166 of 238

Dual-axis charts

Should we be using them?

167 of 238

What conclusions would you draw from this chart?

“Figure 4 shows upward trends for both female and male part-time employment. Female part-time employment had a slightly stronger upward trend than male part-time employment. The figure also shows that there were more female than male part-time workers.”

168 of 238

Like the author did?

“Figure 4 shows upward trends for both female and male part-time employment. Female part-time employment had a slightly stronger upward trend than male part-time employment. The figure also shows that there were more female than male part-time workers.”

 

                     Women    Men
Absolute change      547      624
Percentage change    9.7%     39.1%

169 of 238

Saying something with a chart is not so different to saying it in text - we’re just using a different vocabulary.

We would see it as a glaring mistake to say “10 is bigger than 100” in writing, yet with dual axis charts we often say this visually.
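One common alternative to a dual-axis chart is to index both series to a shared base of 100, so they sit on a single axis and relative growth can be compared directly. The sketch below assumes that approach; the function name and the figures are hypothetical and are not ONS data.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so that its first value equals base."""
    first = series[0]
    return [base * value / first for value in series]

# Hypothetical levels in thousands (illustrative only).
women = [5600, 5800, 6000, 6150]
men = [1600, 1800, 2000, 2220]

print([round(v, 1) for v in index_to_base(women)])
print([round(v, 1) for v in index_to_base(men)])
```

Indexing shows relative change but hides the difference in absolute levels, so the levels still need to be stated in the title, annotation or text.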

170 of 238

Employment and unemployment rates

UK, seasonally adjusted, January to March 2006 to September to November 2018

Source: Labour market economic commentary: January 2019, ONS

Not a great chart?

171 of 238

Employment and unemployment rates

UK, seasonally adjusted, January to March 2006 to September to November 2018

Source: Labour market economic commentary: January 2019, ONS

Let’s use dual axis

172 of 238

“The unemployment rate exceeded the employment rate around 2008.”

“Unemployment was higher than employment between 2008 and 2014.”

“The employment rate is now about three times higher than the unemployment rate.”

What conclusions would you draw from this chart?

173 of 238

We can use dual axis to deliberately deceive

174 of 238

What can we do instead?

175 of 238

What can we do instead?

176 of 238

What can we do instead?

177 of 238

What can we do instead?

178 of 238

Data in Tables

179 of 238

When should we use a table?

When our aim is…

  • To allow comparison of individual data values
  • To present a very precise level of detail
  • To show multiple units of measure (e.g., n and %)
  • To present only a small number of data values
  • To present a very large number of data values
  • To show values and their sums

180 of 238

181 of 238

When should we not use a table?

When our aim is…

  • To allow comparison of multiple data values
  • To present a broad story
  • To show a single unit of measure (e.g., n or %)
  • To present a larger number of data values

If these are your aims, a table on its own is the wrong choice!

182 of 238

One simple table?

183 of 238

Many possible charts with many possible messages!

184 of 238

https://flowingdata.com/2018/10/17/ask-the-question-visualize-the-answer/

185 of 238

Focus on difference over time

186 of 238

Focus on difference & distribution over time

187 of 238

Get an overall feel

188 of 238

Just a projection

189 of 238

Help tell the story?

190 of 238

Just show the difference?

191 of 238

Show the difference approaching equality?

192 of 238

Show profiles and median?

193 of 238

Overlap profiles

194 of 238

Break down by age?

195 of 238

Colour

196 of 238

Roy Lichtenstein

“Colour is crucial in painting, but it is very hard to talk about. There is almost nothing you can say that holds up as a generalization, because it depends on too many factors”

197 of 238

Exhibit A

198 of 238

Exhibit A

199 of 238

Exhibit A

200 of 238

Exhibit C

201 of 238

202 of 238

Ishihara Colour Test

  • A series of 38 test plates to diagnose colour blindness
  • Between 7% and 10% of males have a form of red/green colour-blindness
  • Other forms of colour-blindness are rarer, but can be found in females
  • Form part of W3C requirements on accessibility

203 of 238

Which colour is the least ambiguous?

204 of 238

205 of 238

The abuse of colour

standard graph (corporate colours)

red/green colour blind

grey print

206 of 238

207 of 238

208 of 238

209 of 238

Figure 2: Regional share of total GVA, 2015

Source: Office for National Statistics

210 of 238

Introducing colour models

What colour is this?

CMYK(0%, 42%, 100%, 0%)

RGB(255, 147, 0)

#FF9300

HTML traditionally uses hexadecimal conversions of the RGB value to define colour.

HSB(35°,100%, 100%)
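The hexadecimal and HSB notations can be derived from the RGB value with Python's standard `colorsys` module. This is a small sketch; HSB is the same model that `colorsys` calls HSV, and the variable names are illustrative.

```python
import colorsys

r, g, b = 255, 147, 0  # RGB value from the slide

# Hexadecimal: each channel written as two hex digits.
hex_code = f"#{r:02X}{g:02X}{b:02X}"

# HSB (also called HSV): colorsys works on floats between 0 and 1.
h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
hsb = (round(h * 360), round(s * 100), round(v * 100))

print(hex_code)
print(hsb)
```

The same colour therefore has several equivalent descriptions; which one is most convenient depends on whether you are writing for the web or adjusting hue and brightness.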

211 of 238

Introducing HSL

Use the HSL cylindrical co-ordinate system for RGB colourspace to make accessible yet functional colour palettes

H = Hue (nameable colour)

S = Saturation

L = Luminance (lightness)
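As an illustration of building a palette in HSL, the sketch below fixes the hue and saturation and steps only the lightness, which produces a single-hue palette that still separates in greyscale. The function name, default values and the choice of hue are assumptions made for this example.

```python
import colorsys

def sequential_palette(hue_degrees, steps=5, saturation=0.6,
                       lightest=0.85, darkest=0.25):
    """Hex colours of one hue, evenly spaced from light to dark."""
    palette = []
    for i in range(steps):
        lightness = lightest + (darkest - lightest) * i / (steps - 1)
        # colorsys orders the arguments hue, LIGHTNESS, saturation (HLS).
        r, g, b = colorsys.hls_to_rgb(hue_degrees / 360, lightness, saturation)
        palette.append("#{:02X}{:02X}{:02X}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

print(sequential_palette(210))  # 210 degrees is a blue hue
```

Because only lightness changes between steps, the order of the classes remains readable for colour-blind readers and in grey print.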

212 of 238

Colour is meaningful

Gender

Nationality

Politics

Religion

Morality

Nature

Temperature

213 of 238

Principles for use of Colour

  • Design everything in grey-scale (luminance contrast) and add colour as a secondary process
  • Use colour (hue) sparingly, and never on its own to specify something in the data. Mid-luminance, highly saturated colours are your highlight colours
  • Hue is best for qualitative differences; luminance and saturation contrast are best for quantitative differences
  • The safest starting point when picking an appropriate hue is blue
  • “Red and green should rarely be seen”
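One way to check luminance contrast numerically is the contrast ratio defined in the W3C Web Content Accessibility Guidelines (WCAG 2.x). The sketch below implements that published formula; the function names and the two example colour pairs are illustrative choices.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB colour given as 0-255 integers."""
    def linearise(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearise(ch) for ch in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb_a, rgb_b):
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(rgb_a), relative_luminance(rgb_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # black on white
print(round(contrast_ratio((255, 0, 0), (0, 128, 0)), 2))    # red against green
```

A pure red and a mid green differ strongly in hue but have a low contrast ratio, which is why they become hard to tell apart in greyscale.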

214 of 238

What can you do?

215 of 238

What can you do?

216 of 238

Further study

  • Simulate colour blindness on images. http://www.etre.com/tools/colourblindsimulator
  • Colorbrewer - online color-palette generator. http://colorbrewer2.org

217 of 238

Dashboards

218 of 238

Priorities will change, and so should the charts

What do stakeholders need to know? Are we displaying too much data?

Do we need interactivity? Can we take that burden from the user in some cases?

Are we leaving room for the “so what?”

Leaving a legacy of work. How long will the product take to update?

219 of 238

The ONS approach

Running off an API: very little burden of updating

Works well for broad topics, not too specific

Not for citizen users

Don’t stop there: push for more articles based on current priorities

220 of 238

Function & Aesthetics

221 of 238

A simple survey

  • I conducted a survey which asked respondents just one closed question:

222 of 238

Q. Are you happy with the customer service you received today?

A. YES/NO

RESULTS: 85% said YES, 15% said NO

TOTAL pixels: 76,523

pixels ‘NO’: 11,579

= 15%

TOTAL pixels: 61,621

pixels ‘NO’: 7,557

= 12%

TOTAL pixels: 29,002

pixels ‘NO’: 2,154

= 7%
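The percentages on this slide are simply the share of each chart's pixels taken up by the ‘NO’ segment. A short sketch of that arithmetic, using the pixel counts from the slide (the variable and function names are illustrative):

```python
# Pixel counts reported on the slide for three renderings of the same result.
charts = [
    {"total": 76_523, "no": 11_579},
    {"total": 61_621, "no": 7_557},
    {"total": 29_002, "no": 2_154},
]

def visual_share(chart):
    """Percentage of the chart's pixels occupied by the 'NO' segment."""
    return 100 * chart["no"] / chart["total"]

for chart in charts:
    print(f"'NO' occupies {visual_share(chart):.0f}% of the pixels "
          f"(survey result: 15%)")
```

Only the first rendering gives ‘NO’ the 15% of visual space that matches the survey result; the other two understate it.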

223 of 238

Function & Aesthetics

224 of 238

Bateman et al

“...people’s accuracy in describing the embellished charts was no worse than for plain charts, and that their recall after a two-to-three-week gap was significantly better.”

“Although we are cautious about recommending that all charts be produced in this style, our results question some of the premises of the minimalist approach to chart design.”

225 of 238

Use regional maps when there is a clear spatial pattern

Spatial

226 of 238

227 of 238

228 of 238

What colours should I use?

When would you use

a sequential palette?

Spatial

229 of 238

When would you use

a diverging palette?

What colours should I use?

Spatial

230 of 238

Alternatives?

Spatial

231 of 238

What if Chart builder can’t create your chart?

  • First, talk with PST and me or one of my colleagues in Data Vis. If it’s a reasonable chart with a good enough user benefit, we can often help you.

  • That help might be suggesting a compromise of simpler charts that chart-builder can create, or it might be us building you something bespoke

  • We may not always have resource – we can get pretty busy with other priorities. In this case, we may need to use a PNG

  • PNGs should be a last resort

232 of 238

What’s wrong with a PNG?

  • PNGs won’t adapt to different screen sizes. Because of this we need to be very careful when creating them. They should be taller than they are wide, and any text should appear the same size as text in the surrounding report.

  • PNGs cannot be read by screen readers. Screen-reader users are only a small percentage of our users, but we still have an obligation to address this where we can.

  • It’s a bit of work to get them into house style. We can help with this, but there is no published guidance as we don’t want to encourage their creation.

233 of 238

What’s the future for chart-builder?

  • We know the current chart-builder is limited and frustrating. Work is underway to improve/replace it, perhaps in 2019

  • Improvements are looking to include a “sandpit” environment for business areas to create charts with their data in more complex cases and have PST re-create them

  • Improvements will expand the number of charts available, with more non-standard chart types becoming available in certain circumstances

  • Annotations will be improved and become more flexible

  • Colour will be more flexible in exceptional circumstances

234 of 238

A bit about my team

235 of 238

A bit about my team

Data Journalists

Usually with a journalism/newsroom background but have a familiarity with data.

They help us a great deal with good written communication and turning academic style bulletins into meaningful and engaging content accessible to all.

236 of 238

A bit about my team

Social media experts

Managing promotion of our content via twitter / facebook / gov-delivery etc. to maximise reach and feed back any user insight.

237 of 238

A bit about my team

Design experts

Usually with graphic design backgrounds.

They help our content look good by providing us with blueprints at the start of projects. They have knowledge of interaction design and best practice, and produce graphic content for social media.

238 of 238

A bit about my team

Data vis experts / coders

This is the group I work in as standards lead.

We have statistical and some data journalism backgrounds, and we code any bespoke charts, maps and interactive content to showcase and communicate ONS statistics.