1 of 44

Data literacy

Making sense of stats

2 of 44

Giving life to numbers

X	A	B	C
1	20
1.1	0
1.2	7
1.5	10
1.8	7
2	0
2.2	1	5	5
2.5	2.5	10	1
2.8	4	9	0
3	5	5	0
3.1			0
3.2			2
3.3			20
3.4			4
3.5			1
3.6			0
3.7			2
3.8			20
3.9			4
4			1
4.1			0
4.2	5	5	6
4.3	3	7	8
4.4	2	8	9
4.8	0	10	10
5.3	1		8
5.5	5		5
5.6
6	0	5
6.1	1	20

3 of 44

What we will look into today…

How statistics can be presented misleadingly:

Research in the media
Why do the numbers used make a difference?
How can graphs be misleading?
Correlation vs. causation

Break

How statistics themselves can be misleading:

How research methods influence statistics
What is statistical significance and what does it tell us?
What better practices could be used?

4 of 44

Research in the media

Eating Marmite could help prevent dementia
Scientists discover way to say ‘I love you’ to dogs in a way they understand
If you snore you could be three times more likely to die of coronavirus, docs warn
Don't laugh, but a good giggle can help you live longer... Research finds laughing can help to ease symptoms of heart disease
Too much caffeine ‘can wake the dead’

Which are the real headlines?

5 of 44

Eating Marmite could help prevent dementia

6 of 44

Scientists discover way to say ‘I love you’ to dogs in way they understand

7 of 44

If you snore you could be THREE TIMES more likely to die of coronavirus, docs warn

8 of 44

Don't laugh, but a good giggle can help you live longer...

9 of 44

Too much caffeine ‘can wake the dead’

10 of 44

Statistical evidence is seen as more persuasive…

11 of 44

… and people do tend to exaggerate numbers in line with their beliefs

12 of 44

Quick quiz: the different ways statistics can be presented

13 of 44

Number format can also exaggerate effects…

14 of 44

Number format can also exaggerate effects…

15 of 44

Number format can also exaggerate effects…

16 of 44

Number format

25 people prefer mountains	75 people prefer beaches
25%	75%
0.25	0.75
One in four	Three in four
1/4	3/4
A quarter	Three quarters
1:3	3:1
Three times more people prefer beaches to mountains
Only a third of the number of people who prefer beaches prefer mountains
200% more people prefer beaches to mountains
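A minimal Python sketch (variable names are illustrative) showing how the same 25 vs. 75 split turns into each format. It also separates "300% of" from "200% more": 75 is 300% of 25, but only 200% more than 25.

```python
from fractions import Fraction

# Counts from the slide: 25 prefer mountains, 75 prefer beaches.
mountains, beaches = 25, 75
total = mountains + beaches

share_mountains = Fraction(mountains, total)   # 1/4, "one in four"
share_beaches = Fraction(beaches, total)       # 3/4, "three in four"
ratio = Fraction(mountains, beaches)           # 1/3, a 1:3 ratio (mountains:beaches)
percent_of = beaches / mountains * 100         # beach fans are 300% OF mountain fans
percent_more = (beaches - mountains) / mountains * 100  # ...which is 200% MORE

print(f"Shares: {share_mountains} vs {share_beaches} "
      f"({float(share_mountains):.0%} vs {float(share_beaches):.0%})")
print(f"Ratio mountains:beaches = {ratio.numerator}:{ratio.denominator}")
print(f"Beach fans are {percent_of:.0f}% of mountain fans, i.e. {percent_more:.0f}% more")
```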

17 of 44

Percentage or percentage points

If the proportion of people who own dogs increased from 12% in 2018 to 36% in 2023…

The proportion did not increase by 24%

It increased by 200%

As 36% is three times 12%: the rise of 24 is twice the starting value of 12

However, there was an increase of 24 percentage points
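A minimal Python sketch of the two calculations (the function name is illustrative): the percentage-point change is a simple subtraction, while the relative change divides that difference by the starting value.

```python
def describe_change(old_pct, new_pct):
    """Return (percentage-point change, relative % change) between two percentages."""
    if old_pct == 0:
        raise ValueError("relative change is undefined when the starting value is 0")
    point_change = new_pct - old_pct                       # absolute difference
    relative_change = (new_pct - old_pct) / old_pct * 100  # change relative to the start
    return point_change, relative_change

points, relative = describe_change(12, 36)
print(f"{points} percentage points, a {relative:.0f}% increase")
```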

18 of 44

Dubious election graphs…

19 of 44

We need to talk about 2019

20 of 44

Do we need to talk about 2019?

21 of 44

Context is everything

22 of 44

Carter Racing (Brittain & Sitkin, 1987)

7 engine failures in 24 races (29%)

CC BY-SA Rainmaker47

“John Carter has only an hour to decide. The most important auto race of the season is looming; it will be broadcast live on national television and could bring major prize money. If his team wins, it will get a sponsorship deal and a chance to start making some real profits for a change.

There’s just one problem. In seven of the past twenty-four races, the engine in the Carter Racing car has blown out. An engine failure live on TV will jeopardize sponsorships—and the driver’s life. But withdrawing has consequences, too. The wasted entry fee means finishing the season in debt, and the team won’t be happy about the missed opportunity for glory. As Burns’s First Law of Racing says, “Nobody ever won a race sitting in the pits.”

One of the engine mechanics has a hunch about what’s causing the blowouts. He thinks that the engine’s head gasket might be breaking in cooler weather. To help Carter decide what to do, a graph is devised that shows the conditions during each of the blowouts: the outdoor temperature at the time of the race plotted against the number of breaks in the head gasket.

The upcoming race is forecast to be especially cold, just forty degrees [Fahrenheit, about 4.4 °C], well below anything the cars have experienced before. So: race or withdraw?”

Hannah Fry (2021), “When Graphs are a Matter of Life and Death”, The New Yorker

A version of Brittain & Sitkin’s (1987) “Carter Racing” case, itself based on NASA data on O-ring failures before the January 1986 Challenger launch.

23 of 44

Incidents by temperature

24 of 44

Adding the missing data
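The point of these two graphs is that plotting only the races that went wrong hides the temperature pattern. A minimal Python sketch of that effect, using invented toy numbers (NOT the real Carter Racing or Challenger data) and a hypothetical 60 °F cut-off:

```python
# HYPOTHETICAL toy records: (temperature in °F, number of incidents) per race.
# Invented for illustration only; these are NOT the Carter Racing or NASA data.
races = [(50, 2), (55, 1), (58, 1), (62, 0), (65, 1), (68, 0),
         (70, 0), (72, 0), (75, 1), (78, 0), (80, 0)]

def share_with_incidents(counts):
    """Fraction of races that had at least one incident."""
    if not counts:
        return float("nan")
    return sum(1 for n in counts if n > 0) / len(counts)

def compare_cold_warm(records, threshold_f):
    """Incident share for races below vs. at-or-above a temperature threshold."""
    cold = [n for temp, n in records if temp < threshold_f]
    warm = [n for temp, n in records if temp >= threshold_f]
    return share_with_incidents(cold), share_with_incidents(warm)

# Only the races with incidents: cold and warm look equally bad.
failures_only = [r for r in races if r[1] > 0]
print("failures only:", compare_cold_warm(failures_only, 60))
# All races, including zero-incident ones: the cold races stand out.
print("all races:    ", compare_cold_warm(races, 60))
```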

25 of 44

In reality…

They raced. 🍎

26 of 44

Space launches influence the awarding of sociology doctorates

tylervigen.com

Correlation: 78.92% (r=0.78915)

Data sources: Federal Aviation Administration and National Science Foundation

27 of 44

Nic Cage films influence pool drownings

tylervigen.com

Correlation: 66.6% (r=0.666004)

Data sources: Centers for Disease Control & Prevention and Internet Movie Database
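Both r values above are Pearson correlation coefficients. A minimal Python sketch of how r is computed, with two hypothetical series that share nothing except a steady rise over time, which on its own is enough to produce a very high r:

```python
import math
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a series does not vary")
    return cov / (spread_x * spread_y)

# Two HYPOTHETICAL yearly series with no causal link; both simply rise over time.
years = range(10)
series_a = [10 + 2 * t for t in years]
series_b = [5 + 0.5 * t + 0.3 * (t % 2) for t in years]
print(f"r = {pearson_r(series_a, series_b):.3f}")
```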

28 of 44

Finding unusually correlated data

Google Trends

Try to find two seemingly unrelated search terms that over the past 12 months appear to be closely correlated

Try “Shark” and “Hat”

trends.google.com

29 of 44

Spreadsheet fails

30 of 44

92 out of 97 lecturers eat cat food

Why 97?

Who was surveyed?

How much cat food are they actually eating?
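Even taken at face value, 92 out of 97 carries uncertainty. A sketch of a Wilson score 95% confidence interval for a proportion (one standard choice, used here for illustration). It assumes a random sample, so it says nothing about the more important questions of who was surveyed and why 97:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (assumes a random sample)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

low, high = wilson_interval(92, 97)
print(f"observed {92 / 97:.1%}, 95% interval roughly {low:.1%} to {high:.1%}")
```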

31 of 44

WEIRD Samples

Western, Educated, Industrialized, Rich, Democratic

80 percent of participants in social and behavioural research are WEIRD, but only 12 percent of the world’s population are

Practical constraints (mostly funding) can limit access to more diverse samples.

Important to replicate studies

32 of 44

Sample size matters…

A small sample size is unlikely to be representative

In the population, many variables are approximately normally distributed

Central limit theorem: the more participants in the sample, the closer the sampling distribution of the mean will be to normal, even if the variable itself is not normally distributed

When their assumptions (such as approximate normality) are met, parametric tests are generally more powerful, i.e. more likely to detect a real effect
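A small simulation of the central limit theorem, assuming a strongly skewed (exponential) variable chosen purely for illustration: as the sample size grows, the skewness of the sample means shrinks towards 0, the value for a normal distribution.

```python
import random
import statistics

def skewness(xs):
    """Population skewness: 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

def sample_means(rng, n, reps):
    """Means of `reps` samples, each of size n, from a skewed exponential distribution."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(42)
skew_by_n = {n: skewness(sample_means(rng, n, 5000)) for n in (2, 30)}
for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of the sample means = {skew:.2f}")
```

Theory predicts a skewness of 2/√n for means of exponential samples (about 1.41 for n = 2 and 0.37 for n = 30). It is the distribution of sample means that becomes normal, not the data within a single sample.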

33 of 44

Sample size matters…

A large sample size makes statistics more persuasive

However, a large sample is also more likely to return a significant result (one saying there is a relationship between variables), even when the effect is tiny and of little practical importance
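An idealised calculation to illustrate this, assuming a two-sample z-test with a known standard deviation of 1 and an observed difference exactly equal to a tiny true effect of 0.05 SD (both assumptions chosen for simplicity):

```python
from statistics import NormalDist

def expected_p_value(effect_size, n_per_group):
    """Two-sided p-value if the observed difference equals the true effect (SD = 1, z-test)."""
    standard_error = (2 / n_per_group) ** 0.5   # SE of a difference between two group means
    z = effect_size / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6} per group: p = {expected_p_value(0.05, n):.4f}")
```

The effect is the same size in every row; only the sample size changes.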

34 of 44

Population vs. Sample

We can’t test everyone

Collect a smaller sample from the wider population

Is there a consistent enough effect in the sample that the same effect is likely to exist in the population?

35 of 44

Hypotheses

Null hypothesis: there is no pattern or difference

Alternative (or experimental) hypothesis: there is a pattern or difference

36 of 44

Significance

Statistical tests tell us whether the difference or association in our sample data would be unlikely if the null hypothesis were true.

By convention, a result is called statistically significant when the calculated p-value is below .05 (p < .05).

This means accepting a 5% chance of a false positive (Type I error) whenever the null hypothesis is actually true
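A simulation sketch of what the 5% means: when the null hypothesis is true (both groups drawn from the same distribution), roughly 5% of experiments still come out "significant". It assumes a two-sample z-test with a known standard deviation, chosen for simplicity over the t-tests usually used in practice:

```python
import random
from statistics import NormalDist, fmean

def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming the SD is known."""
    standard_error = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(experiments=2000, n=30, alpha=0.05, seed=7):
    """Share of null experiments (no real difference) that come out 'significant'."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(experiments):
        a = [rng.gauss(0, 1) for _ in range(n)]   # both groups come from
        b = [rng.gauss(0, 1) for _ in range(n)]   # the same distribution
        if z_test_p(a, b) < alpha:
            significant += 1
    return significant / experiments

print(f"false positive rate: {false_positive_rate():.3f}")
```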

37 of 44

P-value

Is a 5% chance of making a false positive claim too high?

Or is it too low?

There is ongoing debate about p-values and whether the threshold should be lowered (for example, from .05 to .005).

Is having a threshold too strict?
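The threshold is a trade-off: lowering it cuts false positives but also reduces power, the chance of detecting an effect that really exists. A sketch under the same simplifying z-test assumptions (known SD of 1), for a hypothetical effect of 0.5 SD with 64 participants per group:

```python
from statistics import NormalDist

def power(effect_size, n_per_group, alpha):
    """Chance a two-sided z-test (SD = 1) detects a true effect of the given size."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)              # significance cut-off
    shift = effect_size / (2 / n_per_group) ** 0.5  # expected z under the true effect
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: power = {power(0.5, 64, alpha):.2f}")
```

With no real effect (`effect_size=0`), "power" collapses to the false positive rate, alpha.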

38 of 44

P-value

Phrases published papers have used to describe p-values above .05

non-insignificant result (p=0.500)

very closely brushed the limit of statistical significance (p=0.051)

a clear tendency to significance (p=0.052)

just failed significance (p=0.057)

just borderline significant (p=0.058)

just above the arbitrary level of significance (p=0.07)

a barely detectable statistically significant difference (p=0.073)

narrowly eluded statistical significance (p=0.0789)

moderately significant (p>0.11)

non-significant in the statistical sense (p>0.05)

39 of 44

Effect sizes

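A measure of how large the pattern or difference in your sample is, such as Cohen’s d or Pearson’s r; unlike a p-value, it does not shrink or grow simply because the sample is larger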
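One common effect size for a difference between two group means is Cohen's d: the difference in means divided by the pooled standard deviation. A minimal sketch (it assumes at least two observations per group and non-zero variance):

```python
import statistics

def cohens_d(a, b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5

print(f"d = {cohens_d([2, 4, 6], [1, 3, 5]):.2f}")
```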

40 of 44

“Lies, damned lies and statistics”

33.7% of scientists admitted to questionable practices that could lead to misleading or false statistics

Pruning data and removing outliers without good reason

P-hacking

Overly complicated models: adding more terms always lets a model explain at least as much of the sample data, so parsimony (the simplest model that explains the largest effect) is important

Falsifying and fabricating data
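One form of p-hacking is testing many outcomes and reporting whichever comes out significant. A sketch of why that inflates false positives, assuming k independent tests where the null hypothesis is true every time (z-tests with known SD, for simplicity):

```python
import random
from statistics import NormalDist, fmean

def chance_of_any_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k

def z_test_p(a, b, sd=1.0):
    """Two-sided p-value for a difference in means, assuming a known SD."""
    standard_error = sd * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_fishing(k=10, experiments=1000, n=20, alpha=0.05, seed=3):
    """Share of experiments where at least one of k null outcomes looks 'significant'."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(experiments):
        p_values = []
        for _ in range(k):  # k outcome measures, none with a real effect
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            p_values.append(z_test_p(a, b))
        if min(p_values) < alpha:
            lucky += 1
    return lucky / experiments

print(f"expected: {chance_of_any_false_positive(10):.2f}, simulated: {simulate_fishing():.2f}")
```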

41 of 44

Conflicts of interest

Studies funded or carried out with a particular outcome in mind, and conducted in a way designed to achieve that outcome

42 of 44

Researchers’ choices

Research only includes the variables, measures and methods selected by the researcher

One study gave 29 teams of analysts the same data set and asked them to find an answer to “whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players”.

Statistical analysis methods varied, and the teams chose 21 unique combinations of variables to include. 20 teams found a significant result; 9 teams did not.

43 of 44

Better practices

Better statistical practice: look at what the data actually looks like, and go beyond p-values alone (for example, by reporting effect sizes)

Diversify samples and run replications

Collate research in one area with meta-analyses and systematic reviews

Honest graphs

Check research for any conflicts of interest
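Meta-analyses combine estimates from several studies. The simplest version is a fixed-effect, inverse-variance weighted average, in which more precise studies count for more; real meta-analyses often use random-effects models instead. A sketch with hypothetical study results:

```python
import math

def fixed_effect_pool(estimates, standard_errors):
    """Inverse-variance weighted mean of study estimates and its standard error."""
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1 / se ** 2 for se in standard_errors]  # precise studies get more weight
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# HYPOTHETICAL effect sizes (e.g. Cohen's d) and standard errors from three studies.
pooled, se = fixed_effect_pool([0.40, 0.10, 0.25], [0.20, 0.05, 0.10])
print(f"pooled estimate {pooled:.2f} (SE {se:.2f})")
```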

44 of 44

More of this sort of thing…
