1 of 77

Why you should care about statistics

Jeff Leek

2 of 77

jtleek.com/talks

3 of 77

About me

4 of 77

@jtleek

5 of 77

www.jhudatascience.org

6 of 77

www.simplystatistics.org

7 of 77

N = # of data points

8 of 77

N = ($ YOU HAVE) / ($ PER SAMPLE)
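Here is a minimal Python sketch of this budget arithmetic. It assumes a fixed cost per sample and rounds down to whole samples. The function name and the dollar figures are hypothetical.

```python
def affordable_sample_size(budget_dollars: float, cost_per_sample_dollars: float) -> int:
    """N = ($ you have) / ($ per sample), rounded down to whole samples."""
    if cost_per_sample_dollars <= 0:
        raise ValueError("cost per sample must be positive")
    return int(budget_dollars // cost_per_sample_dollars)


# Hypothetical budget: $50,000 at $1,000 per sample.
print(affordable_sample_size(50_000, 1_000))  # prints 50
```

With the budget held fixed, halving the cost per sample doubles N.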

9 of 77

[Figure: $ per (human) genome, by year]

10 of 77

11 of 77

https://datasetsearch.research.google.com/

12 of 77

https://rstudio.cloud/

13 of 77

https://www.thecut.com/2013/01/hillary-most-poisoned-baby-name-in-us-history.html

14 of 77

We are all statisticians now

15 of 77

https://www.economist.com/briefing/2020/02/29/covid-19-is-now-in-50-countries-and-things-will-get-worse

16 of 77

https://fivethirtyeight.com/features/why-statistics-dont-capture-the-full-extent-of-the-systemic-bias-in-policing/

17 of 77

https://emilyoster.substack.com/p/grandparents-and-day-care

18 of 77

https://emilyoster.substack.com/p/grandparents-and-day-care

19 of 77

https://twitter.com/AmihaiGlazer/status/1277769775855235072

20 of 77

https://xkcd.com/1725/

21 of 77

http://goo.gl/HP69Rb

22 of 77

http://goo.gl/mTmiK2

23 of 77

https://fivethirtyeight.com/features/the-case-against-early-cancer-detection/

24 of 77

What is statistics?

25 of 77

Statistics is the science of learning generalizable knowledge from data

26 of 77

What is statistics?

  1. Study design
  2. Critical thinking
  3. Exploratory analysis
  4. Statistical modeling
  5. Communication

27 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

28 of 77

Question types

29 of 77

Answer the questions in order:

  1. Did they summarize the data? No → Not a data analysis. Yes → question 2.
  2. Did they report the summaries without interpretation? Yes → Descriptive. No → question 3.
  3. Did they quantify whether the discoveries are likely to hold in a new sample? No → Exploratory. Yes → question 4.
  4. Are they trying to predict measurement(s) for individuals? Yes → Predictive. No → question 5.
  5. Are they trying to figure out how changing the average of one measurement affects another? No → Inferential. Yes → question 6.
  6. Is the effect they are looking for an average effect or a deterministic effect? Average → Causal. Deterministic → Mechanistic.

http://science.sciencemag.org/content/347/6228/1314
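The question-type flowchart can be written as a short decision function. This is a sketch under one assumption: the yes/no questions are asked in this order: summarize, report without interpretation, quantify generalization, predict individuals, change in average, average vs. deterministic. The enum and parameter names are hypothetical.

```python
from enum import Enum


class QuestionType(Enum):
    NOT_A_DATA_ANALYSIS = "Not a data analysis"
    DESCRIPTIVE = "Descriptive"
    EXPLORATORY = "Exploratory"
    INFERENTIAL = "Inferential"
    PREDICTIVE = "Predictive"
    CAUSAL = "Causal"
    MECHANISTIC = "Mechanistic"


def classify_question(
    summarized: bool,
    reported_without_interpretation: bool,
    quantified_generalization: bool,
    predicts_individuals: bool,
    studies_change_in_average: bool,
    deterministic_effect: bool,
) -> QuestionType:
    """Walk the flowchart top to bottom, stopping at the first terminal answer."""
    if not summarized:
        return QuestionType.NOT_A_DATA_ANALYSIS
    if reported_without_interpretation:
        return QuestionType.DESCRIPTIVE
    if not quantified_generalization:
        return QuestionType.EXPLORATORY
    if predicts_individuals:
        return QuestionType.PREDICTIVE
    if not studies_change_in_average:
        return QuestionType.INFERENTIAL
    return QuestionType.MECHANISTIC if deterministic_effect else QuestionType.CAUSAL


# An A/B test that estimates an average effect of a change:
print(classify_question(
    summarized=True,
    reported_without_interpretation=False,
    quantified_generalization=True,
    predicts_individuals=False,
    studies_change_in_average=True,
    deterministic_effect=False,
).value)  # prints Causal
```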

30 of 77

Population

10 green meeple

2 green meeple in bow ties

4 purple meeple

2 purple meeple in bow ties

31 of 77

https://www.census.gov/history/pdf/2010questionnaire.pdf

32 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

33 of 77

Population

Probability

Green meeple mostly wear bow ties; purple meeple mostly do not.
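An exploratory statement like this usually comes from a cross-tabulation: the share of each group with a given trait, reported without quantifying whether the pattern would hold in a new sample. Here is a minimal sketch on a hypothetical sample. The observations and the function name are illustrative and are not data from the talk.

```python
from collections import Counter

# Hypothetical sample of (color, wears_bow_tie) observations.
sample = [
    ("green", True), ("green", True), ("green", True), ("green", False),
    ("purple", False), ("purple", False), ("purple", True), ("purple", False),
]


def bow_tie_share_by_color(observations):
    """Fraction of each color group that wears a bow tie."""
    totals = Counter(color for color, _ in observations)
    with_tie = Counter(color for color, tie in observations if tie)
    return {color: with_tie[color] / totals[color] for color in totals}


print(bow_tie_share_by_color(sample))  # prints {'green': 0.75, 'purple': 0.25}
```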

34 of 77

https://www.businessinsider.com/chocolate-consumption-vs-nobel-prizes-2014-4

35 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

36 of 77

Population

Probability

Inference

Between 80% and 100% of all bow-tie wearers are green.
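A statement like this needs an interval estimate for a proportion computed from a sample. The talk does not say which method produced its range. The sketch below substitutes one standard choice, the Wilson score interval, at roughly 95% confidence (z = 1.96). The sample counts are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical sample: 18 of 20 sampled bow-tie wearers are green.
low, high = wilson_interval(18, 20)
print(f"{low:.2f} to {high:.2f}")  # prints 0.70 to 0.97
```

The interval is a claim about the whole population of bow-tie wearers, not just the 20 sampled. That is what separates inference from exploration.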

37 of 77

https://projects.fivethirtyeight.com/polls/president-general/national/

38 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

39 of 77

Population

Probability

f(bow-tie wearer) = green

f(no bow tie) = purple

Prediction

If I see a new bow-tie wearer, I predict that they will be green.
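One simple way to learn a prediction rule f from data is a majority vote within each group. This sketch uses that technique; it is not necessarily the rule behind the slide. The training data are hypothetical (wears_bow_tie, color) pairs.

```python
from collections import Counter, defaultdict


def fit_majority_rule(training):
    """Learn f(wears_bow_tie) -> color by majority vote within each group."""
    votes = defaultdict(Counter)
    for wears_bow_tie, color in training:
        votes[wears_bow_tie][color] += 1
    rule = {group: counts.most_common(1)[0][0] for group, counts in votes.items()}
    # Raises KeyError for a group never seen in training.
    return lambda wears_bow_tie: rule[wears_bow_tie]


# Hypothetical training data.
training = [(True, "green")] * 3 + [(True, "purple")] + [(False, "purple")] * 3 + [(False, "green")]
f = fit_majority_rule(training)
print(f(True), f(False))  # prints green purple
```

The rule predicts a label for each new individual. It says nothing about what would happen if someone's bow tie were removed.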

40 of 77

https://covid19-projections.com/

41 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

42 of 77

Population

Probability

Causal

Inference

If I convert a bow-tie wearer into a non-wearer, they will become purple 30% of the time.

43 of 77

https://www.mailmunch.com/blog/ab-testing-got-obama-60-million/

40% more signups on average
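A lift like this comes from comparing the signup rates of randomly assigned variants. Random assignment is what licenses the causal reading of the average difference. The sketch below computes the relative lift and a two-sided two-proportion z-test under the normal approximation. The counts are hypothetical and are not the campaign's data.

```python
import math


def ab_test(control_conversions, control_n, treatment_conversions, treatment_n):
    """Relative lift and a two-sided two-proportion z-test (normal approximation)."""
    p_control = control_conversions / control_n
    p_treatment = treatment_conversions / treatment_n
    relative_lift = (p_treatment - p_control) / p_control
    pooled = (control_conversions + treatment_conversions) / (control_n + treatment_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    z = (p_treatment - p_control) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return relative_lift, z, p_value


# Hypothetical counts: 100/1000 signups on page A, 140/1000 on page B.
lift, z, p = ab_test(100, 1000, 140, 1000)
print(f"lift={lift:.0%}, z={z:.2f}, p={p:.4f}")
```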

44 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

45 of 77

Population

Probability

Mechanistic

Inference

If I convert a bow-tie wearer into a non-wearer, they will become purple 100% of the time.

46 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

47 of 77

https://xkcd.com/552/

48 of 77

[Question-type flowchart from slide 29, shown again]

http://science.sciencemag.org/content/347/6228/1314

49 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

50 of 77

Confounding

51 of 77

Small shoes: not literate

Big shoes: somewhat literate

52 of 77

Shoe Size

Literacy

??

53 of 77

Young: small shoes, not literate

Middle-aged: big shoes, somewhat literate

54 of 77

Shoe Size

Literacy

Age

55 of 77

Shoe Size

Literacy

Age

56 of 77

Variable1

Variable2

Confounder

57 of 77

Population

Probability

f(bow-tie wearer) = purple

f(no bow tie) = green

Prediction

If I see a new bow-tie wearer, I predict that they will be purple.

58 of 77

Population

Biased

Probability

f(bow-tie wearer) = green

f(no bow tie) = purple

Prediction

If I see a new bow-tie wearer, I predict that they will be green!
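A biased sample can reverse a prediction. In this sketch, green bow-tie wearers are much more likely than purple ones to be sampled. The 30/70 population split and the inclusion probabilities are hypothetical.

```python
import random

rng = random.Random(42)

# Hypothetical population of bow-tie wearers: 30% green, 70% purple.
population = ["green"] * 3000 + ["purple"] * 7000

# Biased inclusion rule (an assumption): green wearers are 9x as likely to be sampled.
inclusion_prob = {"green": 0.9, "purple": 0.1}
biased_sample = [m for m in population if rng.random() < inclusion_prob[m]]


def share_green(group):
    return sum(m == "green" for m in group) / len(group)


def majority_color(group):
    return "green" if share_green(group) > 0.5 else "purple"


print(majority_color(population), majority_color(biased_sample))  # prints purple green
```

No amount of modeling on the biased sample recovers the population answer unless the selection mechanism is known and corrected for.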

59 of 77

https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

60 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

61 of 77

http://www.jstor.org/stable/2288400

62 of 77

https://www.autodeskresearch.com/publications/samestats

63 of 77

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128

64 of 77

https://www.biostat.wisc.edu/~kbroman/presentations/IowaState2013/graphs_combined.pdf

65 of 77

https://www.buzzfeednews.com/article/katienotopoulos/graphs-that-lied-to-us

66 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

67 of 77

68 of 77

https://fivethirtyeight.com/features/billion-dollar-billy-beane/

69 of 77

70 of 77

71 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

72 of 77

https://fivethirtyeight.com/

73 of 77

74 of 77

What is statistics?

  • Study design
  • Critical thinking
  • Exploratory analysis
  • Statistical modeling
  • Communication

75 of 77

Next Steps

76 of 77

https://leanpub.com/universities/set/jhu/cloud-based-data-science

77 of 77

Thanks and have fun!