Why you should care about statistics
Jeff Leek
jtleek.com/talks
About me
@jtleek
www.jhudatascience.org
www.simplystatistics.org
N = # of data points
N = ($ YOU HAVE) / ($ PER SAMPLE)
[Figure: $ per (human) genome by year]
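The sample-size arithmetic above can be sketched in a few lines. This is only an illustration: the function name, the budget, and the per-sample costs are hypothetical and are not figures from the talk. It shows how N grows as the cost per sample falls while the budget stays fixed.

```python
def affordable_sample_size(budget_dollars: float, cost_per_sample_dollars: float) -> int:
    """Return N = ($ you have) / ($ per sample), rounded down to whole samples."""
    if cost_per_sample_dollars <= 0:
        raise ValueError("cost per sample must be positive")
    return int(budget_dollars // cost_per_sample_dollars)


# Hypothetical numbers for illustration only (not taken from the slides).
budget = 50_000
for cost in (10_000, 1_000, 100):
    n = affordable_sample_size(budget, cost)
    print(f"${cost} per sample -> N = {n}")
```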
https://datasetsearch.research.google.com/
https://rstudio.cloud/
https://www.thecut.com/2013/01/hillary-most-poisoned-baby-name-in-us-history.html
We are all statisticians now
https://www.economist.com/briefing/2020/02/29/covid-19-is-now-in-50-countries-and-things-will-get-worse
https://fivethirtyeight.com/features/why-statistics-dont-capture-the-full-extent-of-the-systemic-bias-in-policing/
https://emilyoster.substack.com/p/grandparents-and-day-care
https://twitter.com/AmihaiGlazer/status/1277769775855235072
https://xkcd.com/1725/
http://goo.gl/HP69Rb
http://goo.gl/mTmiK2
https://fivethirtyeight.com/features/the-case-against-early-cancer-detection/
What is statistics?
“Statistics is the science of learning generalizable knowledge from data”
What is statistics?
Question types
Did they summarize the data?
  No: Not a data analysis
  Yes: Did they report the summaries without interpretation?
    Yes: Descriptive
    No: Did they quantify whether the discoveries are likely to hold in a new sample?
      No: Exploratory
      Yes: Are they trying to predict measurement(s) for individuals?
        Yes: Predictive
        No: Are they trying to figure out how changing the average of one measurement affects another?
          No: Inferential
          Yes: Is the effect they are looking for an average effect or a deterministic effect?
            Average: Causal
            Deterministic: Mechanistic
http://science.sciencemag.org/content/347/6228/1314
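The question-type flowchart can be read as a chain of yes/no checks. The sketch below is one possible encoding, not code from the talk. It assumes the questions are asked in this order: summarized, reported without interpretation, quantified for a new sample, predicting individuals, studying how a change in one average affects another, and finally average versus deterministic effect. The function and parameter names are hypothetical.

```python
from typing import Optional


def classify_analysis(
    summarized: bool,
    reported_without_interpretation: bool = False,
    quantified_for_new_sample: bool = False,
    predicts_individuals: bool = False,
    studies_change_in_average: bool = False,
    effect: Optional[str] = None,  # "average" or "deterministic"
) -> str:
    """Walk the flowchart questions in order and return the analysis type."""
    if not summarized:
        return "Not a data analysis"
    if reported_without_interpretation:
        return "Descriptive"
    if not quantified_for_new_sample:
        return "Exploratory"
    if predicts_individuals:
        return "Predictive"
    if not studies_change_in_average:
        return "Inferential"
    if effect == "average":
        return "Causal"
    if effect == "deterministic":
        return "Mechanistic"
    raise ValueError("effect must be 'average' or 'deterministic'")


# A summary reported with no interpretation is descriptive.
print(classify_analysis(summarized=True, reported_without_interpretation=True))

# A randomized experiment reporting an average effect is causal.
print(
    classify_analysis(
        summarized=True,
        quantified_for_new_sample=True,
        studies_change_in_average=True,
        effect="average",
    )
)
```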
Population
10 green meeple
2 green meeple in bow ties
4 purple meeple
2 purple meeple in bow ties
https://www.census.gov/history/pdf/2010questionnaire.pdf
Population
Probability
Green meeple are mostly wearing bow ties and purple ones are not
https://www.businessinsider.com/chocolate-consumption-vs-nobel-prizes-2014-4
Population
Probability
Inference
Between 80% and 100% of all bow tie wearers are green.
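An inferential statement like the one above attaches a range to a quantity in the whole population, based on a sample. One common way to get such a range for a proportion is a Wilson score interval. The sketch below is an assumption about how this could be done, not the method behind the slide, and the sample counts are hypothetical, so the interval it prints will not match the 80% to 100% on the slide.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a population proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical sample: 18 of 20 sampled bow tie wearers are green.
low, high = wilson_interval(successes=18, n=20)
print(f"Between {low:.0%} and {high:.0%} of bow tie wearers are green.")
```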
https://projects.fivethirtyeight.com/polls/president-general/national/
Population
Probability
f(bow tie) = green
f(no bow tie) = purple
Prediction
If I see a new bow tie-wearer I predict that they will be green
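A prediction function f like this can be learned from labeled examples. The sketch below uses the simplest possible rule, predicting the most common color seen for each bow tie status. The rule, the function name, and the sample are all hypothetical; the talk does not say how f was fit.

```python
from collections import Counter, defaultdict


def fit_majority_rule(observations):
    """Learn f: bow tie status -> most common color among the observed examples."""
    counts = defaultdict(Counter)
    for wears_bow_tie, color in observations:
        counts[wears_bow_tie][color] += 1
    return {status: c.most_common(1)[0][0] for status, c in counts.items()}


# Hypothetical sample of (wears_bow_tie, color) pairs.
sample = (
    [(True, "green")] * 4
    + [(True, "purple")] * 1
    + [(False, "purple")] * 3
    + [(False, "green")] * 1
)

f = fit_majority_rule(sample)
print("new bow tie wearer ->", f[True])
print("new non-wearer     ->", f[False])
```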
https://covid19-projections.com/
Population
Probability
Causal inference
If I convert a bow tie-wearer to not a bow tie-wearer they will become purple 30% of the time.
https://www.mailmunch.com/blog/ab-testing-got-obama-60-million/
40% more signups on average
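In a randomized A/B test, the difference in average outcomes between the two arms estimates an average causal effect, and a figure like "40% more signups" is a relative lift. The sketch below shows the arithmetic. The visitor and signup counts are hypothetical, chosen only so the lift comes out near 40%; they are not the campaign's data.

```python
def relative_lift(control_signups: int, control_visitors: int,
                  variant_signups: int, variant_visitors: int) -> float:
    """Relative change in signup rate of the variant arm versus the control arm."""
    if control_visitors <= 0 or variant_visitors <= 0:
        raise ValueError("each arm needs at least one visitor")
    control_rate = control_signups / control_visitors
    variant_rate = variant_signups / variant_visitors
    if control_rate == 0:
        raise ValueError("control signup rate is zero; relative lift is undefined")
    return (variant_rate - control_rate) / control_rate


# Hypothetical counts for illustration only.
lift = relative_lift(control_signups=80, control_visitors=1000,
                     variant_signups=112, variant_visitors=1000)
print(f"{lift:.0%} more signups on average")
```

Because visitors are assigned to arms at random, the two groups should be comparable on average, which is what lets the lift be read as an effect of the change rather than of who happened to see it.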
Population
Probability
Mechanistic inference
If I convert a bow tie-wearer to not a bow tie-wearer they will become purple 100% of the time.
https://xkcd.com/552/
What is statistics?
Confounding
Small shoes
Not literate
Big shoes
Somewhat literate
Shoe Size
Literacy
??
Small shoes
Not literate
Young
Big shoes
Somewhat literate
Middle aged
Shoe Size
Literacy
Age
Shoe Size
Literacy
Age
Variable1
Variable2
Confounder
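The shoe size and literacy example can be simulated. In the sketch below, age drives both variables and neither affects the other. The age range, noise levels, and sample size are assumptions made for illustration. The two variables are strongly correlated overall, but the correlation nearly vanishes among children of the same age.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical children aged 2 to 12; age is the confounder.
ages = [rng.randint(2, 12) for _ in range(5000)]
shoe_size = [age + rng.gauss(0, 1) for age in ages]  # depends on age only
literacy = [age + rng.gauss(0, 1) for age in ages]   # depends on age only

overall = pearson(shoe_size, literacy)

# Hold the confounder fixed: look only at seven-year-olds.
sevens = [(s, l) for a, s, l in zip(ages, shoe_size, literacy) if a == 7]
within_age = pearson([s for s, _ in sevens], [l for _, l in sevens])

print(f"correlation overall:           {overall:.2f}")
print(f"correlation among 7-year-olds: {within_age:.2f}")
```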
Population
Probability
f(bow tie) = purple
f(no bow tie) = green
Prediction
If I see a new bow tie-wearer I predict that they will be purple
Population
Biased
Probability
f(bow tie) = green
f(no bow tie) = purple
Prediction
If I see a new bow tie-wearer I predict that they will be green!
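The flip from "purple" to "green" can be reproduced with a toy example. In the sketch below, the population counts and the sampling rule are hypothetical. Most bow tie wearers in the population are purple, but the sample keeps every green wearer and only one in five purple wearers, so a rule learned from that sample predicts the wrong color.

```python
from collections import Counter


def majority_color(colors):
    """Predict the most common color in the data we were given."""
    return Counter(colors).most_common(1)[0][0]


# Hypothetical population of bow tie wearers: mostly purple.
population = ["purple"] * 30 + ["green"] * 10

# Biased sampling: every green wearer is recorded, but only every fifth purple one.
purple_kept = [c for c in population if c == "purple"][::5]
green_kept = [c for c in population if c == "green"]
biased_sample = purple_kept + green_kept

print("prediction from the full population:", majority_color(population))
print("prediction from the biased sample:  ", majority_color(biased_sample))
```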
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
What is statistics?
http://www.jstor.org/stable/2288400
https://www.autodeskresearch.com/publications/samestats
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128
https://www.biostat.wisc.edu/~kbroman/presentations/IowaState2013/graphs_combined.pdf
https://www.buzzfeednews.com/article/katienotopoulos/graphs-that-lied-to-us
What is statistics?
https://fivethirtyeight.com/features/billion-dollar-billy-beane/
What is statistics?
https://fivethirtyeight.com/
What is statistics?
Next Steps
https://leanpub.com/universities/set/jhu/cloud-based-data-science
Thanks and have fun!