Practical principles for data analysis design
Lucy D’Agostino McGowan
Wake Forest University
lucymcgowan.com/talk
Lucy D’Agostino McGowan, Roger D. Peng & Stephanie C. Hicks (2022) Design Principles for Data Analysis, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2022.2104290
statistical thinking
data → method → answer
https://onishlab.colostate.edu/summer-statistics-workshop-2019/which_test_flowchart/
design thinking
empathize
define
ideate
prototype
test
Model proposed by the Hasso-Plattner Institute of Design at Stanford
Why?
D’Agostino McGowan, Peng & Hicks (2022)
Producer of Data Analysis → Data analysis product
Consumer of Data Analysis → Data analysis evaluation
Example pairings of the two roles: you; you and a clinician; you and another statistician; you and the general public
producer’s design principles
consumer’s design principles
design principles for data analysis
D’Agostino McGowan, Peng & Hicks (2022)
data matching
How well does the available data match the data needed to investigate a question?
P(Infection | Vaccination)
P(Vaccination | Infection)
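The two conditional probabilities above are generally not equal. The sketch below uses a hypothetical 2×2 table of counts to show how far apart they can be. The numbers are made up for illustration and do not come from the talk or the paper.

```python
# Hypothetical counts (illustrative only, not from the talk or paper)
counts = {
    ("vaccinated", "infected"): 20,
    ("vaccinated", "not infected"): 780,
    ("unvaccinated", "infected"): 30,
    ("unvaccinated", "not infected"): 170,
}

def p_infection_given_vaccination(c):
    """P(Infection | Vaccination): share infected among the vaccinated."""
    vaccinated = c[("vaccinated", "infected")] + c[("vaccinated", "not infected")]
    return c[("vaccinated", "infected")] / vaccinated

def p_vaccination_given_infection(c):
    """P(Vaccination | Infection): share vaccinated among the infected."""
    infected = c[("vaccinated", "infected")] + c[("unvaccinated", "infected")]
    return c[("vaccinated", "infected")] / infected

print(p_infection_given_vaccination(counts))  # 20 / 800 = 0.025
print(p_vaccination_given_infection(counts))  # 20 / 50 = 0.4
```

Data collected only on infected people can estimate the second quantity, but not the first. If the question is about the risk of infection after vaccination, that is the kind of mismatch the data matching principle asks about.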
exhaustive
Are specific questions addressed using multiple, complementary methods, tooling or workflows?
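One way to apply the exhaustive principle is to estimate the same quantity with two complementary methods and compare the results. The sketch below computes a 95% interval for a difference in means in two ways: a normal approximation and a percentile bootstrap. The choice of these two methods and the data are illustrative assumptions, not taken from the talk.

```python
import random
from statistics import NormalDist, mean, variance

# Hypothetical measurements for two groups (illustrative values only)
group_a = [5.1, 4.8, 6.2, 5.9, 5.5, 6.1, 4.9, 5.7]
group_b = [4.2, 4.9, 4.4, 5.0, 3.9, 4.6, 4.1, 4.8]

def normal_ci(a, b, level=0.95):
    """Method 1: normal-approximation interval for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se

def bootstrap_ci(a, b, level=0.95, n_boot=5000, seed=1):
    """Method 2: percentile bootstrap interval for the same difference."""
    rng = random.Random(seed)  # seeded so the resampling is repeatable
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = diffs[int((1 - level) / 2 * n_boot)]
    upper = diffs[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper

print(normal_ci(group_a, group_b))
print(bootstrap_ci(group_a, group_b))
```

If the two intervals broadly agree, that is some reassurance that the conclusion does not hinge on one method's assumptions. A large disagreement is a signal to investigate further.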
skeptical
Are multiple, related explanations considered using the same data?
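To illustrate the skeptical principle, the sketch below fits two related explanations to the same hypothetical data: y does not depend on x, or y changes linearly with x. Each fit is scored by its residual sum of squares (SSE). The data and the pair of explanations are assumptions chosen for illustration only.

```python
from statistics import mean

# Hypothetical paired observations (illustrative only)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

def sse_constant(y):
    """Explanation 1: y does not depend on x (fit a single mean)."""
    m = mean(y)
    return sum((yi - m) ** 2 for yi in y)

def sse_linear(x, y):
    """Explanation 2: y changes linearly with x (ordinary least squares)."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

print(f"constant mean SSE: {sse_constant(y):.3f}")
print(f"linear trend SSE:  {sse_linear(x, y):.3f}")
```

A lower SSE alone does not settle which explanation is right. The linear model always fits at least as well as the constant model because it contains it as a special case. A skeptical analysis would also weigh plausibility and other explanations, such as a third variable driving both x and y.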
second-order
Does the analysis include anything that does not directly address the primary question, but gives important context to the analysis?
clarity
Does the analysis summarize data in a way that is influential in explaining how the underlying data connects to the conclusions?
reproducible
Could someone who is not the original producer take the published code and data and compute the same results?
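The reproducible principle asks whether someone else can rerun the published code on the published data and get the same results. The sketch below checks this by fingerprinting the output of two runs. `run_analysis` and `fingerprint` are hypothetical helpers, and the data are made up. Fixing the random seed is what makes the resampling step repeatable.

```python
import hashlib
import json
import random
from statistics import mean, stdev

def run_analysis(data, seed=2022):
    """Hypothetical analysis: sample mean plus a seeded bootstrap standard error."""
    rng = random.Random(seed)  # fixed seed makes the resampling repeatable
    boot_means = [mean(rng.choices(data, k=len(data))) for _ in range(1000)]
    return {"mean": mean(data), "bootstrap_se": stdev(boot_means)}

def fingerprint(results):
    """SHA-256 digest of the results, with sorted keys and rounded floats."""
    rounded = {key: round(value, 10) for key, value in results.items()}
    payload = json.dumps(rounded, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical "published" data
published_data = [3.2, 4.1, 5.0, 4.4, 3.8, 4.9, 5.3, 4.0]

original = fingerprint(run_analysis(published_data))  # producer's run
rerun = fingerprint(run_analysis(published_data))     # someone else's re-run
print(original == rerun)  # True
```

Rounding before hashing is an assumption meant to tolerate tiny floating-point differences across machines. How much tolerance is appropriate depends on the analysis. A real check would rerun the code in a fresh environment rather than in the same session.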
Wake Forest University
54 Students
8 Assignments
10-point scoring
D’Agostino McGowan, Peng & Hicks (2022)
Observed between- and within-person variation in principle scores across assignments
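Between- and within-person variation can be separated with a simple descriptive decomposition. The sketch below uses hypothetical 10-point scores on one principle, with one row per student and one column per assignment. These are not the study's data, and the study may have used a different model.

```python
from statistics import mean, pvariance

# Hypothetical 10-point scores on one principle (not the study's data):
# one list per student, one entry per assignment
scores = {
    "student_1": [6, 7, 7, 8],
    "student_2": [3, 5, 4, 6],
    "student_3": [8, 9, 8, 9],
}

def variance_components(scores):
    """Split score variation into between-person and within-person parts."""
    person_means = [mean(s) for s in scores.values()]
    between = pvariance(person_means)  # spread of students' average scores
    within = mean(pvariance(s) for s in scores.values())  # spread across each student's assignments
    return between, within

between, within = variance_components(scores)
print(f"between-person: {between:.3f}, within-person: {within:.3f}")
```

When every student has the same number of assignments, the two parts add up to the overall (population) variance of all scores.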
Observed variation between principles, suggesting they measure different underlying characteristics of a data analysis
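One simple way to see whether principles capture different characteristics is to correlate their scores across analyses. Correlations well below 1 are consistent with the principles measuring different things. The sketch below computes pairwise Pearson correlations on hypothetical scores. It is an illustration, not the analysis reported in the paper.

```python
from itertools import combinations
from statistics import mean

# Hypothetical 10-point scores per principle, one entry per analysis
principle_scores = {
    "data matching": [7, 5, 8, 6, 9, 4],
    "exhaustive": [4, 6, 5, 7, 5, 6],
    "reproducible": [9, 8, 9, 7, 9, 8],
}

def pearson(u, v):
    """Pearson correlation of two equal-length score lists."""
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

for p, q in combinations(principle_scores, 2):
    r = pearson(principle_scores[p], principle_scores[q])
    print(f"{p} vs {q}: r = {r:.2f}")
```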
Johns Hopkins University
15 Students
2 Different Analysts
10-point scoring
D’Agostino McGowan, Peng & Hicks (2022)
The scoring of principles has some ability to distinguish between analyses done by independent analysts
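A simple check of whether scores distinguish two analysts is to compare the scores given to each analyst's work. The sketch below assumes each rater scored both analyses on one principle, so it uses paired differences. The scores are hypothetical, and the paired design is an assumption made for illustration.

```python
from statistics import mean, stdev

# Hypothetical scores on one principle (not the study's data):
# position i holds rater i's score for analyst A and for analyst B
scores_a = [7, 8, 6, 7, 9, 8, 7]
scores_b = [5, 6, 6, 4, 7, 5, 6]

def paired_difference(a, b):
    """Mean within-rater difference (A minus B) and its standard error."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs), stdev(diffs) / len(diffs) ** 0.5

d, se = paired_difference(scores_a, scores_b)
print(f"mean difference: {d:.2f} (SE {se:.2f})")
```

A mean difference that is large relative to its standard error suggests the principle scores separate the two analysts' work.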