1 of 64

Practical principles for data analysis design

Lucy D’Agostino McGowan

Wake Forest University

2 of 64

lucymcgowan.com/talk

3 of 64

Lucy D’Agostino McGowan, Roger D. Peng & Stephanie C. Hicks (2022) Design Principles for Data Analysis, Journal of Computational and Graphical Statistics, DOI: 10.1080/10618600.2022.2104290

7 of 64

statistical thinking

data

method

8 of 64

statistical thinking

data

answer

9 of 64

https://onishlab.colostate.edu/summer-statistics-workshop-2019/which_test_flowchart/

17 of 64

design thinking

empathize

define

ideate

prototype

test

Model proposed by the Hasso-Plattner Institute of Design at Stanford

21 of 64

Why?

  • Provide a common language
  • Improve pedagogy
  • Improve alignment between data analysis producers and consumers

D’Agostino McGowan, Peng & Hicks (2022)

DOI: 10.1080/10618600.2022.2104290

22 of 64

[Diagram labels: Producer of Data Analysis; Consumer of Data Analysis; Data analysis product; Data analysis evaluation]

[Build steps: "you" is shown with, in turn, "clinician", "another statistician", and "general public"; the final step adds "producer’s design principles" and "consumer’s design principles"]

30 of 64

design principles

for data analysis

data matching

How well does the available data match the data needed to investigate a question?

34 of 64

P(Infection | Vaccination)

35 of 64

P(Vaccination | Infection)
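
P(Infection | Vaccination) and P(Vaccination | Infection) are different quantities, so data that estimate one may not match a question about the other. A minimal sketch in Python, using made-up counts (the numbers, the `counts` table, and the function names are hypothetical illustrations, not taken from the talk or the paper), shows how far apart the two can be:

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts, invented purely for illustration.
# Keys are (vaccination status, infection status).
counts = {
    ("vaccinated", "infected"): 30,
    ("vaccinated", "not infected"): 870,
    ("unvaccinated", "infected"): 20,
    ("unvaccinated", "not infected"): 80,
}


def p_infection_given_vaccination(counts):
    """P(Infection | Vaccination): infected share among the vaccinated."""
    vaccinated = sum(n for (vax, _), n in counts.items() if vax == "vaccinated")
    return Fraction(counts[("vaccinated", "infected")], vaccinated)


def p_vaccination_given_infection(counts):
    """P(Vaccination | Infection): vaccinated share among the infected."""
    infected = sum(n for (_, inf), n in counts.items() if inf == "infected")
    return Fraction(counts[("vaccinated", "infected")], infected)


print(p_infection_given_vaccination(counts))  # 30 of 900 vaccinated
print(p_vaccination_given_infection(counts))  # 30 of 50 infected
```

With these invented counts the first probability is 1/30 and the second is 3/5. Both use the same numerator (30) but different denominators, which is why a dataset restricted to infected people can answer the second question and not the first.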

37 of 64

design principles

for data analysis

exhaustive

Are specific questions addressed using multiple, complementary methods, tooling or workflows?

41 of 64

design principles

for data analysis

skeptical

Are multiple, related explanations considered using the same data?

46 of 64

design principles

for data analysis

second-order

Does the analysis include anything that does not directly address the primary question, but gives important context to the analysis?

49 of 64

design principles

for data analysis

clarity

Does the analysis summarize data in a way that is influential in explaining how the underlying data connects to the conclusions?

53 of 64

design principles

for data analysis

data matching

How well does the available data match the data needed to investigate a question?

exhaustive

Are specific questions addressed using multiple, complementary methods, tooling or workflows?

skeptical

Are multiple, related explanations considered using the same data?

second-order

Does the analysis include anything that does not directly address the primary question, but gives important context to the analysis?

clarity

Does the analysis summarize data in a way that is influential in explaining how the underlying data connects to the conclusions?

reproducible

Could someone who is not the original producer take the published code and data and compute the same results?

56 of 64

Wake Forest University

54 Students

8 Assignments

10-point scoring

57 of 64

Observed between- and within-person variation in principle scores across assignments
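
To make the two kinds of variation concrete, here is a rough sketch in Python. It assumes one principle scored on a 10-point scale for a few students over several assignments; the scores, the student labels, and the two helper functions are all hypothetical, and this simple variance split is only an illustration, not the analysis reported in the paper.

```python
from statistics import mean, pvariance

# Hypothetical scores for ONE principle on a 10-point scale.
# Each list holds one student's scores across four assignments.
scores = {
    "student_a": [6, 7, 8, 7],
    "student_b": [3, 4, 3, 2],
    "student_c": [9, 8, 9, 10],
}


def between_person_variance(scores):
    """Variance of the per-student mean scores (how much students differ)."""
    return pvariance([mean(s) for s in scores.values()])


def within_person_variance(scores):
    """Average of each student's own variance across assignments."""
    return mean(pvariance(s) for s in scores.values())


print(between_person_variance(scores))
print(within_person_variance(scores))
```

In this invented example the students' averages differ a lot (large between-person variance), while each student's scores move only slightly from assignment to assignment (small within-person variance).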

59 of 64

Observed variation between principles, suggesting they measure different underlying characteristics of a data analysis

61 of 64

Johns Hopkins University

15 Students

2 Different Analysts

10-point scoring

62 of 64

The scoring of principles has some ability to distinguish between analyses done by independent analysts

64 of 64

Design Thinking: Empirical Evidence for Six Principles of Data Analysis

Lucy D’Agostino McGowan

Wake Forest University

@LucyStats

lucymcgowan.com/talk
