1 of 87

Moving towards best practice: dispelling statistical myths

Lucy D’Agostino McGowan

Wake Forest University

www.lucymcgowan.com

@LucyStats

University of Houston HCA Grand Rounds Lucy D’Agostino McGowan

2 of 87

Disclaimer:

In the past two years I have worked as a consultant for Daiichi Sankyo

3 of 87

https://leanpub.com/universities/courses/jhu/udsml

4 of 87

lucymcgowan.com/talk

5 of 87

6 of 87

fact:

myth:

Dichotomizing a continuous variable has no negative impact on inference

7 of 87

fact:

dichotomizing a normally distributed variable at the median is equivalent to losing one third of the data

8 of 87

fact:

dichotomizing a normally distributed variable at the median is equivalent to losing one third of the data

Best case scenario
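The one-third figure can be checked with a short simulation. This is a sketch, assuming a standard normal variable split at its sample median: the squared correlation between the variable and its median-split version approximates the share of information retained, which theory puts at 2/π ≈ 0.64, so roughly a third of the information is lost. The function name `median_split_retained_information` is illustrative.

```python
import math

import numpy as np


def median_split_retained_information(n=1_000_000, seed=0):
    """Squared correlation between a normal variable and its median split."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x_split = (x > np.median(x)).astype(float)  # 1 above the median, 0 otherwise
    r = np.corrcoef(x, x_split)[0, 1]
    return r ** 2


theoretical = 2 / math.pi  # share of information retained under normality
simulated = median_split_retained_information()
print(f"theoretical: {theoretical:.3f}, simulated: {simulated:.3f}")
print(f"information lost: about {1 - theoretical:.0%}")
```

The same factor means a median-split analysis needs roughly 1 / 0.64 ≈ 1.57 times as many observations to match the precision of the continuous analysis.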

9 of 87

Sampled 100 observations from 2011-2012 NHANES to look at the relationship between BMI and Systolic Blood Pressure

10 of 87

Sampled 100 observations from 2011-2012 NHANES to look at the relationship between BMI and Systolic Blood Pressure

11 of 87

Sampled 100 observations from 2011-2012 NHANES to look at the relationship between BMI and Systolic Blood Pressure

12 of 87

fact:

dichotomizing a continuous variable unnecessarily introduces residual confounding

13 of 87

fact:

dichotomizing a continuous variable unnecessarily introduces residual confounding

Bias!
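The residual confounding problem can be illustrated with simulated data. In this sketch (all coefficients are made up), a continuous confounder drives both the exposure and the outcome, and the exposure has no effect at all. Adjusting for the confounder as measured recovers an estimate near 0, while adjusting for a median-split version leaves a clearly biased, nonzero estimate.

```python
import numpy as np


def exposure_coefficient(y, exposure, *adjusters):
    """OLS coefficient on `exposure` after adjusting for `adjusters`."""
    design = np.column_stack([np.ones_like(y), exposure, *adjusters])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(2024)
n = 200_000
confounder = rng.standard_normal(n)
exposure = 0.8 * confounder + rng.standard_normal(n)
outcome = 1.0 * confounder + rng.standard_normal(n)  # true exposure effect is 0

confounder_split = (confounder > np.median(confounder)).astype(float)

adjusted_continuous = exposure_coefficient(outcome, exposure, confounder)
adjusted_split = exposure_coefficient(outcome, exposure, confounder_split)
print(f"adjusting for continuous confounder:   {adjusted_continuous:.3f}")
print(f"adjusting for dichotomized confounder: {adjusted_split:.3f}")
```

Within each half of the split, the confounder still varies and still drives both variables, which is where the leftover bias comes from.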

14 of 87

15 of 87

fact:

dichotomizing a continuous variable is akin to introducing measurement error

16 of 87

fact:

solution:

Keep continuous variables continuous! (And maybe consider modeling them flexibly to allow for non-linear relationships)

17 of 87

fact:

“Complete cases” analyses can both bias your results and throw away information

18 of 87

fact:

solution:

  1. “Fill in” the missing values
  2. Use the complete cases but make them more representative, using the information you have about those dropped

19 of 87

fact:

solution:

  • “Fill in” the missing values
  • Use the complete cases but make them more representative, using the information you have about those dropped

Multiple imputation
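A bare-bones sketch of multiple imputation, assuming a single fully observed predictor `x` and an outcome `y` that is missing at random given `x`. Each imputation refits a linear regression on a bootstrap resample of the complete cases (one of several ways to carry parameter uncertainty into the imputations), “fills in” the missing values with predictions plus random noise, and the results are pooled with Rubin's rules. The data-generating numbers are made up, and real analyses would typically rely on a dedicated imputation package.

```python
import numpy as np


def multiple_imputation_mean(x, y, m=20, seed=0):
    """Estimate mean(y) when y is missing at random given x; pool with Rubin's rules."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(y)
    missing = ~observed
    complete_idx = np.flatnonzero(observed)
    n = len(y)
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the complete cases so the imputation model reflects parameter uncertainty
        idx = rng.choice(complete_idx, size=complete_idx.size, replace=True)
        design = np.column_stack([np.ones(idx.size), x[idx]])
        beta, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        sigma = np.std(y[idx] - design @ beta, ddof=2)
        # Fill in each missing value with a prediction plus random residual noise
        y_filled = y.copy()
        y_filled[missing] = (
            beta[0] + beta[1] * x[missing] + rng.normal(0.0, sigma, size=missing.sum())
        )
        estimates.append(y_filled.mean())
        variances.append(y_filled.var(ddof=1) / n)
    q_bar = np.mean(estimates)              # pooled point estimate
    within = np.mean(variances)             # average within-imputation variance
    between = np.var(estimates, ddof=1)     # between-imputation variance
    total = within + (1 + 1 / m) * between  # Rubin's total variance
    return q_bar, np.sqrt(total)


rng = np.random.default_rng(7)
n = 5_000
x = rng.standard_normal(n)
y = 2 + 1.5 * x + rng.standard_normal(n)  # true mean of y is 2
p_observed = 1 / (1 + np.exp(1.5 * x))     # larger x -> more likely to be missing
y_obs = np.where(rng.uniform(size=n) < p_observed, y, np.nan)

complete_case_mean = np.nanmean(y_obs)
mi_mean, mi_se = multiple_imputation_mean(x, y_obs)
print(f"complete cases: {complete_case_mean:.2f}")
print(f"multiple imputation: {mi_mean:.2f} (SE {mi_se:.2f})")
```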

20 of 87

fact:

solution:

  • “Fill in” the missing values
  • Use the complete cases but make them more representative, using the information you have about those dropped

Multiple imputation

Complete likelihood approach

21 of 87

fact:

solution:

  • “Fill in” the missing values
  • Use the complete cases but make them more representative, using the information you have about those dropped

Re-weight your sample to “look like” one that wasn’t missing data
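Re-weighting is usually done with inverse probability weights. In this minimal sketch (simulated data, made-up missingness rates), the chance of being observed depends on a fully observed `group` variable. Each complete case is weighted by one over its estimated probability of being observed, so the under-represented group counts for more and the weighted complete cases “look like” the full sample.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
group = rng.integers(0, 2, size=n)                              # fully observed characteristic
y = np.where(group == 1, 10.0, 0.0) + rng.normal(0, 1, size=n)  # full-sample mean is about 5
p_observed = np.where(group == 1, 0.2, 0.8)                     # group 1 is mostly missing
observed = rng.uniform(size=n) < p_observed

complete_case_mean = y[observed].mean()

# Estimate P(observed | group) from the data, then weight each complete case by its inverse
p_hat = np.array([observed[group == g].mean() for g in (0, 1)])
weights = 1 / p_hat[group[observed]]
ipw_mean = np.sum(weights * y[observed]) / np.sum(weights)

print(f"full sample: {y.mean():.2f}")
print(f"complete cases: {complete_case_mean:.2f}")
print(f"inverse probability weighted: {ipw_mean:.2f}")
```

With a single categorical predictor of missingness this reduces to stratified re-weighting; with several predictors, the probabilities are typically estimated with a model such as logistic regression.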

22 of 87

fact:

solution:

  • “Fill in” the missing values
  • Use the complete cases but make them more representative, using the information you have about those dropped

23 of 87

fact:

Denominators matter!

24 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

25 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

Mortality for those requiring mechanical ventilation was 88.1%

26 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

Mortality for those requiring mechanical ventilation was 88.1%

27 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

28 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

29 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

discharged alive

30 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

discharged alive

died

31 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

discharged alive

died

remain hospitalized

32 of 87

p(dying | ventilated) = (# died & were ventilated) / (total ventilated)

discharged alive

died

remain hospitalized

33 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

34 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282

35 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282

36 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282 / (282 + 38)

37 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282 / (282 + 38) = 0.881

38 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282 / (282 + 38) = 0.881

39 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282 / (282 + 38) = 0.881

40 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

282 / (282 + 38 + 831) = 0.245

41 of 87

JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775

“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”

(282 + 831) / (282 + 38 + 831) = 0.967

42 of 87

43 of 87

fact:

solution:

Don’t remove data! Consider the whole sample, not just those for whom an event is known
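The denominator arithmetic from the JAMA ventilation example can be written out directly. This sketch uses only the counts quoted from the paper; the variable names are illustrative. Dropping the patients still in hospital gives the 88.1% headline figure, while keeping the whole sample brackets the eventual mortality between treating everyone still hospitalized as a survivor and treating them all as deaths.

```python
# Counts for patients on mechanical ventilation as of April 4, 2020
# (JAMA. 2020;323(20):2052-2059)
died, discharged_alive, still_hospitalized = 282, 38, 831
total_ventilated = died + discharged_alive + still_hospitalized

# Drops everyone still in hospital from the denominator
mortality_among_resolved = died / (died + discharged_alive)
# Whole sample, counting everyone still in hospital as a survivor (lower bound)
mortality_lower_bound = died / total_ventilated
# Whole sample, counting everyone still in hospital as a death (upper bound)
mortality_upper_bound = (died + still_hospitalized) / total_ventilated

print(f"total ventilated: {total_ventilated}")
print(f"among resolved cases only: {mortality_among_resolved:.3f}")
print(f"whole sample, lower bound: {mortality_lower_bound:.3f}")
print(f"whole sample, upper bound: {mortality_upper_bound:.3f}")
```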

44 of 87

cen·sor:

A datapoint is right censored if we know the value is above a certain point, but we don’t know by how much

45 of 87

N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016

46 of 87

endpoint:

clinical improvement

47 of 87

analysis:

Cox proportional hazards

48 of 87

analysis:

Cox proportional hazards

36 improved

17 censored

10 had not improved by day 28

49 of 87

analysis:

Cox proportional hazards

36 improved

17 censored

10 had not improved by day 28

7 died

50 of 87

cen·sor:

A datapoint is right censored if we know the value is above a certain point, but we don’t know by how much

51 of 87

Figure: cumulative incidence of improvement by day

N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016

84% improved in 28 days (95% CI: 70-99)

52 of 87

Figure: cumulative incidence of improvement by day, competing risks analysis

N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016

74% improved in 28 days (95% CI: 55-86)

53 of 87

fact:

solution:

Consider whether the censoring is informative and, if so, use a competing risks analysis
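The gap between the 84% and 74% figures comes from how deaths are handled. The sketch below compares the naive 1 minus Kaplan-Meier estimate, which treats deaths as censored, with the Aalen-Johansen cumulative incidence, which treats death as a competing event. The small dataset is hypothetical (event codes 0 = censored, 1 = improved, 2 = died) and is not the NEJM trial data; the function name is illustrative.

```python
import numpy as np


def naive_and_aalen_johansen(time, event, cause=1):
    """Compare 1 - Kaplan-Meier (other events treated as censored) with the
    Aalen-Johansen cumulative incidence for `cause`.

    Event codes: 0 = censored, any other integer = an event type.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event)
    event_times = np.unique(time[event != 0])
    km_cause_only = 1.0  # Kaplan-Meier survival treating competing events as censored
    km_any_event = 1.0   # Kaplan-Meier probability of being free of any event
    cif = 0.0
    naive, aalen_johansen = [], []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_any = np.sum((time == t) & (event != 0))
        cif += km_any_event * d_cause / at_risk  # uses event-free survival just before t
        km_cause_only *= 1 - d_cause / at_risk
        km_any_event *= 1 - d_any / at_risk
        naive.append(1 - km_cause_only)
        aalen_johansen.append(cif)
    return event_times, np.array(naive), np.array(aalen_johansen)


# Hypothetical data: days to improvement (1), death (2), or censoring (0)
days = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 12, 14, 15, 20, 28, 28]
status = [1, 1, 2, 1, 1, 2, 1, 1, 0, 1, 2, 1, 0, 1, 0, 0]

times, naive, aj = naive_and_aalen_johansen(days, status)
for t, naive_est, aj_est in zip(times, naive, aj):
    print(f"day {t:>4.0f}: 1 - KM = {naive_est:.2f}, Aalen-Johansen = {aj_est:.2f}")
```

After the first death the two curves separate: the naive estimate acts as if the patients who died could still go on to improve, so it can only overstate the cumulative incidence.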

54 of 87

fact:

a p-value is the probability of observing a result as extreme as, or more extreme than, the one you observed, assuming the null hypothesis is true
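A permutation test makes this definition concrete. Under the null hypothesis that group labels do not matter, we can reshuffle the labels many times and count how often the reshuffled difference in means is at least as extreme as the observed one. This is a sketch; the function name and the default of 10,000 permutations are arbitrary choices.

```python
import numpy as np


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    as_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # random relabelling, as the null hypothesis allows
        diff = abs(shuffled[: len(a)].mean() - shuffled[len(a):].mean())
        as_extreme += diff >= observed
    # Count the observed labelling itself so the p-value is never exactly 0
    return (as_extreme + 1) / (n_permutations + 1)


print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
print(permutation_p_value([1, 2, 3], [1, 2, 3]))
```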

55 of 87

fact:

p-values are tied to your sample size: for the same nonzero effect, your p-value will decrease as your sample size increases, no matter how small the effect
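The dependence on sample size can be seen by holding the observed effect fixed and changing only n. This sketch uses a normal approximation (a two-sample z-test with a known standard deviation) and an arbitrary observed difference of 0.1 standard deviations; the helper names are illustrative.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_value_for_fixed_effect(effect, sd, n_per_group):
    """z-test p-value for an observed mean difference between two equal-size groups."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return two_sided_p_value(effect / standard_error)


sample_sizes = [10, 50, 100, 500, 1_000, 10_000]
p_values = [p_value_for_fixed_effect(0.1, 1.0, n) for n in sample_sizes]
for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:>6}: p = {p:.2g}")
```

The effect is identical in every row; only the standard error shrinks, so the same small difference moves from "not significant" to "highly significant".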

56 of 87

57 of 87

p-value

58 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

59 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

60 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

61 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Proper inference requires full reporting and transparency.

62 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Proper inference requires full reporting and transparency.

A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

63 of 87

p-value

P-values can indicate how incompatible the data are with a specified statistical model.

P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.

Proper inference requires full reporting and transparency.

A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.

By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

64 of 87

fact:

solution:

Incorporate scientific significance with statistical significance

65 of 87

fact:

solution:

Incorporate scientific significance with statistical significance.

Report a confidence interval (or p-value + effect size)
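A confidence interval reports the size of the effect and its uncertainty together. This sketch computes a mean difference, a 95% Wald interval, and a z-test p-value for two hypothetical blood pressure comparisons, assuming a common standard deviation of 15 mmHg; the function name and all numbers are made up for illustration.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def summarize_difference(mean_1, mean_2, sd, n_per_group):
    """Mean difference, 95% Wald confidence interval, and two-sided z-test p-value."""
    effect = mean_1 - mean_2
    se = sd * math.sqrt(2 / n_per_group)
    ci = (effect - Z_975 * se, effect + Z_975 * se)
    p_value = math.erfc(abs(effect / se) / math.sqrt(2))
    return effect, ci, p_value


# Hypothetical systolic blood pressure comparisons (mmHg)
tiny = summarize_difference(120.5, 120.0, sd=15, n_per_group=50_000)
large = summarize_difference(130.0, 120.0, sd=15, n_per_group=10)
for label, (effect, (lo, hi), p) in [("tiny effect, huge n", tiny), ("large effect, small n", large)]:
    print(f"{label}: difference {effect:.1f} (95% CI {lo:.1f} to {hi:.1f}), p = {p:.2g}")
```

The first comparison is "statistically significant" but the whole interval sits well under 1 mmHg; the second is not significant, yet its interval is wide enough to include clinically important differences. The p-value alone would hide both facts.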

66 of 87

fact:

When trying to answer a causal question, you cannot use the observed data to decide which variables to include

67 of 87

exposure

outcome

68 of 87

exposure

outcome

Confounders

69 of 87

exposure

outcome

Confounders

Randomization

70 of 87

exposure

outcome

Confounders

Randomization

71 of 87

exposure

outcome

Confounders

Randomization

72 of 87

exposure

outcome

Confounders

Why can’t we just check whether each of our potential confounders is correlated with our exposure and outcome?

73 of 87

exposure

outcome

Confounders

Why can’t we just check whether each of our potential confounders is correlated with our exposure and outcome?

74 of 87

exposure

outcome

Confounders

Why can’t we just check whether each of our potential confounders is correlated with our exposure and outcome?

75 of 87

exposure

outcome

Collider

Confounder

76 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

77 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

78 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

Confounder!

79 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

Confounder!

Effect: 1.05

80 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

81 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

Collider!

82 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

Collider!

Effect: -0.9

83 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

term            estimate    p-value
Sodium intake   -0.9        < 0.0001
Age             0.4         < 0.0001
Proteinuria     0.4         < 0.0001

84 of 87

Sodium intake

Systolic Blood Pressure

Proteinuria

Age

Statistical models don’t know the direction of cause and effect, so we can’t use statistics alone to decide what to include in our models

term            estimate    p-value
Sodium intake   -0.9        < 0.0001
Age             0.4         < 0.0001
Proteinuria     0.4         < 0.0001

85 of 87

fact:

solution:

When trying to answer a causal question, pre-specify the variables you will adjust for (using something like a causal diagram to justify your choices)
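The sodium and blood pressure example can be mimicked with simulated data. In this sketch, age is a confounder (it affects both sodium intake and systolic blood pressure) and proteinuria is a collider (it is caused by both sodium intake and blood pressure). All coefficients are made up, so the biased estimate will not match the -0.9 on the slides, but the pattern is the same: adjusting for age recovers the true effect of 1.05, while also adjusting for proteinuria, a variable strongly correlated with both exposure and outcome, flips the sign.

```python
import numpy as np


def first_coefficient(y, *covariates):
    """OLS coefficient on the first covariate (intercept included)."""
    design = np.column_stack([np.ones_like(y), *covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(11)
n = 200_000
age = rng.standard_normal(n)
sodium = 0.5 * age + rng.standard_normal(n)                      # age -> sodium
sbp = 1.05 * sodium + 2.0 * age + rng.standard_normal(n)         # true sodium effect: 1.05
proteinuria = 2.0 * sodium + 2.0 * sbp + rng.standard_normal(n)  # collider

adjust_confounder = first_coefficient(sbp, sodium, age)
adjust_confounder_and_collider = first_coefficient(sbp, sodium, age, proteinuria)
print(f"adjusting for age: {adjust_confounder:.2f}")
print(f"adjusting for age and proteinuria: {adjust_confounder_and_collider:.2f}")
```

Nothing in the regression output flags the second model as wrong; only the assumed causal structure tells us that proteinuria should be left out.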

86 of 87

Stat Facts:

dichotomizing a normally distributed variable at the median is equivalent to losing one third of the data

dichotomizing a continuous variable unnecessarily introduces residual confounding

dichotomizing a continuous variable is akin to introducing measurement error

“Complete cases” analyses can both bias your results and throw away information

Denominators matter!

p-values are tied to your sample size: for the same nonzero effect, your p-value will decrease as your sample size increases, no matter how small the effect

When trying to answer a causal question, you cannot use the observed data to decide which variables to include

87 of 87

Thank you!

Lucy D’Agostino McGowan

Wake Forest University

mcgowald@wfu.edu

@LucyStats

Presentation template adapted from Slidesgo

Icons by Flaticon

Infographics by Freepik

Images created by Freepik
