Moving towards best practice: dispelling statistical myths
University of Houston HCA Grand Rounds • Lucy D’Agostino McGowan
Disclaimer:
In the past two years I have worked as a consultant for Daiichi Sankyo
https://leanpub.com/universities/courses/jhu/udsml
lucymcgowan.com/talk
myth:
Dichotomizing a continuous variable has no negative impact on inference
fact:
dichotomizing a normally distributed variable at the median is equivalent to losing one third of the data
Best case scenario
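One way to see where the "one third" figure comes from: for a normally distributed variable split at its median, the squared correlation between the original and the split version is 2/π ≈ 0.64, so roughly a third of the information about a linear relationship is lost. The sketch below checks this by simulation; the sample size and seed are arbitrary choices, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n = 200_000                     # arbitrary simulation size (assumption)

# Standard normal variable and its median split
x = rng.normal(size=n)
x_split = (x > np.median(x)).astype(float)

# Squared correlation between the variable and its dichotomized version
r = np.corrcoef(x, x_split)[0, 1]
retained = r ** 2

print(f"share of information retained: {retained:.3f}")
print(f"theoretical value 2/pi:        {2 / np.pi:.3f}")
```

Splitting anywhere other than the median, or splitting a skewed variable, generally retains even less, which is why the median split of a normal variable is the best case.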
Sampled 100 observations from 2011-2012 NHANES to look at the relationship between BMI and Systolic Blood Pressure
fact:
dichotomizing a continuous variable unnecessarily introduces residual confounding
Bias!
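A minimal simulation sketch of residual confounding, under assumed (hypothetical) data: a continuous confounder drives both exposure and outcome, and the exposure has no effect at all. Adjusting for the continuous confounder recovers an effect near zero; adjusting for its median split does not.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed (assumption)
n = 100_000

# Hypothetical data-generating process: the true exposure effect is zero
confounder = rng.normal(size=n)
exposure = confounder + rng.normal(size=n)
outcome = confounder + rng.normal(size=n)

def exposure_coef(adjust_for):
    """OLS coefficient on exposure after adjusting for one covariate."""
    design = np.column_stack([np.ones(n), exposure, adjust_for])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs[1]

beta_continuous = exposure_coef(confounder)
beta_split = exposure_coef((confounder > np.median(confounder)).astype(float))

print(f"adjusting for continuous confounder:   {beta_continuous:.3f}")
print(f"adjusting for dichotomized confounder: {beta_split:.3f}")
```

In this setup the dichotomized adjustment leaves a clearly nonzero exposure coefficient even though the true effect is zero.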
fact:
dichotomizing a continuous variable is akin to introducing measurement error
solution:
Keep continuous variables continuous! (And maybe consider modeling them flexibly to allow for non-linear relationships)
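As a rough illustration of flexible modeling, the sketch below compares three fits on simulated data with a hypothetical U-shaped relationship: a median split, a straight line, and a cubic polynomial. The polynomial is only a simple stand-in here; in practice splines are a common choice.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed (assumption)
n = 2_000
x = rng.uniform(-3, 3, size=n)
y = x ** 2 + rng.normal(size=n)  # hypothetical U-shaped relationship

def mse(design):
    """Mean squared error of an ordinary least squares fit of y on design."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.mean((y - design @ coefs) ** 2)

ones = np.ones(n)
mse_split = mse(np.column_stack([ones, (x > np.median(x)).astype(float)]))
mse_linear = mse(np.column_stack([ones, x]))
mse_cubic = mse(np.column_stack([ones, x, x ** 2, x ** 3]))

print(f"median split: {mse_split:.2f}")
print(f"linear:       {mse_linear:.2f}")
print(f"cubic:        {mse_cubic:.2f}")
```

Both the split and the straight line miss the curvature entirely, while the flexible fit gets close to the noise level.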
fact:
“Complete case” analyses can both bias your results and throw away information
solution:
Multiple imputation
Complete likelihood approach
Re-weight your sample to “look like” one that wasn’t missing data
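A minimal sketch of the re-weighting idea, using simulated (hypothetical) data in which the outcome is more likely to be missing in one group. The complete-case mean is biased; weighting each complete case by the inverse of its estimated probability of being observed brings the estimate back toward the truth. This assumes missingness depends only on the observed covariate.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed (assumption)
n = 100_000

x = rng.binomial(1, 0.5, size=n)
y = 1 + 2 * x + rng.normal(size=n)  # true mean of y is 2

# y is observed with probability 0.9 when x = 1 and 0.3 when x = 0
p_observed = np.where(x == 1, 0.9, 0.3)
observed = rng.random(n) < p_observed

complete_case_mean = y[observed].mean()

# Estimate the probability of being observed within each level of x,
# then weight each complete case by the inverse of that probability
p_hat = np.array([observed[x == level].mean() for level in (0, 1)])[x]
weights = 1 / p_hat[observed]
ipw_mean = np.sum(weights * y[observed]) / np.sum(weights)

print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"re-weighted mean:   {ipw_mean:.2f}")
```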
fact:
Denominators matter!
JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775
“Mortality for those requiring mechanical ventilation was 88.1%”
p(dying | ventilated) = (# died & were ventilated) / (total ventilated)
total ventilated = discharged alive + died + remain hospitalized
JAMA. 2020;323(20):2052-2059. doi:10.1001/jama.2020.6775
“As of April 4, 2020, for patients requiring mechanical ventilation 38 (3.3%) were discharged alive, 282 (24.5%) died, and 831 (72.2%) remained in hospital”
282 / (282 + 38) = 0.881 (only patients with a known outcome)
282 / (282 + 38 + 831) = 0.245 (if everyone still hospitalized survives)
(282 + 831) / (282 + 38 + 831) = 0.967 (if everyone still hospitalized dies)
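The arithmetic can be reproduced directly from the counts quoted from the JAMA paper; the variable names are illustrative.

```python
# Counts quoted from JAMA. 2020;323(20):2052-2059 (patients requiring mechanical ventilation)
discharged_alive = 38
died = 282
still_hospitalized = 831
total_ventilated = discharged_alive + died + still_hospitalized

# Denominator restricted to patients whose outcome is already known
known_outcome_only = died / (died + discharged_alive)

# Bounds using the whole sample: everyone still hospitalized survives, or dies
lower_bound = died / total_ventilated
upper_bound = (died + still_hospitalized) / total_ventilated

print(round(known_outcome_only, 3), round(lower_bound, 3), round(upper_bound, 3))
# 0.881 0.245 0.967
```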
solution:
Don’t remove data! Consider the whole sample, not just those for whom an event is known
cen·sor:
A data point is right-censored if we know its value is above a certain point but not by how much
N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016
endpoint:
clinical improvement
analysis:
Cox proportional hazards
36 improved
17 censored (10 had not improved by day 28, 7 died)
cen·sor:
A data point is right-censored if we know its value is above a certain point but not by how much
[Figure: cumulative incidence of improvement (y-axis) by days (x-axis)]
N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016
84% improved in 28 days (95% CI: 70 - 99)
[Figure: cumulative incidence of improvement (y-axis) by days (x-axis), competing risks]
N Engl J Med 2020; 382:2327-2336 DOI: 10.1056/NEJMoa2007016
74% improved in 28 days (95% CI: 55 - 86)
solution:
Consider whether the censoring is informative and if so, use a competing risks analysis
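To illustrate why treating deaths as censored inflates the estimate, here is a small sketch on hypothetical toy data (not the trial data). It compares one minus Kaplan-Meier, with deaths treated as censored, against a cumulative incidence estimate that treats death as a competing event. Both functions are hand-written for illustration; they are not the analysis code behind the figures.

```python
import numpy as np

# Hypothetical toy data: event 1 = improved, 2 = died, 0 = censored (unused here)
times = np.arange(1, 11)
events = np.array([1, 2, 1, 1, 2, 1, 2, 1, 2, 1])

def naive_one_minus_km(times, events, cause=1):
    """1 - Kaplan-Meier, treating every other outcome (including death) as censored."""
    surv = 1.0
    for t in np.unique(times[events == cause]):
        at_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == cause))
        surv *= 1 - d / at_risk
    return 1 - surv

def cumulative_incidence(times, events, cause=1):
    """Cumulative incidence of `cause`, with all other events as competing risks."""
    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    for t in np.unique(times[events != 0]):
        at_risk = np.sum(times >= t)
        d_cause = np.sum((times == t) & (events == cause))
        d_any = np.sum((times == t) & (events != 0))
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
    return cif

naive = naive_one_minus_km(times, events)
cif = cumulative_incidence(times, events)
print(f"deaths treated as censored: {naive:.2f}")
print(f"competing risks:            {cif:.2f}")
```

In this toy example 6 of 10 patients improve and 4 die. The competing risks estimate matches the observed 60%, while the naive estimate behaves as if the patients who died would eventually have improved.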
fact:
a p-value is the probability of observing a result as extreme or more extreme than the one you observed given the null hypothesis is true
fact:
p-values are tied to your sample size: for the same nonzero observed effect, the p-value shrinks as the sample size grows
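A short sketch of this relationship, assuming a simple two-sided z-test for a sample mean with a known standard deviation. The effect size and sample sizes are hypothetical.

```python
import math

def two_sided_p(effect, sd, n):
    """Two-sided z-test p-value when the observed sample mean equals `effect`."""
    z = effect / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# Same observed effect (0.1 standard deviations), increasing sample size
sample_sizes = [100, 1_000, 10_000]
p_values = [two_sided_p(effect=0.1, sd=1.0, n=n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:>6}: p = {p:.3g}")
```

The effect never changes; only the sample size does, and the p-value falls from clearly non-significant to vanishingly small.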
p-value
P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
solution:
Incorporate scientific significance with statistical significance.
Report a confidence interval (or p-value + effect size)
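For example, with hypothetical summary statistics from a very large two-group comparison, the p-value alone looks impressive, while the confidence interval shows how small the effect is. The sketch uses a normal approximation with a common standard deviation; all numbers are made up for illustration.

```python
import math

# Hypothetical summary statistics for a two-group comparison
diff = 0.02            # observed difference in means
sd = 1.0               # common standard deviation
n_per_group = 100_000

se = sd * math.sqrt(2 / n_per_group)          # standard error of the difference
z = diff / se
p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

print(f"p-value: {p_value:.2g}")
print(f"difference: {diff} (95% CI: {ci_low:.3f} to {ci_high:.3f})")
```

Whether a difference of about 0.01 to 0.03 standard deviations matters is a scientific question that the p-value cannot answer.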
fact:
When trying to answer a causal question, you cannot use the observed data to decide which variables to include
[Diagram labels: exposure, outcome, confounders, randomization]
Why can’t we just check whether each of our potential confounders is correlated with our exposure and outcome?
[Diagram labels: exposure, outcome, confounder, collider]
[Diagram labels: Sodium intake, Systolic Blood Pressure, Proteinuria, Age]
Age: Confounder!
Effect: 1.05
Proteinuria: Collider!
Effect: -0.9
term | estimate | p-value |
Sodium intake | -0.9 | < 0.0001 |
Age | 0.4 | < 0.0001 |
Proteinuria | 0.4 | < 0.0001 |
Statistical models don’t know the direction of cause and effect, so we can’t use statistics alone to decide what to include in our models
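A simulation sketch of the same phenomenon, with a hypothetical data-generating process (the coefficients below are made up and will not reproduce the estimates on the slides). Age affects both sodium intake and blood pressure, and proteinuria is caused by both sodium intake and blood pressure. Adjusting for age recovers the true sodium effect; adding the collider flips its sign, even though every term looks strongly "significant".

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed (assumption)
n = 50_000

# Hypothetical causal structure; the true effect of sodium on SBP is 1.0
age = rng.normal(size=n)
sodium = 0.5 * age + rng.normal(size=n)
sbp = 1.0 * sodium + 0.5 * age + rng.normal(size=n)
proteinuria = 1.0 * sodium + 2.0 * sbp + rng.normal(size=n)  # collider

def sodium_coef(*covariates):
    """OLS coefficient on sodium when regressing SBP on sodium plus covariates."""
    design = np.column_stack([np.ones(n), sodium, *covariates])
    coefs, *_ = np.linalg.lstsq(design, sbp, rcond=None)
    return coefs[1]

adjusted_for_age = sodium_coef(age)
adjusted_for_age_and_collider = sodium_coef(age, proteinuria)

print(f"adjusting for age:                 {adjusted_for_age:.2f}")
print(f"adjusting for age and proteinuria: {adjusted_for_age_and_collider:.2f}")
```

Nothing in the regression output signals which model is right; that information comes from the assumed causal structure.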
solution:
When trying to answer a causal question, pre-specify the variables you will adjust for (using something like a causal diagram to justify)
Stat Facts:
dichotomizing a normally distributed variable at the median is equivalent to losing one third of the data
dichotomizing a continuous variable unnecessarily introduces residual confounding
dichotomizing a continuous variable is akin to introducing measurement error
“Complete case” analyses can both bias your results and throw away information
Denominators matter!
p-values are tied to your sample size: for the same nonzero observed effect, the p-value shrinks as the sample size grows
When trying to answer a causal question, you cannot use the observed data to decide which variables to include