The Case for Deterministic Imputation in Prediction Modeling
Lucy D’Agostino McGowan
Wake Forest University
lucymcgowan.com/talk
Lucy D’Agostino McGowan ● St. Jude 2025
what goes in the imputation model?
“Always include the outcome in imputation models!”
👧
“Sometimes include the outcome in imputation models?”
👩
“Include the outcome in stochastic imputation models, don’t in deterministic imputation models”
👵
D’Agostino McGowan L, Lotspeich SC, Hepler SA.
Statistical Methods in Medical Research (2024)
the setup
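Deterministic imputation
## fit on the rows where x is observed (lm() drops the NAs)
imp_fit <- lm(x_obs ~ z)
coefs <- coef(imp_fit)
## each imputed value is the fitted mean; no noise is added
imp_det <- coefs[1] + coefs[2] * z
## equivalently:
imp_det <- predict(imp_fit, newdata = data.frame(z = z))

Stochastic imputation
imp_fit <- lm(x_obs ~ z)
coefs <- coef(imp_fit)
## draw the residual SD: sigma^2 = RSS / chi-square(residual df)
sigma <- sqrt(sum(residuals(imp_fit)^2) /
                rchisq(1, df.residual(imp_fit)))
## draw the coefficients from N(coefs, sigma^2 (X'X)^{-1})
coefs <- coefs +
  drop(t(chol(summary(imp_fit)$cov.unscaled)) %*% rnorm(2, 0, sigma))
## fitted mean plus residual noise for each observation
imp_stoch <- coefs[1] + coefs[2] * z +
  rnorm(length(z), 0, sigma)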
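The two recipes can be turned into a runnable sketch. Everything beyond the slide code is an assumption made for illustration: the simulated data (standard normal z, x = z plus standard normal noise, half of x missing completely at random), the hypothetical helpers impute_det() and impute_stoch(), and a stochastic version that uses the residual SD from sigma() and skips the parameter draw for brevity. It shows the key contrast: deterministic imputations are less spread out than X itself, while stochastic imputations roughly match its variance.

```r
set.seed(2025)

n <- 10000
z <- rnorm(n)
x <- z + rnorm(n)                  # Var(X) = 2, Var(E[X | Z]) = 1
x_obs <- x
x_obs[sample(n, n / 2)] <- NA      # hypothetical: half of x missing completely at random
miss <- is.na(x_obs)

# Hypothetical helper. Deterministic: each missing x gets its fitted mean
impute_det <- function(x_obs, z) {
  fit <- lm(x_obs ~ z)             # lm() drops the rows where x_obs is NA
  m <- is.na(x_obs)
  x_obs[m] <- predict(fit, newdata = data.frame(z = z[m]))
  x_obs
}

# Hypothetical helper. Stochastic: fitted mean plus a normal draw with the
# residual SD (the parameter draw from the slide is skipped for brevity)
impute_stoch <- function(x_obs, z) {
  fit <- lm(x_obs ~ z)
  m <- is.na(x_obs)
  x_obs[m] <- predict(fit, newdata = data.frame(z = z[m])) +
    rnorm(sum(m), 0, sigma(fit))
  x_obs
}

x_det <- impute_det(x_obs, z)
x_stoch <- impute_stoch(x_obs, z)

# Spread of the imputed values only
var_det <- var(x_det[miss])        # no residual noise added
var_stoch <- var(x_stoch[miss])    # residual noise added back
round(c(deterministic = var_det, stochastic = var_stoch), 2)
```

With this setup Var(X) = 2 and Var(E[X | Z]) = 1, so the deterministic imputations should have a variance near 1 and the stochastic ones near 2.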
Fun math fact:
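The argument that follows rests on a standard identity for simple linear regression: the population least-squares slope is a covariance divided by a variance. A sketch, assuming the analysis model below with $\operatorname{Cov}(X, \varepsilon) = 0$ (the notation is ours, not taken from the slides):

```latex
Y = \beta_0 + \beta_1 X + \varepsilon, \qquad
\beta_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
```

Replacing $X$ by an imputed $\hat X$ makes the fitted slope target $\operatorname{Cov}(\hat X, Y) / \operatorname{Var}(\hat X)$, so the slope is biased whenever imputation distorts the covariance and the variance by different factors.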
Deterministic Imputation
Stochastic Imputation
The covariance between the imputed X and Y depends on the imputation model that was fit.
Assume our imputation model regresses X on Z only (no outcome).
Deterministic and Stochastic imputation
Without Y in the imputation model, the covariance between the imputed X and Y is too small.
Deterministic Imputation
Same scaling factor in the covariance and the variance, so it cancels in the slope!!
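A sketch of why the factor is the same, under our own notation and assumptions: the imputation model is the linear regression of $X$ on $Z$ alone, $Z$ is related to $Y$ only through $X$ (so $\operatorname{Cov}(Z, \varepsilon) = 0$ in $Y = \beta_0 + \beta_1 X + \varepsilon$), $\rho = \operatorname{Corr}(X, Z)$, and moments are computed over the imputed values. For stochastic imputation, $\tilde X = \hat X + e$ with $e$ independent noise of variance $(1 - \rho^2)\operatorname{Var}(X)$:

```latex
\begin{aligned}
\hat X &= \alpha_0 + \alpha_1 Z, \qquad \alpha_1 = \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(Z)} \\
\operatorname{Var}(\hat X) &= \alpha_1^2 \operatorname{Var}(Z) = \rho^2 \operatorname{Var}(X) \\
\operatorname{Cov}(\hat X, Y) &= \alpha_1 \operatorname{Cov}(Z, Y) = \alpha_1 \beta_1 \operatorname{Cov}(Z, X) = \rho^2 \operatorname{Cov}(X, Y) \\
\text{deterministic:}\quad \frac{\operatorname{Cov}(\hat X, Y)}{\operatorname{Var}(\hat X)} &= \frac{\rho^2 \operatorname{Cov}(X, Y)}{\rho^2 \operatorname{Var}(X)} = \beta_1 \\
\text{stochastic, } \tilde X = \hat X + e:\quad \frac{\operatorname{Cov}(\tilde X, Y)}{\operatorname{Var}(\tilde X)} &= \frac{\rho^2 \operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \rho^2 \beta_1
\end{aligned}
```

Both moments shrink by $\rho^2$ under deterministic imputation, so the factor cancels. Stochastic imputation restores the variance but not the covariance, which is why its slope is attenuated toward zero.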
Deterministic imputation ✅
Stochastic imputation ❌
Take away:
Do not include the outcome in your imputation model when doing deterministic imputation
The covariance between the imputed X and Y depends on the imputation model that was fit.
Assume our imputation model regresses X on Z and on the outcome Y.
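A sketch of what changes when the outcome enters the imputation model, under our own notation: $\hat X$ is the linear regression of $X$ on $Z$ and $Y$, so the imputation residual $X - \hat X$ is uncorrelated with $Z$, $Y$, and $\hat X$; $\operatorname{Var}(X - \hat X) > 0$; and the analysis model is $Y = \beta_0 + \beta_1 X + \varepsilon$ with $\operatorname{Cov}(X, \varepsilon) = 0$. Stochastic imputation adds independent noise $e$ with variance $\operatorname{Var}(X - \hat X)$:

```latex
\begin{aligned}
\hat X &= \gamma_0 + \gamma_1 Z + \gamma_2 Y \\
\operatorname{Cov}(\hat X, Y) &= \operatorname{Cov}(X, Y) - \operatorname{Cov}(X - \hat X, Y) = \operatorname{Cov}(X, Y) \\
\operatorname{Var}(\hat X) &= \operatorname{Var}(X) - \operatorname{Var}(X - \hat X) < \operatorname{Var}(X) \\
\text{deterministic:}\quad \frac{\operatorname{Cov}(\hat X, Y)}{\operatorname{Var}(\hat X)} &= \beta_1 \, \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) - \operatorname{Var}(X - \hat X)} \\
\text{stochastic, } \tilde X = \hat X + e:\quad \frac{\operatorname{Cov}(\tilde X, Y)}{\operatorname{Var}(\tilde X)} &= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \beta_1
\end{aligned}
```

Including $Y$ fixes the covariance, but deterministic imputation still shrinks the variance, so its slope is inflated in magnitude; adding residual noise restores the variance and the ratio returns to $\beta_1$.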
Deterministic and Stochastic imputation
Deterministic imputation ❌
Stochastic imputation ✅
Take away:
Do include the outcome in your imputation model when doing stochastic imputation
Deterministic imputation, Y is not in the imputation model ✅
[Figure panels: variance of the imputed X; covariance of the imputed X and Y; each estimate marked as unbiased, too small, or too big]
Deterministic imputation, Y is in the imputation model ❌
Estimate is always too big
Stochastic imputation, Y is not in the imputation model ❌
Estimate is always too small
Stochastic imputation, Y is in the imputation model
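The four scenarios can be reproduced in spirit with a short simulation. This is a sketch under assumed settings, not the talk's simulation design: n = 100,000, Z standard normal, X = Z + noise, Y = X + noise with true slope 1, half of X missing completely at random, and stochastic draws that ignore parameter uncertainty (negligible at this sample size). The helpers impute() and slope() are hypothetical.

```r
set.seed(1)

n <- 100000
z <- rnorm(n)
x <- z + rnorm(n)                            # Var(X) = 2, Cov(X, Z) = 1
y <- x + rnorm(n)                            # true slope beta_1 = 1
miss <- seq_len(n) %in% sample(n, n / 2)     # half of X missing completely at random
x_obs <- ifelse(miss, NA, x)

# Hypothetical helper: fill the missing x_obs from the imputation model `form`.
# stochastic = TRUE adds normal noise with the residual SD of the fit.
impute <- function(form, stochastic) {
  dat <- data.frame(x_obs = x_obs, z = z, y = y)
  fit <- lm(form, data = dat)                # fit on rows with x observed
  pred <- predict(fit, newdata = dat[miss, ])
  if (stochastic) pred <- pred + rnorm(sum(miss), 0, sigma(fit))
  out <- x_obs
  out[miss] <- pred
  out
}

# Hypothetical helper: slope from the analysis model y ~ completed x
slope <- function(x_completed) unname(coef(lm(y ~ x_completed))[2])

estimates <- c(
  det_without_y   = slope(impute(x_obs ~ z,     stochastic = FALSE)),
  det_with_y      = slope(impute(x_obs ~ z + y, stochastic = FALSE)),
  stoch_without_y = slope(impute(x_obs ~ z,     stochastic = TRUE)),
  stoch_with_y    = slope(impute(x_obs ~ z + y, stochastic = TRUE))
)
round(estimates, 3)
```

Under these assumed settings the covariance algebra gives limits of 1, 8/7 ≈ 1.14, 0.75, and 1 for the four estimates: unbiased, too big, too small, and unbiased.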
Take away:
Include the outcome in stochastic imputation models; don’t include it in deterministic imputation models.
What about prediction?
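Three Phases
Training: ✓ have Y; ✓ use Y for stochastic; ✖ use Y for deterministic
Testing: ✓ have Y; ✖ use Y for stochastic; ✖ use Y for deterministic
Deployment: ✖ have Y; ✖ use Y for stochastic; ✖ use Y for deterministic
Even though Y is available at testing, it cannot be used for stochastic imputation there: the test set has to mimic deployment, where Y is unknown.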
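A minimal sketch of the workflow the three-phase breakdown implies for deterministic imputation. The data-generating process and the helpers make_data() and impute_x() are hypothetical: the imputation model for x is fit once on the training data without y, then applied unchanged to the test data, just as it would be to new patients at deployment when y is unknown.

```r
set.seed(42)

# Hypothetical data: outcome y, predictor x (partly missing), auxiliary z
make_data <- function(n) {
  z <- rnorm(n)
  x <- z + rnorm(n)
  y <- 1 + x + rnorm(n)
  x[runif(n) < 0.3] <- NA          # about 30% of x missing completely at random
  data.frame(y = y, x = x, z = z)
}
train <- make_data(5000)
test  <- make_data(2000)

# Training: fit the imputation model WITHOUT y, on rows where x is observed
imp_model <- lm(x ~ z, data = train)

# Hypothetical helper: deterministic imputation with a stored model
impute_x <- function(dat, model) {
  m <- is.na(dat$x)
  dat$x[m] <- predict(model, newdata = dat[m, , drop = FALSE])
  dat
}

train_imp <- impute_x(train, imp_model)
pred_model <- lm(y ~ x, data = train_imp)

# Testing: reuse the stored imputation model; y is never used to impute,
# exactly as at deployment
test_imp <- impute_x(test, imp_model)
test_pred <- predict(pred_model, newdata = test_imp)

rmse <- sqrt(mean((test$y - test_pred)^2))
rmse
```

Because y never enters imp_model, the same stored model can be used at training, testing, and deployment.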
Metrics
RMSE
Mean Absolute Error
R Squared
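The plots comparing the three metrics named above (RMSE, mean absolute error, R squared) are not recoverable here, but the metrics themselves have standard definitions. A sketch of the usual versions; whether the talk used exactly these variants (in particular 1 - SSE/SST rather than a squared correlation for R squared) is an assumption, and the toy vectors are made up for illustration.

```r
# Root mean squared error
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Mean absolute error
mae <- function(y, y_hat) mean(abs(y - y_hat))

# R squared as 1 - SSE / SST (one common definition; some software
# instead reports the squared correlation between y and y_hat)
r_squared <- function(y, y_hat) {
  1 - sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

y     <- c(3, 5, 7, 9)
y_hat <- c(2.5, 5.5, 7, 10)
c(rmse = rmse(y, y_hat), mae = mae(y, y_hat), r2 = r_squared(y, y_hat))
```

Here the errors are 0.5, -0.5, 0, and -1, so RMSE is sqrt(0.375), MAE is 0.5, and R squared is 1 - 1.5/20 = 0.925.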
Take away:
why should we care?
Hippisley-Cox et al. BMJ (2007)
Take away:
The Case for Deterministic Imputation in Prediction Modeling
Lucy D’Agostino McGowan
Wake Forest University
@LucyStats