1 of 117

The Case for Deterministic Imputation in Prediction Modeling

Lucy D’Agostino McGowan

Wake Forest University

2 of 117

lucymcgowan.com/talk

Lucy D’Agostino McGowan ● St. Jude 2025

3 of 117

what goes in the imputation model?

9 of 117

“Always include the outcome in imputation models!”

👧

“Sometimes include the outcome in imputation models?”

👩

“It depends!”

👵

11 of 117

“Always include the outcome in imputation models!”

👧

“Sometimes include the outcome in imputation models?”

👩

“Include the outcome in stochastic imputation models; don’t include it in deterministic imputation models.”

👵

D’Agostino McGowan L, Lotspeich SC, Hepler SA. Statistical Methods in Medical Research (2024)

17 of 117

the setup

  • Analyzing the relationship between X and Y
  • X has missing values

25 of 117

Deterministic imputation

  • Fit a single imputation model
  • Impute missing values using predictions from the imputation model
  • Fit the analysis model using this imputed variable

Stochastic imputation

  • Fit an imputation model
  • Draw each imputed value from the imputation model’s predictive distribution (the prediction plus random noise)
  • Fit the analysis model using this imputed variable

27 of 117

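Deterministic imputation

imp_fit <- lm(x_obs ~ z)  # complete cases only: rows with missing x_obs are dropped
coefs <- coef(imp_fit)
imp_det <- coefs[1] + coefs[2] * z  # fitted mean for every row
## equivalently (newdata keeps the rows where x_obs is missing):
imp_det <- predict(imp_fit, newdata = data.frame(z = z))

Stochastic imputation

imp_fit <- lm(x_obs ~ z)
coefs <- coef(imp_fit)
n <- length(z)
## draw the residual SD: sqrt(RSS / chi-squared draw on the residual df)
sigma <- sqrt(sum(imp_fit$residuals^2) /
  rchisq(1, imp_fit$df.residual))
## draw the coefficients with covariance sigma^2 (X'X)^-1
coefs <- coefs +
  t(chol(summary(imp_fit)$cov.unscaled)) %*%
  rnorm(2) * sigma
## fitted mean plus residual noise
imp_stoch <- coefs[1] + coefs[2] * z +
  rnorm(n, 0, sigma)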

38 of 117

Fun math fact:

40 of 117

Deterministic Imputation

43 of 117

Stochastic Imputation
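The two approaches differ in the spread of the values they produce. The R simulation sketch below shows this. The data-generating model is our own assumption, not the talk's: Z is standard normal, X = 1 + 2Z + noise with variance 1, and 40% of X is missing completely at random. Deterministic imputations sit on the fitted line, so their variance is roughly Var(X) minus the residual variance. Stochastic imputations add that noise back. The stochastic draw follows the corrected slide code, with the coefficients drawn with covariance sigma² (X'X)⁻¹.

```r
set.seed(42)
n <- 10000
z <- rnorm(n)
x <- 1 + 2 * z + rnorm(n)                # assumed model: Var(X) = 4 + 1 = 5
x_obs <- ifelse(runif(n) < 0.4, NA, x)   # 40% MCAR missingness

imp_fit <- lm(x_obs ~ z)                 # fit on complete cases only

# Deterministic: the fitted mean for every row
imp_det <- predict(imp_fit, newdata = data.frame(z = z))

# Stochastic: draw sigma and the coefficients, then add residual noise
sigma_star <- sqrt(sum(residuals(imp_fit)^2) / rchisq(1, df.residual(imp_fit)))
coefs <- coef(imp_fit) +
  drop(t(chol(summary(imp_fit)$cov.unscaled)) %*% rnorm(2)) * sigma_star
imp_stoch <- coefs[1] + coefs[2] * z + rnorm(n, 0, sigma_star)

# Compare the spread of the imputed values with the spread of the true X
c(var_x = var(x), var_det = var(imp_det), var_stoch = var(imp_stoch))
```

Under this model, var_det should be close to 4 and var_stoch close to 5.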

44 of 117

Fun math fact:

45 of 117

The covariance between the imputed X and Y depends on the imputation model that was fit.

Assume this is our imputation model, which does not include the outcome:

48 of 117

Deterministic and Stochastic imputation

51 of 117

Deterministic and Stochastic imputation

without Y in the imputation model, this covariance is too small

55 of 117

Deterministic Imputation

same scaling factor!!
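Here is a sketch of where the shared scaling factor comes from. It rests on our own simplifying assumptions: linear models, population parameters with no estimation error, and every X imputed. The imputation model is X = α₀ + α₁Z + δ, with δ independent of Z and Var(δ) = σ². The outcome model is Y = β₀ + β₁X + ε, with ε independent of (X, Z), so Y depends on Z only through X. The deterministic imputation is X̂ = α₀ + α₁Z. The stochastic imputation is X̃ = X̂ + η, with η ~ N(0, σ²) drawn independently of Y.

```latex
\begin{aligned}
\operatorname{Cov}(\hat X, Y) &= \alpha_1 \operatorname{Cov}(Z, Y) = \alpha_1^2 \beta_1 \operatorname{Var}(Z),
  \qquad \operatorname{Var}(\hat X) = \alpha_1^2 \operatorname{Var}(Z) \\
\frac{\operatorname{Cov}(\hat X, Y)}{\operatorname{Var}(\hat X)} &= \beta_1 \\
\frac{\operatorname{Cov}(\tilde X, Y)}{\operatorname{Var}(\tilde X)}
  &= \frac{\alpha_1^2 \beta_1 \operatorname{Var}(Z)}{\alpha_1^2 \operatorname{Var}(Z) + \sigma^2}
   = \beta_1 \, \frac{\operatorname{Var}(X) - \sigma^2}{\operatorname{Var}(X)}
\end{aligned}
```

The factor α₁² multiplies both the covariance and the variance of X̂, so it cancels in the slope. The added noise η restores Var(X̃) = Var(X) but adds nothing to the covariance with Y, so the slope is attenuated whenever σ² > 0.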

56 of 117

Fun math fact:

57 of 117

Deterministic imputation

Stochastic imputation

60 of 117

Deterministic imputation

Stochastic imputation

😱

64 of 117

Take away:

Do not include the outcome in your imputation model when doing deterministic imputation

65 of 117

The covariance between the imputed X and Y depends on the imputation model that was fit.

Assume this is our imputation model, which includes the outcome:

67 of 117

Deterministic and Stochastic imputation

68 of 117

Deterministic imputation

Stochastic imputation

69 of 117

Deterministic imputation

Stochastic imputation

😱

72 of 117

Take away:

Do include the outcome in your imputation model when doing stochastic imputation
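Here is a sketch of why the outcome helps a stochastic imputation model but hurts a deterministic one. It rests on our own simplifying assumptions: (X, Y, Z) jointly normal, population parameters with no estimation error, every X imputed, and the analysis model Y = β₀ + β₁X + ε. Let X̂ = E[X | Z, Y] be the deterministic imputation. Let X̃ = X̂ + η be the stochastic one, with η ~ N(0, Var(X | Z, Y)) drawn independently.

```latex
\begin{aligned}
\operatorname{Cov}(\hat X, Y) &= \operatorname{Cov}(X, Y)
  && \text{(law of total covariance; } Y \text{ is in the conditioning set)} \\
\operatorname{Var}(\hat X) &= \operatorname{Var}(X) - \operatorname{Var}(X \mid Z, Y) < \operatorname{Var}(X) \\
\left| \frac{\operatorname{Cov}(\hat X, Y)}{\operatorname{Var}(\hat X)} \right|
  &> \left| \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} \right| = |\beta_1|
  && (\beta_1 \neq 0) \\
\operatorname{Cov}(\tilde X, Y) &= \operatorname{Cov}(X, Y), \qquad
  \operatorname{Var}(\tilde X) = \operatorname{Var}(X)
  \;\Rightarrow\; \frac{\operatorname{Cov}(\tilde X, Y)}{\operatorname{Var}(\tilde X)} = \beta_1
\end{aligned}
```

Including Y preserves the covariance in both cases. Only the stochastic draw also restores the variance, so only the stochastic slope is correct.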

73 of 117

Deterministic imputation, Y is not in the imputation model

74 of 117

Variance of the imputed X

75 of 117

Covariance of the imputed X and Y

76 of 117

Estimate is unbiased

77 of 117

Estimate is too small

78 of 117

Estimate is too big

83 of 117

Deterministic imputation, Y is in the imputation model

85 of 117

Estimate is always too big

86 of 117

Stochastic imputation, Y is not in the imputation model

88 of 117

Estimate is always too small

89 of 117

Stochastic imputation, Y is in the imputation model

90 of 117

Take away:

  • Do include the outcome in your imputation model when doing stochastic imputation
  • Do not include the outcome in your imputation model when doing deterministic imputation
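Both take-aways can be checked with a small simulation sketch in R. Everything here is our own assumption, not the talk's setup: Z is standard normal, X = Z + noise, Y = X + noise (true slope 1), and half of X is missing completely at random. The stochastic variant is simplified. It adds residual noise but does not draw the imputation-model coefficients, which matters little at this sample size. Only the missing X values are replaced before fitting the analysis model y ~ x.

```r
set.seed(2024)
n <- 100000
z <- rnorm(n)
x <- z + rnorm(n)              # X depends on the auxiliary variable Z
y <- x + rnorm(n)              # true slope of Y on X is 1
miss <- runif(n) < 0.5         # 50% of X missing completely at random
dat <- data.frame(x_obs = ifelse(miss, NA, x), y = y, z = z)

# Fit the imputation model on complete cases, then fill in only the missing X
impute <- function(formula, stochastic) {
  fit <- lm(formula, data = dat)
  pred <- predict(fit, newdata = dat)
  if (stochastic) pred <- pred + rnorm(n, 0, sigma(fit))  # add residual noise
  ifelse(miss, pred, dat$x_obs)
}

# Slope from the analysis model y ~ x_imp
slope <- function(x_imp) unname(coef(lm(dat$y ~ x_imp))[2])

results <- c(
  det_without_y   = slope(impute(x_obs ~ z,     stochastic = FALSE)),
  det_with_y      = slope(impute(x_obs ~ z + y, stochastic = FALSE)),
  stoch_without_y = slope(impute(x_obs ~ z,     stochastic = TRUE)),
  stoch_with_y    = slope(impute(x_obs ~ z + y, stochastic = TRUE))
)
round(results, 3)
```

For this particular simulated model, working through the population moments puts the four slopes near 1, 8/7 ≈ 1.14, 0.75, and 1. These are values we derived for this setup, not results reported in the talk.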

91 of 117

What about prediction?

94 of 117

Three Phases

Training
  • ✓ have Y
  • ✓ use Y for stochastic
  • ✖ use Y for deterministic

Testing
  • ✓ have Y
  • ✓ use Y for stochastic
  • ✖ use Y for deterministic

Deployment
  • ✖ have Y
  • ✖ use Y for stochastic
  • ✖ use Y for deterministic

97 of 117

Three Phases

Training
  • ✓ have Y
  • ✓ use Y for stochastic
  • ✖ use Y for deterministic

Testing
  • ✓ have Y
  • ✖ use Y for stochastic
  • ✖ use Y for deterministic

Deployment
  • ✖ have Y
  • ✖ use Y for stochastic
  • ✖ use Y for deterministic

98 of 117

Metrics

101 of 117

RMSE

103 of 117

Mean Absolute Error

105 of 117

R Squared
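The three metrics can be sketched in R as follows. The helper names rmse, mae, and rsq_trad are hypothetical, and the data are simulated under our own assumptions. R² here is the traditional 1 − SS_res/SS_tot. Some software reports the squared correlation instead (for example, yardstick's rsq(), as opposed to its rsq_trad()). To match the three phases, the deterministic imputation model is fit on training rows without Y, and test rows are imputed exactly as they would be at deployment.

```r
# Hypothetical helpers implementing the standard definitions
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
mae <- function(truth, estimate) mean(abs(truth - estimate))
# Traditional R^2: 1 - SS_res / SS_tot
rsq_trad <- function(truth, estimate) {
  1 - sum((truth - estimate)^2) / sum((truth - mean(truth))^2)
}

set.seed(117)
n <- 2000
z <- rnorm(n)
x <- z + rnorm(n)
y <- x + rnorm(n)
x[runif(n) < 0.3] <- NA          # 30% of X missing completely at random
train <- seq_len(n) <= n / 2     # first half trains, second half tests
test <- !train

# Deterministic imputation model fit on training rows, without Y
imp_fit <- lm(x ~ z, subset = train)
x_imp <- ifelse(is.na(x), predict(imp_fit, newdata = data.frame(z = z)), x)

# Prediction model fit on the imputed training data
pred_fit <- lm(y ~ x_imp, subset = train)

# Test rows were imputed without Y, matching deployment
y_hat <- predict(pred_fit, newdata = data.frame(x_imp = x_imp[test]))
metrics <- c(
  rmse = rmse(y[test], y_hat),
  mae = mae(y[test], y_hat),
  rsq = rsq_trad(y[test], y_hat)
)
metrics
```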

107 of 117

Take away:

  • The test environment must match the deployment environment for prediction
  • Do not include the outcome in your imputation model when doing deterministic imputation
  • Do include the outcome in your imputation model when doing stochastic imputation
  • Consider deterministic imputation for prediction models; it may outperform stochastic imputation

108 of 117

why should we care?

110 of 117

Hippisley-Cox et al. BMJ (2007)

117 of 117

@LucyStats