1 of 46

Nested ensemble machine learning to predict heart attack risk in patients with chest pain

Chris Kennedy

Kaiser Permanente Division of Research

PhD candidate in biostatistics at UC Berkeley

2 of 46

Context for why I'm here

3 of 46

My background / plans

  • PhD candidate in Biostatistics
  • Expecting to file dissertation in July
  • Moving to Boston around July, depending upon pandemic status
  • Looking to find a postdoc with a biomedical deep learning focus

4 of 46

Dissertation chapters

  • Nested ensemble machine learning to predict heart attack risk in patients with chest pain
  • Multitask, ordinal deep learning and faceted Rasch modeling for debiased, explainable, interval hate speech measurement (hatespeech.berkeley.edu)
    • Possible biomedical extensions: analysis of imaging for cancer severity grading
    • Funding targets: NSF, industry, foundations
  • Discovering toxic exposure mixtures and ranking variable set importance via cross-validated causal inference and machine learning
    • Related to research by Chirag Patel, Francesca Dominici
    • Funding targets: NIH, EPA

5 of 46

Methods interests

  • Deep learning for text, imaging, time series, waveform (e.g. MIMIC), audio
    • Supervised, unsupervised, reinforcement
  • Machine learning
    • Hyperparameter optimization, ensemble meta-learning, feature selection, clustering, GLRM
  • Causal inference (targeted learning)
    • Exposure mixtures, causal variable importance, optimal treatment regimes, nonparametric dose-response
  • Randomized trials
    • Adaptive design, machine learning for covariate adjustment, rerandomization
  • Item response theory / psychometrics

6 of 46

Returning to the chest pain project

7 of 46

Collaborators

  • Mary Reed, DrPH
  • Dustin Mark, MD (CREST - https://www.kpcrest.net/)
  • Jie Huang, PhD

8 of 46

Caveat: preliminary results

Please do not cite or share results.

Preprint in the works and planned for release by July.

9 of 46

Scientific question (exploratory)

Among patients who present to KP with chest pain, what is their probability of having a heart attack or other major adverse cardiac event (MACE) in the next 2 months?

This risk estimate could potentially support improved resource allocation (intensity of workup) and patient outcomes (safe discharge of low-risk patients), based on a low-risk cut-off of < 0.5% (or 1% or 2%).

  • Legal risk is thought to contribute to overtreatment of patients who present with chest pain in the emergency department

10 of 46

Topics to cover

  • Background on chest pain
  • Missing data
  • Imputation with generalized low-rank models (GLRM)
  • Machine learning
  • Variable importance
  • Accumulated local effect plots
  • Exploratory data analysis

11 of 46

Data structure

Observations: 116,764

Outcome: major adverse cardiac event (MACE) in 60 days (1.88% positive)

Covariates (65):

  • Troponin - muscle protein that indicates heart injury
  • Benchmark risk scores: EDACS, HEART
  • Treadmill stress test results, history of coronary artery disease or heart attack
  • Age, gender, race, BMI, blood pressure, smoking status, lipid profile, kidney GFR
  • Text search of clinical notes: ECG interpretation, pain features (radiating, stabbing, palpation, inspiration), diaphoresis, exertional symptoms
  • Missingness indicators

12 of 46

Missing data

ggplot, kableExtra

13 of 46

Correlation structure of missingness

superheat::superheat()
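A minimal sketch of how this heatmap could be produced, assuming the analysis dataframe is called df (a hypothetical name): build 0/1 missingness indicators, correlate them, and plot the correlation matrix with superheat.

library(superheat)

# Hypothetical dataframe df: 1 = missing, 0 = observed, for each covariate.
miss_ind <- as.data.frame(lapply(df, function(x) as.integer(is.na(x))))

# Keep only covariates with at least some missingness.
miss_ind <- miss_ind[, colSums(miss_ind) > 0, drop = FALSE]

# Pairwise correlation of the missingness indicators, plotted as a heatmap.
miss_cor <- cor(miss_ind)
superheat(miss_cor,
          pretty.order.rows = TRUE,
          pretty.order.cols = TRUE,
          bottom.label.text.angle = 90)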

14 of 46

Generalized low rank models for imputation

  • GLRMs extend PCA to arbitrary loss functions, regularization/sparsity, and non-numeric data types (binary, ordinal, categorical)
  • Unification of non-negative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization
  • Project the original dataset onto a low-rank approximation (e.g., 10 dimensions), then transform back to the original scale to impute missing values, in the style of matrix completion
  • Udell, M., Horn, C., Zadeh, R., & Boyd, S. (2016). Generalized low rank models. Foundations and Trends® in Machine Learning, 9(1), 1-118.
  • Implementations currently available in h2o.ai and Julia (see the sketch after this list)
  • Importance of hyperparameter tuning
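A minimal sketch of GLRM-based imputation via h2o, assuming a numeric covariate dataframe named covariates (hypothetical); the rank, loss, and regularization values here are illustrative, not the tuned settings reported later.

library(h2o)
h2o.init()

covars_h2o <- as.h2o(covariates)  # hypothetical dataframe with missing values

# Fit a rank-10 GLRM (column scaling omitted for brevity).
glrm_fit <- h2o.glrm(training_frame = covars_h2o,
                     k = 10,
                     loss = "Quadratic",
                     regularization_x = "Quadratic",
                     regularization_y = "L1",
                     gamma_x = 1, gamma_y = 1,
                     max_iterations = 500,
                     seed = 1)

# The low-rank reconstruction fills in every cell; copy the reconstructed
# values into the originally missing cells (column order assumed to match).
recon <- as.data.frame(predict(glrm_fit, covars_h2o))
imputed <- covariates
imputed[is.na(covariates)] <- recon[is.na(covariates)]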

15 of 46

Hyperparameter tuning for GLRM: 5 dimensions

  • Number of archetypes (components)
    • Range from 1 to number of covariates
  • Amount of regularization on X (compressed dataframe)
    • Range from 0 to infinity
  • Type of regularization on X
    • L1 (sparse) or L2 (denoised)
  • Amount of regularization on Y (feature weightings to define archetypes)
    • Again, 0 to infinity
  • Type of regularization on Y
    • L1 (sparse) or L2 (denoised)

These can be optimized using a training/test split (e.g., 80/20) or cross-validation, as in the sketch below.
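A minimal sketch of that tuning loop, using a random grid over the five dimensions and scoring each fit by reconstruction error on held-out rows; covariates is a hypothetical numeric dataframe, and the manual scoring step is a simplified stand-in for h2o's own validation metrics.

library(h2o)
h2o.init()

covars_h2o <- as.h2o(covariates)
splits <- h2o.splitFrame(covars_h2o, ratios = 0.8, seed = 1)
train <- splits[[1]]
valid <- splits[[2]]
valid_mat <- as.matrix(as.data.frame(valid))
observed <- !is.na(valid_mat)

# Random draws over the five GLRM tuning dimensions.
set.seed(1)
grid <- data.frame(k       = sample(2:(ncol(covars_h2o) - 1), 20, replace = TRUE),
                   gamma_x = runif(20, 0, 30),
                   gamma_y = runif(20, 0, 30),
                   reg_x   = sample(c("L1", "Quadratic"), 20, replace = TRUE),
                   reg_y   = sample(c("L1", "Quadratic"), 20, replace = TRUE),
                   stringsAsFactors = FALSE)

grid$error <- sapply(seq_len(nrow(grid)), function(i) {
  fit <- h2o.glrm(training_frame = train, k = grid$k[i],
                  regularization_x = grid$reg_x[i], gamma_x = grid$gamma_x[i],
                  regularization_y = grid$reg_y[i], gamma_y = grid$gamma_y[i],
                  seed = 1)
  recon <- as.matrix(as.data.frame(predict(fit, valid)))
  # Mean squared reconstruction error over the observed cells of the holdout.
  mean((valid_mat[observed] - recon[observed])^2)
})

grid[which.min(grid$error), ]  # best of the sampled configurations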

16 of 46

First round of hyperparameter tuning: 300,000

Default hyperparameters with 19 components: 787,000

17 of 46

3rd round of hyperparameter tuning: 2,000

18 of 46

Optimal settings for missing value imputation

  • k = 50 (versus 57 original covariates)
  • Regularization amount for X: 4
  • Regularization type for X: L2 (dense denoising)
  • Regularization amount for Y: 24
  • Regularization type for Y: L1 (sparse denoising)

19 of 46

Imputation error comparison: GLRM vs. Median/Mode

20 of 46

GLRM: examining cumulative variance explained

21 of 46

Machine Learning

22 of 46

Decision tree baseline (6 leaf nodes)

Given the class imbalance, it was critical to re-weight the loss matrix by inverse class proportion (sketched below).

Alternatively, one could use observation weights, but then the plots cannot accurately show the proportion of the dataset in each leaf node.
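A minimal sketch of the loss-matrix reweighting with rpart; df and mace60 are hypothetical names, and the false-negative cost (~53) is roughly the inverse of the 1.88% outcome prevalence.

library(rpart)
library(rpart.plot)

df$mace60 <- factor(df$mace60, levels = c(0, 1))

# Loss matrix: rows are the true class, columns the predicted class.
# A missed MACE (false negative) costs ~53x as much as a false positive,
# i.e. roughly inverse class proportion.
loss_mat <- matrix(c(0,  1,    # true negative row: cost of a false positive = 1
                     53, 0),   # true positive row: cost of a false negative ~ 1 / 0.0188
                   nrow = 2, byrow = TRUE)

tree_fit <- rpart(mace60 ~ ., data = df,
                  method = "class",
                  parms = list(loss = loss_mat),
                  control = rpart.control(cp = 0.001, maxdepth = 4))

# Plot the tree; each leaf shows the share of the dataset falling into it.
rpart.plot(tree_fit)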

23 of 46

Performance metrics

  • Primary: area under the precision-recall curve (PR-AUC)
    • Precision: correctly predicted positive / predicted positive
      • TP / (TP + FP)
      • "positive predictive value"
    • Recall: correctly predicted positive / all positive observations
      • TP / (TP + FN)
      • "sensitivity" or "true positive rate"
    • Note that it does not include a true negative term (TN)
    • Calculate precision vs. recall at all possible thresholds to get PR-AUC
      • "average precision"
    • Baseline PR-AUC is the outcome average (percentage of positive cases)
  • Secondary: area under the ROC curve (ROC-AUC)
    • True positive rate: recall, TP / (TP + FN)
    • False positive rate: 1 - specificity
      • FP / (FP + TN)
    • Baseline AUC is 0.5
    • With imbalanced data, ROC-AUC can look high simply because the many negative cases are easy to rank correctly
    • Intuition: the percentage of the time that a random positive case is ranked higher than a random negative case

precrec, cvAUC (see the sketch below)
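A minimal sketch of both metrics with precrec and cvAUC, where pred is a vector of predicted risks, y is the 0/1 MACE outcome, and fold_ids assigns observations to cross-validation folds (all hypothetical names).

library(precrec)
library(ggplot2)
library(cvAUC)

# Precision-recall and ROC curves from one set of predictions.
curves <- evalmod(scores = pred, labels = y)
auc(curves)       # one row each for ROC-AUC and PR-AUC ("PRC")
autoplot(curves)  # plot both curves

# Baseline PR-AUC is the outcome prevalence.
mean(y)

# Cross-validated ROC-AUC with a 95% confidence interval.
ci.cvAUC(predictions = pred, labels = y, folds = fold_ids, confidence = 0.95)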

24 of 46

Discrimination: PR-AUC

Baseline PR-AUC (outcome prevalence): 1.88%

25 of 46

Discrimination: ROC-AUC

26 of 46

ROC Curve

27 of 46

SuperLearner weight distribution
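This slide shows the ensemble weights; below is only a hedged sketch of how such weights can be extracted with the SuperLearner package, with a purely illustrative learner library and hypothetical x_df / y objects, not necessarily the configuration used in this project.

library(SuperLearner)

sl_library <- c("SL.mean", "SL.glm", "SL.glmnet", "SL.ranger", "SL.xgboost")

sl_fit <- SuperLearner(Y = y, X = x_df,
                       family = binomial(),
                       SL.library = sl_library,
                       method = "method.NNLS",
                       cvControl = list(V = 10, stratifyCV = TRUE))

# Meta-learner weight on each candidate learner (the weight distribution).
sl_fit$coef

# Cross-validated risk of each learner and of the ensemble itself.
cv_sl <- CV.SuperLearner(Y = y, X = x_df, family = binomial(),
                         SL.library = sl_library,
                         cvControl = list(V = 10, stratifyCV = TRUE))
summary(cv_sl)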

28 of 46

Random Forest plateau analysis

29 of 46

Calibration

30 of 46

Calibration: overall

31 of 46

Calibration: zoomed in and exponential scale

32 of 46

Variable importance

  • How influential or impactful is each variable on the predictive accuracy of our models?
  • For a given model, which variables are driving the predictions?
  • Methods (a vip sketch for the random forest follows this list)
    • Correlation (shown in EDA)
    • Single decision tree
    • OLS
    • Random forest
    • XGBoost
    • To add: BART, vimp, permutation importance
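The importance plots on the following slides use the vip package; here is a minimal sketch for the random forest version with ranger, using hypothetical df / mace60 names and illustrative settings.

library(vip)
library(ranger)

df$mace60 <- factor(df$mace60)

# Probability forest with permutation importance.
rf_fit <- ranger(mace60 ~ ., data = df,
                 probability = TRUE,
                 importance = "permutation",
                 num.trees = 500)

vip(rf_fit, num_features = 15)  # top 15 variables by permutation importance
vi(rf_fit)                      # the same importances as a table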

33 of 46

Vimp: Decision Tree

vip, rpart

34 of 46

Vimp: OLS

vip

35 of 46

Vimp: Random Forest

vip, ranger

36 of 46

Vimp: XGBoost

vip, xgboost

37 of 46

Accumulated local effect plots

  • Shows how a covariate influences the prediction
  • Examines the change in prediction when a given variable is adjusted over a small interval
  • Example: for observations with age between 50 and 60, what is the average difference in prediction when we set those patients' age to 60 versus 50?
  • Improves on partial dependence plots, which make the unrealistic assumption that features are uncorrelated
  • See Chapter 5 of Interpretable Machine Learning (Molnar); a minimal iml sketch follows
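A minimal sketch with the iml package, assuming the ranger probability forest rf_fit from the earlier sketch and hypothetical column names; the argument is called predict.fun in older iml versions.

library(iml)

# Covariates only (drop the outcome column).
x_df <- df[, setdiff(names(df), "mace60")]

# Wrap the fitted model so iml receives the predicted probability of MACE.
predictor <- Predictor$new(rf_fit, data = x_df, y = df$mace60,
                           predict.function = function(model, newdata) {
                             predict(model, data = newdata)$predictions[, 2]
                           })

# Accumulated local effect of age on the predicted risk.
ale_age <- FeatureEffect$new(predictor, feature = "age", method = "ale")
plot(ale_age)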

38 of 46

ALE: troponin peak, age, EDACS

iml

39 of 46

ALE: troponin 3-hour, HEART, pulse

iml

40 of 46

Exploratory data analysis

  • Analyze key continuous variables
  • Density of each variable, plotted separately for outcome 0 vs. 1
  • Smoothed risk across the range of the variable
    • Compare to baseline risk
    • Report the linear (Pearson) correlation (a ggplot2 sketch follows this list)
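A minimal ggplot2 sketch of the two panel types described above, using hypothetical names (df, edacs, and a numeric 0/1 outcome mace60); the loess smoother and dashed baseline line are illustrative choices.

library(ggplot2)

# Density of EDACS, plotted separately for the two outcome classes.
ggplot(df, aes(x = edacs, fill = factor(mace60))) +
  geom_density(alpha = 0.4) +
  labs(fill = "MACE in 60 days")

# Smoothed risk across the range of EDACS, with the baseline risk as reference.
ggplot(df, aes(x = edacs, y = mace60)) +
  geom_smooth(method = "loess") +
  geom_hline(yintercept = mean(df$mace60), linetype = "dashed") +
  labs(y = "Smoothed P(MACE in 60 days)")

# Linear (Pearson) correlation between the variable and the outcome.
cor(df$edacs, df$mace60, use = "complete.obs")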

41 of 46

EDA: EDACS

42 of 46

EDACS: inefficient resolution of score

43 of 46

EDA: age

44 of 46

EDA: BMI

45 of 46

Correlation (Pearson)

ck37r::vim_corr()

46 of 46

Thanks - any questions, comments, or feedback?

Twitter: @c3k

Website: ck37.com

GitHub: github.com/ck37