1 of 46

Nested ensemble machine learning to predict heart attack risk in patients with chest pain

Chris Kennedy

Kaiser Permanente Division of Research

PhD candidate in biostatistics at UC Berkeley

2 of 46

Context for why I'm here

3 of 46

My background / plans

  • PhD candidate in Biostatistics
  • Expecting to file dissertation in July
  • Moving to Boston around July, depending upon pandemic status
  • Looking to find a postdoc with a biomedical deep learning focus

4 of 46

Dissertation chapters

  • Nested ensemble machine learning to predict heart attack risk in patients with chest pain
  • Multitask, ordinal deep learning and faceted Rasch modeling for debiased, explainable, interval hate speech measurement (hatespeech.berkeley.edu)
    • Possible biomedical extensions: analysis of imaging for cancer severity grading
    • Funding targets: NSF, industry, foundations
  • Discovering toxic exposure mixtures and ranking variable set importance via cross-validated causal inference and machine learning
    • Related to research by Chirag Patel, Francesca Dominici
    • Funding targets: NIH, EPA

5 of 46

Methods interests

  • Deep learning for text, imaging, time series, waveform (e.g. MIMIC), audio
    • Supervised, unsupervised, reinforcement
  • Machine learning
    • Hyperparameter optimization, ensemble meta-learning, feature selection, clustering, GLRM
  • Causal inference (targeted learning)
    • Exposure mixtures, causal variable importance, optimal treatment regimes, nonparametric dose-response
  • Randomized trials
    • Adaptive design, machine learning for covariate adjustment, rerandomization
  • Item response theory / psychometrics

6 of 46

Returning to the chest pain project

7 of 46

Collaborators

  • Mary Reed, DrPH
  • Dustin Mark, MD (CREST - https://www.kpcrest.net/)
  • Jie Huang, PhD

8 of 46

Caveat: preliminary results

Please do not cite or share results.

Preprint in the works and planned for release by July.

9 of 46

Scientific question (exploratory)

Among patients who present to KP with chest pain, what is their probability of having a heart attack or other major adverse cardiac event (MACE) in the next 2 months?

This risk estimate could potentially support improved resource allocation (intensity of workup) and patient outcomes (safe discharge of low-risk patients), based on a low-risk cut-off of < 0.5% (or 1% or 2%).

  • Legal risk is thought to contribute to overtreatment of patients who present with chest pain in the emergency department

10 of 46

Topics to cover

  • Background on chest pain
  • Missing data
  • Imputation with generalized low-rank models (GLRM)
  • Machine learning
  • Variable importance
  • Accumulated local effect plots
  • Exploratory data analysis

11 of 46

Data structure

Observations: 116,764

Outcome: major adverse cardiac event (MACE) in 60 days (1.88% positive)

Covariates (65):

  • Troponin - muscle protein that indicates heart injury
  • Benchmark risk scores: EDACS, HEART
  • Treadmill stress test results, history of coronary artery disease or heart attack
  • Age, gender, race, BMI, blood pressure, smoking status, lipid profile, kidney GFR
  • Text search of clinical notes: ECG interpretation, pain features (radiating, stabbing, palpation, inspiration), diaphoresis, exertional symptoms
  • Missingness indicators

12 of 46

Missing data

ggplot, kableExtra

13 of 46

Correlation structure of missingness

superheat::superheat()
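A minimal sketch of how this heatmap could be produced, assuming the analysis dataframe is called df (a hypothetical name): build 0/1 missingness indicators, correlate them, and plot the correlation matrix with superheat.

library(superheat)

# Hypothetical dataframe df: 1 = missing, 0 = observed, for each covariate.
miss_ind <- as.data.frame(lapply(df, function(x) as.integer(is.na(x))))

# Keep only covariates with at least some missingness.
miss_ind <- miss_ind[, colSums(miss_ind) > 0, drop = FALSE]

# Pairwise correlation of the missingness indicators, plotted as a heatmap.
miss_cor <- cor(miss_ind)
superheat(miss_cor,
          pretty.order.rows = TRUE,
          pretty.order.cols = TRUE,
          bottom.label.text.angle = 90)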

14 of 46

Generalized low rank models for imputation

  • GLRMs extend PCA to arbitrary loss functions, regularization/sparsity, and non-numeric data types (binary, ordinal, categorical)
  • Unification of non-negative matrix factorization, matrix completion, sparse and robust PCA, k-means, k-SVD, and maximum margin matrix factorization
  • Project the original dataset onto a low-rank approximation (e.g., 10 dimensions), then transform back to the original scale to impute missing values, in the style of matrix completion
  • Udell, M., Horn, C., Zadeh, R., & Boyd, S. (2016). Generalized low rank models. Foundations and Trends® in Machine Learning, 9(1), 1-118.
  • Implementations currently available in h2o.ai and Julia (see the sketch after this list)
  • Importance of hyperparameter tuning
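A minimal sketch of GLRM-based imputation via h2o, assuming a numeric covariate dataframe named covariates (hypothetical); the rank, loss, and regularization values here are illustrative, not the tuned settings reported later.

library(h2o)
h2o.init()

covars_h2o <- as.h2o(covariates)  # hypothetical dataframe with missing values

# Fit a rank-10 GLRM (column scaling omitted for brevity).
glrm_fit <- h2o.glrm(training_frame = covars_h2o,
                     k = 10,
                     loss = "Quadratic",
                     regularization_x = "Quadratic",
                     regularization_y = "L1",
                     gamma_x = 1, gamma_y = 1,
                     max_iterations = 500,
                     seed = 1)

# The low-rank reconstruction fills in every cell; copy the reconstructed
# values into the originally missing cells (column order assumed to match).
recon <- as.data.frame(predict(glrm_fit, covars_h2o))
imputed <- covariates
imputed[is.na(covariates)] <- recon[is.na(covariates)]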

15 of 46

Hyperparameter tuning for GLRM: 5 dimensions

  • Number of archetypes (components)
    • Range from 1 to number of covariates
  • Amount of regularization on X (compressed dataframe)
    • Range from 0 to infinity
  • Type of regularization on X
    • L1 (sparse) or L2 (denoised)
  • Amount of regularization on Y (feature weightings to define archetypes)
    • Again, 0 to infinity
  • Type of regularization on Y
    • L1 (sparse) or L2 (denoised)

These can be optimized using a training/test split (e.g., 80/20) or cross-validation, as in the sketch below.
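A minimal sketch of that tuning loop, using a random grid over the five dimensions and scoring each fit by reconstruction error on held-out rows; covariates is a hypothetical numeric dataframe, and the manual scoring step is a simplified stand-in for h2o's own validation metrics.

library(h2o)
h2o.init()

covars_h2o <- as.h2o(covariates)
splits <- h2o.splitFrame(covars_h2o, ratios = 0.8, seed = 1)
train <- splits[[1]]
valid <- splits[[2]]
valid_mat <- as.matrix(as.data.frame(valid))
observed <- !is.na(valid_mat)

# Random draws over the five GLRM tuning dimensions.
set.seed(1)
grid <- data.frame(k       = sample(2:(ncol(covars_h2o) - 1), 20, replace = TRUE),
                   gamma_x = runif(20, 0, 30),
                   gamma_y = runif(20, 0, 30),
                   reg_x   = sample(c("L1", "Quadratic"), 20, replace = TRUE),
                   reg_y   = sample(c("L1", "Quadratic"), 20, replace = TRUE),
                   stringsAsFactors = FALSE)

grid$error <- sapply(seq_len(nrow(grid)), function(i) {
  fit <- h2o.glrm(training_frame = train, k = grid$k[i],
                  regularization_x = grid$reg_x[i], gamma_x = grid$gamma_x[i],
                  regularization_y = grid$reg_y[i], gamma_y = grid$gamma_y[i],
                  seed = 1)
  recon <- as.matrix(as.data.frame(predict(fit, valid)))
  # Mean squared reconstruction error over the observed cells of the holdout.
  mean((valid_mat[observed] - recon[observed])^2)
})

grid[which.min(grid$error), ]  # best of the sampled configurations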

16 of 46

First round of hyperparameter tuning: 300,000

Default hyperparameters with 19 components: 787,000

17 of 46

3rd round of hyperparameter tuning: 2,000

18 of 46

Optimal settings for missing value imputation

  • k = 50 (versus 57 original covariates)
  • Regularization amount for X: 4
  • Regularization type for X: L2 (dense denoising)
  • Regularization amount for Y: 24
  • Regularization type for Y: L1 (sparse denoising)

19 of 46

Imputation error comparison: GLRM vs. Median/Mode

20 of 46

GLRM: examining cumulative variance explained

21 of 46

Machine Learning

22 of 46

Decision tree baseline (6 leaf nodes)

Given the class imbalance, it was critical to re-weight the loss matrix by inverse class proportion (sketched below).

Alternatively, one could use observation weights, but then the plots cannot accurately show the proportion of the dataset in each leaf node.
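A minimal sketch of the loss-matrix reweighting with rpart; df and mace60 are hypothetical names, and the false-negative cost (~53) is roughly the inverse of the 1.88% outcome prevalence.

library(rpart)
library(rpart.plot)

df$mace60 <- factor(df$mace60, levels = c(0, 1))

# Loss matrix: rows are the true class, columns the predicted class.
# A missed MACE (false negative) costs ~53x as much as a false positive,
# i.e. roughly inverse class proportion.
loss_mat <- matrix(c(0,  1,    # true negative row: cost of a false positive = 1
                     53, 0),   # true positive row: cost of a false negative ~ 1 / 0.0188
                   nrow = 2, byrow = TRUE)

tree_fit <- rpart(mace60 ~ ., data = df,
                  method = "class",
                  parms = list(loss = loss_mat),
                  control = rpart.control(cp = 0.001, maxdepth = 4))

# Plot the tree; each leaf shows the share of the dataset falling into it.
rpart.plot(tree_fit)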

23 of 46

Performance metrics

  • Primary: area under the precision-recall curve (PR-AUC)
    • Precision: correctly predicted positive / predicted positive
      • TP / (TP + FP)
      • "positive predictive value"
    • Recall: correctly predicted positive / all positive observations
      • TP / (TP + FN)
      • "sensitivity" or "true positive rate"
    • Note that it does not include a true negative term (TN)
    • Calculate precision vs. recall at all possible thresholds to get PR-AUC
      • "average precision"
    • Baseline PR-AUC is the outcome average (percentage of positive cases)
  • Secondary: area under the ROC curve (ROC-AUC)
    • True positive rate: recall, TP / (TP + FN)
    • False positive rate: 1 - specificity
      • FP / (FP + TN)
    • Baseline AUC is 0.5
    • With imbalanced data, ROC-AUC can look high simply because the many negative cases are easy to rank correctly
    • Intuition: the percentage of the time that a random positive case is ranked higher than a random negative case

precrec, cvAUC (see the sketch below)
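A minimal sketch of both metrics with precrec and cvAUC, where pred is a vector of predicted risks, y is the 0/1 MACE outcome, and fold_ids assigns observations to cross-validation folds (all hypothetical names).

library(precrec)
library(ggplot2)
library(cvAUC)

# Precision-recall and ROC curves from one set of predictions.
curves <- evalmod(scores = pred, labels = y)
auc(curves)       # one row each for ROC-AUC and PR-AUC ("PRC")
autoplot(curves)  # plot both curves

# Baseline PR-AUC is the outcome prevalence.
mean(y)

# Cross-validated ROC-AUC with a 95% confidence interval.
ci.cvAUC(predictions = pred, labels = y, folds = fold_ids, confidence = 0.95)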

24 of 46

Discrimination: PR-AUC

Baseline PR-AUC (outcome prevalence): 1.88%

25 of 46

Discrimination: ROC-AUC

26 of 46

ROC Curve

27 of 46

SuperLearner weight distribution
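This slide shows the ensemble weights; below is only a hedged sketch of how such weights can be extracted with the SuperLearner package, with a purely illustrative learner library and hypothetical x_df / y objects, not necessarily the configuration used in this project.

library(SuperLearner)

sl_library <- c("SL.mean", "SL.glm", "SL.glmnet", "SL.ranger", "SL.xgboost")

sl_fit <- SuperLearner(Y = y, X = x_df,
                       family = binomial(),
                       SL.library = sl_library,
                       method = "method.NNLS",
                       cvControl = list(V = 10, stratifyCV = TRUE))

# Meta-learner weight on each candidate learner (the weight distribution).
sl_fit$coef

# Cross-validated risk of each learner and of the ensemble itself.
cv_sl <- CV.SuperLearner(Y = y, X = x_df, family = binomial(),
                         SL.library = sl_library,
                         cvControl = list(V = 10, stratifyCV = TRUE))
summary(cv_sl)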

28 of 46

Random Forest plateau analysis

29 of 46

Calibration

30 of 46

Calibration: overall

31 of 46

Calibration: zoomed in and exponential scale

32 of 46

Variable importance

  • How influential or impactful is each variable on the predictive accuracy of our models?
  • For a given model, which variables are driving the predictions?
  • Methods (a vip sketch for the random forest follows this list)
    • Correlation (shown in EDA)
    • Single decision tree
    • OLS
    • Random forest
    • XGBoost
    • To add: BART, vimp, permutation importance
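The importance plots on the following slides use the vip package; here is a minimal sketch for the random forest version with ranger, using hypothetical df / mace60 names and illustrative settings.

library(vip)
library(ranger)

df$mace60 <- factor(df$mace60)

# Probability forest with permutation importance.
rf_fit <- ranger(mace60 ~ ., data = df,
                 probability = TRUE,
                 importance = "permutation",
                 num.trees = 500)

vip(rf_fit, num_features = 15)  # top 15 variables by permutation importance
vi(rf_fit)                      # the same importances as a table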

33 of 46

Vimp: Decision Tree

vip, rpart

34 of 46

Vimp: OLS

vip

35 of 46

Vimp: Random Forest

vip, ranger

36 of 46

Vimp: XGBoost

vip, xgboost

37 of 46

Accumulated local effect plots

  • Shows how a covariate influences the prediction
  • Examines the change in prediction when a given variable is adjusted over a small interval
  • Example: for observations with age between 50 and 60, what is the average difference in prediction when we set those patients' age to 60 versus 50?
  • Improves on partial dependence plots, which make the unrealistic assumption that features are uncorrelated
  • See Chapter 5 of Interpretable Machine Learning (Molnar); a minimal iml sketch follows
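A minimal sketch with the iml package, assuming the ranger probability forest rf_fit from the earlier sketch and hypothetical column names; the argument is called predict.fun in older iml versions.

library(iml)

# Covariates only (drop the outcome column).
x_df <- df[, setdiff(names(df), "mace60")]

# Wrap the fitted model so iml receives the predicted probability of MACE.
predictor <- Predictor$new(rf_fit, data = x_df, y = df$mace60,
                           predict.function = function(model, newdata) {
                             predict(model, data = newdata)$predictions[, 2]
                           })

# Accumulated local effect of age on the predicted risk.
ale_age <- FeatureEffect$new(predictor, feature = "age", method = "ale")
plot(ale_age)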

38 of 46

ALE: troponin peak, age, EDACS

iml

39 of 46

ALE: troponin 3-hour, HEART, pulse

iml

40 of 46

Exploratory data analysis

  • Analyze key continuous variables
  • Density of each variable, plotted separately for outcome 0 vs. 1
  • Smoothed risk across the range of the variable
    • Compare to baseline risk
    • Report the linear (Pearson) correlation (a ggplot2 sketch follows this list)
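A minimal ggplot2 sketch of the two panel types described above, using hypothetical names (df, edacs, and a numeric 0/1 outcome mace60); the loess smoother and dashed baseline line are illustrative choices.

library(ggplot2)

# Density of EDACS, plotted separately for the two outcome classes.
ggplot(df, aes(x = edacs, fill = factor(mace60))) +
  geom_density(alpha = 0.4) +
  labs(fill = "MACE in 60 days")

# Smoothed risk across the range of EDACS, with the baseline risk as reference.
ggplot(df, aes(x = edacs, y = mace60)) +
  geom_smooth(method = "loess") +
  geom_hline(yintercept = mean(df$mace60), linetype = "dashed") +
  labs(y = "Smoothed P(MACE in 60 days)")

# Linear (Pearson) correlation between the variable and the outcome.
cor(df$edacs, df$mace60, use = "complete.obs")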

41 of 46

EDA: EDACS

42 of 46

EDACS: inefficient resolution of score

43 of 46

EDA: age

44 of 46

EDA: BMI

45 of 46

Correlation (Pearson)

ck37r::vim_corr()

46 of 46

Thanks - any questions, comments, or feedback?

Twitter: @c3k

Website: ck37.com

GitHub: github.com/ck37