1 of 40

Machine Learning

Mentors: Professor Nikola Banovic and Anindya Das Antar

2 of 40

The Challenge

Developing methods for patient-specific predictions of in-hospital mortality

  • Information provided from the first 48 hours of an ICU stay
  • 12,000 patients from a community hospital in Massachusetts
  • 6 general descriptor variables (e.g., age, weight, ICU type)
  • 36 time-series variables, measured at least once (e.g., heart rate, BUN, glucose)

The timely and accurate detection of people at risk can save lives!


3 of 40

Abbreviations

  • HMM - Hidden Markov Model
  • SVM - Support Vector Machine
  • CNN - Convolutional Neural Network
  • LSTM - Long Short-Term Memory
  • LR - Logistic Regression
  • GNB - Gaussian Naive Bayes
  • RF - Random Forest


4 of 40

Metric definitions


                 Predicted Alive    Predicted Dead
Actual Alive     TN                 FP
Actual Dead      FN                 TP

  • Precision: TP/(TP+FP)
    • Fraction of predicted positives that were actually positive
  • Recall: TP/(TP+FN)
    • Fraction of actual positives that were correctly predicted
  • Accuracy: (TP+TN)/(TP+TN+FP+FN)
    • Fraction of predictions that were correctly classified
  • F1 Score: 2·Precision·Recall/(Precision+Recall), the harmonic mean of Precision and Recall
  • AUC-ROC: Area under the ROC curve
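As a quick illustration, all five metrics can be computed with scikit-learn (cited in the references below); the toy arrays, the 0.5 threshold, and the 1 = dead / -1 = alive coding are assumptions for the example, not values from the challenge data.

```python
# Minimal sketch: computing the metrics above with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([-1, -1, -1, 1, 1, -1, 1, -1])                # actual outcomes
y_score = np.array([0.1, 0.4, 0.2, 0.8, 0.3, 0.6, 0.9, 0.05])  # model scores
y_pred = np.where(y_score >= 0.5, 1, -1)                        # thresholded predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # TP / (TP + FN)
print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # (TP + TN) / total
print(f"F1 score:  {f1_score(y_true, y_pred):.2f}")
print(f"AUC-ROC:   {roc_auc_score(y_true, y_score):.2f}")   # uses raw scores
```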

5 of 40

Group 1: MaSH

Markov model, SVM Hybrid

Anvit Garg, Alejandra Solis Sala, Ian Maywar, Rhea Verma

6 of 40

Group 1: Predicting ICU Mortality using HMMs and SVM

Anvit Garg, Alejandra Solis Sala, Ian Maywar, Rhea Verma

7 of 40

Model Illustration

[Figure: model architecture]

8 of 40

Feature selection and preprocessing

  • We chose features based on availability, SAPS-II, and variation between dead and alive patients:
    • Heart rate
    • Non-invasive systolic blood pressure
    • Creatinine
    • Sodium (Na)
    • Glasgow Coma Score
    • Blood urea nitrogen
    • Urine output
    • Bicarbonate (HCO3)
    • Potassium (K)
    • Glucose

  • Preprocessing (a pandas sketch follows below)
    • Forward imputation
    • Regression imputation (ICU type, age)
    • Time window transformation
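A minimal pandas sketch of the forward imputation and windowing steps, under assumed names (a long-format `df` with one row per patient-hour and the short feature column names shown); the regression imputation step is omitted.

```python
import pandas as pd

def preprocess(df, window_hours=8):
    """df: long-format frame with columns patient_id, hour (1..48), features."""
    features = ["HR", "NISysABP", "Creatinine", "Na", "GCS",
                "BUN", "Urine", "HCO3", "K", "Glucose"]
    # Forward imputation: carry each patient's last observed value forward.
    df = df.sort_values(["patient_id", "hour"])
    df[features] = df.groupby("patient_id")[features].ffill()
    # Time window transformation: average each feature over fixed-size windows.
    df["window"] = (df["hour"] - 1) // window_hours
    return df.groupby(["patient_id", "window"])[features].mean()
```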

9 of 40

Models (background)

  • HMM:
    • Assumes the Markov property (the next state depends only on the current state) over unobservable hidden states
    • Allows us to work with time-series data
    • Infers the path through hidden states from observable properties
    • Gaussian emissions are suitable for modeling continuous data

  • SVM:
    • Classifier that separates data points using a hyperplane in an n-dimensional space

[Figure: HMM diagram with hidden states emitting observed values]

10 of 40

Models

  • HMM
    • 2 models
      • One trained on patients who survived, one trained on patients who died
    • A Bayes factor is computed from each model's likelihood for a given patient and used in the SVM
  • SVM
    • Static variables (age, ICU type, weight) are combined with the Bayes factor to classify a patient as dead or alive during the ICU stay (see the sketch below)
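A hedged sketch of this hybrid using hmmlearn's GaussianHMM (hmmlearn's GMMHMM with `n_mix` would match the "number of mixtures" hyperparameter tuned later); the synthetic data, component count, and feature layout are assumptions, not the group's actual code.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_feat = 10                                   # the 10 selected features
X_dead = rng.normal(size=(60, n_feat))        # 10 toy patients x 6 windows
X_alive = rng.normal(size=(120, n_feat))      # 20 toy patients x 6 windows

# One HMM per outcome, trained on concatenated per-patient sequences.
hmm_dead = GaussianHMM(n_components=3).fit(X_dead, lengths=[6] * 10)
hmm_alive = GaussianHMM(n_components=3).fit(X_alive, lengths=[6] * 20)

def bayes_factor(seq):
    # Log-likelihood ratio of the two models for one patient's sequence.
    return hmm_dead.score(seq) - hmm_alive.score(seq)

# Static variables plus the Bayes factor form the SVM input (toy values).
static = rng.normal(size=(30, 3))             # age, ICU type, weight
seqs = np.vstack([X_dead, X_alive]).reshape(30, 6, n_feat)
features = np.hstack([static, [[bayes_factor(s)] for s in seqs]])
labels = np.array([1] * 10 + [0] * 20)        # 1 = died, 0 = survived
svm = SVC(C=1.0).fit(features, labels)
```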


11 of 40

Cross Validation

  • Hyperparameter tuning
    • Time window size, number of mixtures for both HMMs, regularization parameter in SVM


[Figure: pipeline: time window transformation → HMM_alive / HMM_dead → SVM]
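In code, such a search might look like the following sketch; the grid values are invented, and `build_features` is a hypothetical helper standing in for rerunning the window transformation and HMM scoring at each setting.

```python
from itertools import product
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

best_setting, best_f1 = None, -1.0
for window, n_mix, C in product([4, 8, 12], [1, 2, 3], [0.1, 1.0, 10.0]):
    # Rebuild the HMM-derived features for this setting (hypothetical helper).
    X, y = build_features(window_size=window, n_mixtures=n_mix)
    f1 = cross_val_score(SVC(C=C), X, y, cv=5, scoring="f1").mean()
    if f1 > best_f1:
        best_setting, best_f1 = (window, n_mix, C), f1
```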

12 of 40

Full Model Evaluation

  • F1 Score: 0.42
  • Recall: 0.78
  • Accuracy: 0.70
  • Precision: 0.29
  • AUC-ROC: 0.73


                 Predicted Alive    Predicted Dead
Actual Alive     TN = 1187          FP = 535
Actual Dead      FN = 62            TP = 216

13 of 40

Conclusions and Future Work

With more time, the model could be developed further:

  • Feature engineering
  • Testing more feature divisions
  • Wider hyperparameter tuning
  • More error analysis
  • Updating predictions as more information is received

While the dataset's missing values were a limitation, the results may still inform future models that support resource allocation and help confirm clinical decisions.


14 of 40

Thank You

Any questions?

Contact Information:


Anvit Garg

anvit25@gmail.com

Alejandra Solis Sala

alejandra.solis@cimat.mx

Ian Maywar

ijmaywar@gmail.com

Rhea Verma

rhv3@pitt.edu

15 of 40

References

  1. Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
  2. Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220.
  3. https://www.programmersought.com/article/19492506989/
  4. http://gregorygundersen.com/blog/2020/11/28/hmms/


16 of 40

MACHINE LEARNING GROUP 2

Felicia Zhang

Madeline J Peterson

Maya Nitsche Taylor


Mentors: Professor Nikola Banovic and Anindya Das Antar

17 of 40

The Model

[Figure: decision trees combining into a random forest]

Image credits:

Canva. (n.d.). Donut Decision Maker. Design a Superb Decision Tree Online with Canva. https://www.canva.com/graphs/decision-trees/

Yiu, T. (2019, August 14). Understanding random forest. Medium. https://towardsdatascience.com/understanding-random-forest-58381e0602d2

18 of 40

Why Random Forests?

  • Has shown good performance across domains
  • One of the best supervised machine learning algorithms
  • Easy to evaluate feature importance
  • Individual trees can be explored, which aids explainability and interpretability

The goal: provide clinicians and hospitals with a bigger picture of patient survival in the ICU that may help with large-scale planning

The challenge: applying random forests to time-series data, which the algorithm is not designed to handle

19 of 40

Methods

  • Examine and split the data

  • Development
    • Feature selection
    • Window selection

  • Training
    • Hyperparameter selection
    • Comparison to other ML models

  • Testing
    • Performance stats


Data split: 10,000 patients → 2,500 development / 5,000 training / 2,500 testing

Window and feature selection: the 48 hours are split into 8-hour windows, and each window is summarized with:

  • mean
  • median
  • 25th quantile
  • 75th quantile
  • standard deviation
  • min
  • max
  • count

20 of 40

Understanding Time Windows

For a given time-series variable:

  1. Ex: use 6 windows: hours 1-8, 9-16, 17-24, 25-32, 33-40, 41-48
  2. Ex: GCS (Glasgow Coma Scale, a measure of a patient's consciousness)
    • A patient might have 20 GCS measurements over the 48-hour period.
    • To summarize with six windows, we split these 20 measurements into their respective time blocks and summarize over each block with various statistics (e.g., the mean).
    • The data might look like: meanGCS1, meanGCS2, ..., meanGCS6, medianGCS1, ..., stdGCS1, ..., minGCS1, ..., maxGCS1, ..., etc.
    • We also included a general 48-hour measurement for each statistic, e.g., meanGCS (the mean of the patient's GCS score over all 48 hours of the ICU visit). A pandas sketch of this construction follows below.
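One way to build these features, as a hedged pandas sketch; the toy `gcs` DataFrame and its column names are assumptions, and the 25th/75th quantiles would be added with lambda aggregators.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
gcs = pd.DataFrame({                      # toy long-format GCS measurements
    "patient_id": rng.integers(1, 4, 60),
    "hour": rng.integers(1, 49, 60),
    "value": rng.integers(3, 16, 60),
})

stats = ["mean", "median", "std", "min", "max", "count"]
gcs["window"] = (gcs["hour"] - 1) // 8 + 1          # windows 1..6

per_window = (gcs.groupby(["patient_id", "window"])["value"]
                 .agg(stats)
                 .unstack("window"))                # columns: (stat, window)
per_window.columns = [f"{s}GCS{w}" for s, w in per_window.columns]

# Overall 48-hour summaries, e.g. meanGCS.
overall = gcs.groupby("patient_id")["value"].agg(stats).add_suffix("GCS")
features = per_window.join(overall)                 # one row per patient
print(features.filter(like="mean").head())
```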


21 of 40

To use a random forest, the time-series data must be collapsed over time windows to create fixed-length feature vectors:

Development - Choosing Window Size

We ran cross-validation on the development set using various window sizes, optimizing the F1 score produced by the random forest.

Based on this analysis, we decided to move forward with six windows.


22 of 40

In parallel with window selection, we also chose a set of summary statistics for best performance (using the development data set).

Development - Choosing Summary Statistics

Set 1: mean, min, max

Set 2: mean, min, max, median, std deviation

Set 3: mean, min, max, median, std deviation, 25th and 75th quantiles, count


23 of 40

Methods

  • Examine and split the data

  • Development
    • Feature selection
    • Window selection

  • Training
    • Hyperparameter selection
    • Comparison to other ML models

  • Testing
    • Performance stats


Hyperparameter selection (final values):

  • Max features: 100
  • N estimators: 1000
  • Bootstrap: True
  • Class weight: balanced
  • Max depth: 4
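Read as scikit-learn settings, the final configuration might look like the sketch below; `X_train`/`y_train` are toy stand-ins for the windowed summary features and labels, and `random_state` is our addition for reproducibility.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 150))     # stand-in for windowed summaries
y_train = rng.choice([-1, 1], size=200, p=[0.86, 0.14])  # -1 survival, 1 death

rf = RandomForestClassifier(
    n_estimators=1000,
    max_features=100,
    max_depth=4,
    bootstrap=True,
    class_weight="balanced",  # offsets the rarity of deaths
    random_state=0,           # added for reproducibility (not in the slides)
).fit(X_train, y_train)
```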

24 of 40

Methods

  • Examine and split the data

  • Development
    • Feature selection
    • Window selection

  • Training
    • Hyperparameter selection
    • Comparison to other ML models

  • Testing
    • Performance stats


Comparison to other ML Models

[Figure: F1 score comparison across models]

25 of 40

Final Metrics on the Testing Set

Note: -1 = survival, 1 = death

Final values on test data:

  • Precision: 0.377
  • Recall: 0.749
  • F1 Score: 0.501

Important takeaways:

  • 74% of deaths were correctly predicted
  • 77% of survivals were correctly predicted

[Figure: confusion matrix on the test set]

26 of 40

Receiver Operating Characteristic (ROC) Curves

[Figure: ROC curves]

27 of 40


Understanding the Model - Example Decision Tree

[Figure: example decision tree from the forest]

Top 10 important features:

GCSmedian5, GCSquant755, GCSmax5, GCSmean5, GCSquant255, GCSmin5, GCSmedian4, GCSmean4, mean_Urine, quant25_BUN

Note: the number at the end denotes the window; GCS (Glasgow Coma Scale) quantifies degree of consciousness.
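Continuing the training sketch above, such a top-10 list can be read off a fitted forest's `feature_importances_`; the placeholder `feature_names` list is an assumption.

```python
import numpy as np

feature_names = [f"f{i}" for i in range(X_train.shape[1])]  # placeholder names
top10 = np.argsort(rf.feature_importances_)[::-1][:10]      # highest first
for i in top10:
    print(feature_names[i], round(rf.feature_importances_[i], 4))
```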

28 of 40


Takeaways and Future Work

Takeaways

  • Our proposed model outperforms the other models we compared against
  • It is able to incorporate time-series data
  • It shows which features are important

Future Work

  • Adding more features
  • Assessing generalizability
  • Predicting whether someone will return to the ICU

29 of 40


References

Canva. (n.d.). Donut Decision Maker. Design a Superb Decision Tree Online with Canva. https://www.canva.com/graphs/decision-trees/ .

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. E215–e220.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html

Yiu, T. (2019, August 14). Understanding random forest. Medium. https://towardsdatascience.com/understanding-random-forest-58381e0602d2.

30 of 40

Thank you!

Felicia Zhang, University of Michigan - fyzhang@umich.edu

Madeline Peterson, Albion College - mjp12@albion.edu

Maya Taylor, Brown University - maya_taylor@brown.edu


31 of 40

Long Short-Term Memory (LSTM) neural network to predict ICU mortality

By: Rami Shams, Sabir Meah, Esther Adegoke, Brian Lin

32 of 40

The Promise of LSTMs

  • Neural Networks: deep learning models capable of automatically identifying features
    • Recurrent Neural Networks: neural networks that feed the output of the previous step into the next computation (short-term memory)
      • LSTMs: recurrent neural networks that can also store information in an internal memory cell (long-term memory)

[Figure: neural network, RNN, and LSTM architectures]

33 of 40

Data Preprocessing and Missing Data Approaches

  1. Raw data: 35 time-variant and 8 time-invariant features, mean-aggregated by time into t = 1 hour steps over L = 48 hours.
  2. Imputation per feature (sketched below):
    • ≥ 3 measurements: cubic spline interpolation
    • ≥ 1 measurement: forward and backward filling
    • 0 measurements: mean imputation
  3. Z-score standardization.
  4. Split of 10,000 patients into 8,000 train / 1,000 validation / 1,000 test, stratified by outcome label.
  5. Data loader with batch size = 64.
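A hedged sketch of the per-feature imputation cascade; the helper name, the hourly-NaN layout, and SciPy's CubicSpline (standing in for whatever spline routine was actually used) are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import CubicSpline

def impute_feature(values, train_mean):
    """values: length-48 array of hourly means, NaN where unobserved."""
    observed = ~np.isnan(values)
    if observed.sum() >= 3:                   # cubic spline interpolation
        hours = np.arange(len(values))
        spline = CubicSpline(hours[observed], values[observed])
        return np.where(observed, values, spline(hours))
    if observed.sum() >= 1:                   # forward, then backward fill
        return pd.Series(values).ffill().bfill().to_numpy()
    return np.full_like(values, train_mean)   # mean imputation

hr = np.full(48, np.nan)
hr[[2, 17, 40]] = [88.0, 95.0, 79.0]          # three toy heart-rate readings
print(impute_feature(hr, train_mean=86.0)[:6])
```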

34 of 40

LSTM Architecture

  • Time-variant data (35 features) is fed through an LSTM with hidden size 40 over t = 48 time steps, carrying an internal cell state.
  • The final hidden state (∈ R^40) is concatenated with the time-invariant data (8 features), forming a hidden layer ∈ R^48 that passes through a ReLU.
  • A sigmoid output yields the mortality prediction. (PyTorch sketch below.)
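One plausible PyTorch reading of this architecture; the slide leaves the exact wiring ambiguous, so here the 40-dimensional final hidden state is concatenated with the 8 static features to form the 48-unit ReLU layer.

```python
import torch
import torch.nn as nn

class MortalityLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=35, hidden_size=40, batch_first=True)
        self.fc = nn.Linear(40 + 8, 48)        # hidden state + static features
        self.out = nn.Linear(48, 1)

    def forward(self, x_time, x_static):
        # x_time: (batch, 48 hours, 35 features); x_static: (batch, 8)
        _, (h_n, _) = self.lstm(x_time)        # final hidden state per patient
        h = torch.relu(self.fc(torch.cat([h_n[-1], x_static], dim=1)))
        return torch.sigmoid(self.out(h))      # mortality probability

model = MortalityLSTM()
probs = model(torch.randn(64, 48, 35), torch.randn(64, 8))  # toy batch of 64
```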

35 of 40

Methods - Data cleaning

  • Aggregate data over time
    • Time window trade-off:
      • Smaller time windows retain more granular data
      • Bigger time windows decrease the prevalence of missing data
  • Imputation
    • Mean imputation
    • Representing missing data with a sentinel value failed to converge
  • Standardization
    • Z-score
    • MinMax scaling yielded poor results
  • Stratification
    • ICU type
    • Mortality

ICU type                CCU     CSRU    MICU    SICU
Mortality rate          0.13    0.05    0.20    0.15
Patients (N = 10000)    1476    2076    3609    2839

36 of 40

Methods - Hyperparameters

  • Choice of optimizer
    • The Adam optimizer showed strange loss behavior (see figures below)
    • Stochastic gradient descent showed a more stable loss curve
  • Learning rate
    • The initial learning rate (0.001) was too slow with gradient descent; the model never converged
    • A learning rate of 0.01 converged
  • Loss function
    • Used binary cross entropy
    • Other loss functions failed to reach good F1 scores
  • Size of hidden layer
    • Found best results with a size of 40 (training-loop sketch after the figures)

[Figures: loss curves with the Adam optimizer and with the gradient descent optimizer]
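Putting the chosen settings together, a hedged training-loop sketch; the toy loader stands in for the batch-64 loader from the preprocessing slide, `MortalityLSTM` is the architecture sketch above, and the epoch count is an assumption.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data: (time-variant, time-invariant, outcome label).
data = TensorDataset(torch.randn(1000, 48, 35), torch.randn(1000, 8),
                     torch.randint(0, 2, (1000,)).float())
train_loader = DataLoader(data, batch_size=64, shuffle=True)

model = MortalityLSTM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD, lr = 0.01
criterion = torch.nn.BCELoss()                            # binary cross entropy

for epoch in range(50):                       # epoch count is an assumption
    for x_time, x_static, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x_time, x_static).squeeze(1), y)
        loss.backward()
        optimizer.step()
```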

37 of 40

Results - Final Model Performance

  • Validation F1 ≈ 0.5
  • Test statistics:
    • AUROC: 0.769
    • Accuracy: 0.854
    • Precision: 0.488
    • Recall: 0.420
    • F1 score: 0.451

[Figures: loss and F1 score curves for the final model]

38 of 40

Discussion

  • Efficacy
    • Strong improvement over our group's baseline SVM model
    • Efficacy may come at the cost of explainability
    • The model is still far from perfect: it can inform a doctor but not replace one
  • Limitations
    • The hyperparameters we tuned (hidden size, optimizer, learning rate) did not significantly improve the results
    • Lack of time to tune other hyperparameters (batch size, window size, etc.)
  • Future Research
    • Fine-tune hyperparameters
    • Try additional imputation and interpolation strategies
    • Bidirectional LSTMs, which improved performance in prior work on this dataset (Zhu et al., 2018)
    • Explore other machine learning libraries for additional options

39 of 40

References

  • Antar, A. D., & Banovic, N. BDSI Machine Learning GitHub Tutorials.
  • Olah, C. (2015, August 27). Understanding LSTM Networks. Colah's Blog. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • Wikipedia. Machine Learning. https://en.wikipedia.org/wiki/Machine_learning
  • Yasrab, R., & Pound, M. P. (2020). PhenomNet: Bridging phenotype-genotype gap: A CNN-LSTM based automatic plant root anatomization system. bioRxiv.
  • Zhu, Y., Fan, X., Wu, J., Liu, X., Shi, J., & Wang, C. (2018, January). Predicting ICU mortality by supervised bidirectional LSTM networks. In AIH@IJCAI.

40 of 40

Contacts

Brian Lin - Carnegie Mellon University

blin2@andrew.cmu.edu

Sabir Meah - University of Michigan

smeah@umich.edu

Rami Shams - University of Michigan

rshams@umich.edu

Esther Adegoke - Tufts University

esther.adegoke@tufts.edu