Ernest Chianumba
Graduate Researcher & Master’s Student,
Data Science, Montclair State University.
Optimizing Large Language Models for ICU Readmission Prediction
A Bristol Myers Squibb Science Scholars Initiative
Clinical datasets often reflect biases that favor majority populations, leading to predictive models that inadequately serve underrepresented groups such as Black and Hispanic populations. This exacerbates healthcare disparities and limits equitable outcomes.
This project focuses on optimizing Large Language Models (LLMs) to improve predictions for ICU readmission, using clinical trial demographic data combined with medical publications. By leveraging advanced machine learning techniques, we aim to mitigate bias, enhance personalization, and promote equity in critical healthcare decisions.
Aligned with Bristol Myers Squibb’s mission to transform lives through science, this work contributes to reducing disparities and improving outcomes for underrepresented populations in ICU care.
INTRODUCTION
PRESENTATION OVERVIEW
START
01 System Flowchart
02 Data Preprocessing
03 Model Development
04 Predictive Analyses
05 Results Explanations
06 Limitations & Conclusion
END
System Flowchart
Data Sources:
Focus Areas:
Core Activities:
Impact:
Data Collection → Preprocessing → Model Training
Database Tables
PostgreSQL Database Schema
Sample SQL Querying
Filtering publications that referenced clinical trials with at least 50% Black, White, or Hispanic participants, out of 19,707 extracted abstracts:
Black – 1,105 | Hispanic – 579 | White – 5,935
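The 50%-threshold filter above can be sketched as a join between trials and their linked publications. This is a minimal illustration on SQLite rather than the project's PostgreSQL database, and the table and column names (`trials`, `publications`, `black_pct`, etc.) are assumptions, not the actual schema.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trials (
    nct_id TEXT PRIMARY KEY,
    black_pct REAL, white_pct REAL, hispanic_pct REAL
);
CREATE TABLE publications (
    pmid TEXT PRIMARY KEY,
    nct_id TEXT REFERENCES trials(nct_id),
    abstract TEXT
);
""")
conn.executemany("INSERT INTO trials VALUES (?,?,?,?)", [
    ("NCT01", 55.0, 30.0, 15.0),   # majority-Black trial
    ("NCT02", 10.0, 80.0, 10.0),   # majority-White trial
    ("NCT03", 20.0, 40.0, 40.0),   # no group reaches 50%
])
conn.executemany("INSERT INTO publications VALUES (?,?,?)", [
    ("P1", "NCT01", "..."), ("P2", "NCT02", "..."), ("P3", "NCT03", "..."),
])

# Publications whose referenced trial enrolled >= 50% Black participants.
black_pubs = conn.execute("""
    SELECT p.pmid FROM publications p
    JOIN trials t ON t.nct_id = p.nct_id
    WHERE t.black_pct >= 50.0
""").fetchall()
print(black_pubs)  # [('P1',)]
```

The same query with `white_pct` or `hispanic_pct` in the WHERE clause produces the other two demographic-specific abstract pools.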
Sample SQL Querying
Filtering clinical trials with at least 25% Black and at least 25% Hispanic participants: 11 trials out of 23,370.
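The intersection filter above is a conjunction of two thresholds rather than a single-group cutoff. A minimal sketch, again on SQLite with an assumed `trials` table (the real schema lives in the project's PostgreSQL database):

```python
import sqlite3

# Hypothetical minimal trials table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trials (nct_id TEXT, black_pct REAL, hispanic_pct REAL)")
conn.executemany("INSERT INTO trials VALUES (?,?,?)", [
    ("NCT01", 30.0, 28.0),  # passes both thresholds
    ("NCT02", 60.0, 5.0),   # fails the Hispanic threshold
    ("NCT03", 10.0, 40.0),  # fails the Black threshold
])

# Trials with at least 25% Black AND at least 25% Hispanic enrollment --
# the joint filter that yielded only 11 of 23,370 trials in this project.
rows = conn.execute("""
    SELECT nct_id FROM trials
    WHERE black_pct >= 25.0 AND hispanic_pct >= 25.0
""").fetchall()
print(rows)  # [('NCT01',)]
```

The tiny yield (11 of 23,370) reflects how rarely both groups are simultaneously well represented, which motivates the augmentation step that follows.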
BERT Models’ Token Prediction Performances
BERT Models’ Training
Abstract counts before and after augmentation and oversampling, ensuring equally sized demographic-specific datasets for fine-tuning the BERT models.
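The balancing step can be sketched as random oversampling with replacement up to a common target size. The target of 13,368 matches the balanced counts reported later in the deck; the pool contents here are toy stand-ins for the filtered abstracts.

```python
import random

# Toy abstract pools sized like the PubMed filtering results
# (Black: 1,105; Hispanic: 579; White: 5,935).
pools = {
    "black":    [f"black_abs_{i}" for i in range(1105)],
    "hispanic": [f"hisp_abs_{i}" for i in range(579)],
    "white":    [f"white_abs_{i}" for i in range(5935)],
}

def oversample_to(pool, target, seed=0):
    """Random oversampling with replacement until the pool reaches `target`."""
    rng = random.Random(seed)
    extra = [rng.choice(pool) for _ in range(target - len(pool))]
    return pool + extra

TARGET = 13368  # balanced size used for fine-tuning in this project
balanced = {k: oversample_to(v, TARGET) for k, v in pools.items()}
print({k: len(v) for k, v in balanced.items()})
```

In the actual pipeline, augmentation (generating paraphrased or perturbed abstracts) supplements plain duplication, but the equal-count goal is the same.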
Prediction Models Used and Why:
Predictive Models’ Train/Test Data From MIMIC-IV
Subsets: 12,088 Blacks-filtered | 2,577 Hispanics-filtered | 44,584 Whites-filtered
Training Features | Predicted Feature: ICU Readmission

Abstracts Data From PubMed for Fine-Tuning LLMs
Baseline PubMedBERT Model: 40,104 balanced patient ICU records
Demographic PubMedBERT Models: 13,368 Blacks-filtered | 13,368 Hispanics-filtered | 13,368 Whites-filtered abstracts
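The three predictive models named above (Logistic Regression, LightGBM, MLP) can be sketched on synthetic tabular data with scikit-learn. This is illustrative only: the features are random stand-ins for the MIMIC-IV clinical features, and sklearn's `GradientBoostingClassifier` stands in for LightGBM so the sketch stays dependency-light.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for LightGBM
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a demographic-filtered MIMIC-IV subset:
# 8 numeric features, binary ICU-readmission label driven by the first two.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GBM": GradientBoostingClassifier(random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # test-set accuracy
print(scores)
```

Running each model once on the baseline data and once on a demographic-filtered subset gives the paired accuracy columns reported in the tables that follow.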
Predictive Models’ Performances on Blacks-Filtered Subset
Model | Baseline-Acc | Black-Acc | Δ Acc |
Logistic Regression | 0.75 | 0.70 | -0.05 |
LightGBM | 0.55 | 0.60 | +0.05 |
MLP | 0.60 | 0.75 | +0.15 |
Model | Baseline-AUC | Black-AUC | Δ AUC |
Logistic Regression | 0.74 | 0.75 | +0.01 |
LightGBM | 0.63 | 0.68 | +0.05 |
MLP | 0.70 | 0.81 | +0.11 |
Predictive Models’ Performances on Hispanics-Filtered Subset
Model | Baseline-Acc | Hispanic-Acc | Δ Acc |
Logistic Regression | 0.38 | 0.41 | +0.03 |
LightGBM | 0.85 | 0.86 | +0.01 |
MLP | 0.54 | 0.61 | +0.07 |
Model | Baseline-AUC | Hispanic-AUC | Δ AUC |
Logistic Regression | 0.35 | 0.37 | +0.02 |
LightGBM | 0.91 | 0.92 | +0.01 |
MLP | 0.59 | 0.68 | +0.09 |
Predictive Models’ Performances on Whites-Filtered Subset
Model | Baseline-Acc | White-Acc | Δ Acc |
Logistic Regression | 0.35 | 0.40 | +0.05 |
LightGBM | 0.35 | 0.50 | +0.15 |
MLP | 0.40 | 0.40 | 0.00 |
Model | Baseline-AUC | White-AUC | Δ AUC |
Logistic Regression | 0.24 | 0.39 | +0.15 |
LightGBM | 0.23 | 0.40 | +0.17 |
MLP | 0.28 | 0.43 | +0.15 |
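The Δ columns in the tables above are simple differences between the demographic-specific and baseline metrics. A minimal sketch with toy labels and probabilities (the real values come from the fitted models, not these hand-picked numbers):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy ground truth and predicted probabilities; in the project these come
# from the baseline pipeline vs. the demographic-specific pipeline.
y_true        = [0, 1, 1, 0, 1, 0, 1, 1]
baseline_prob = [0.2, 0.7, 0.4, 0.3, 0.8, 0.6, 0.9, 0.5]
demo_prob     = [0.1, 0.8, 0.6, 0.2, 0.9, 0.4, 0.95, 0.7]

def acc_auc(y, probs, thresh=0.5):
    """Accuracy at a 0.5 threshold, plus threshold-free ROC AUC."""
    preds = [int(p >= thresh) for p in probs]
    return accuracy_score(y, preds), roc_auc_score(y, probs)

b_acc, b_auc = acc_auc(y_true, baseline_prob)
d_acc, d_auc = acc_auc(y_true, demo_prob)
print(f"Δ Acc = {d_acc - b_acc:+.2f}, Δ AUC = {d_auc - b_auc:+.2f}")
```

Reporting both metrics matters here: accuracy depends on the 0.5 cutoff, while AUC measures ranking quality, which is why the two deltas can disagree in sign for the same model.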
FUTURE WORK AND LIMITATIONS
Limitations:
Future Work:
CONCLUSION AND ACKNOWLEDGEMENT
Conclusion:
Key Supporters:
Ready and excited to contribute to Bristol Myers Squibb's mission to discover, develop and deliver innovative medicines that help patients prevail over serious diseases.
Thank You!