Fair Machine Learning Models for Integrating EMRs and Neighbor Information for Better Disease Screening and Risk Factor Identification
Yang Dai
Department of Biomedical Engineering
Center for Bioinformatics and Quantitative Biology (CBQB)
University of Illinois Chicago
UIC AI Ecosystem Continuing Symposium 1, 9.13.2024, UIC
Machine learning predictive models for disease using multi-omics data
Machine Learning
Disease prediction
Biomarker discovery
Biology
Characteristics of data:
Neither ML generalization nor bias is typically evaluated
ML Modeling for Wellbeing: better disease screening and risk factor identification; recommendation of changes in screening, health policy, and lifestyle
Machine Learning
Fairness (Race/Gender)
Bias Mitigation
Other social determinants of health (SDOH)
Risk prediction
Risk Factors (bio/socio-economic/environmental/behavioral)
…
ML Modeling for Wellbeing: better disease screening and risk factor identification; recommendation of changes in screening, health policy, and lifestyle
Machine Learning
EMRs
Neighborhood Information
Fairness (Race/Gender)
Bias Mitigation
Modeling
Dietary Records
Risk prediction
Risk Factors (bio/socio-economic/environmental/behavioral)
…
Case study 1: Prenatal depression
Hypothesis: Community-level information could improve prenatal depression (PND) prediction
Overall Patient distribution
Biases in prediction for different racial/ethnic groups
Prenatal depression: data from UI Health
(2414 patients, 56 EMR features)
Huang Y, Alvernaz S, Kim SJ, Maki P, Dai Y, Bernabé BP. Predicting prenatal depression and assessing model bias using machine learning models. Biological Psychiatry Global Open Science, August 2024.
Feature/risk importance are race/ethnicity-specific
Huang et al., Biological Psychiatry Global Open Science 2024
[Figure: nested cross-validation. CONTROL and CASE groups, each broken down by race/ethnicity (Other, NHB, NHW, H); Outer Loop K = 10; Inner Loop K = 5]
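The outer loop (K = 10) and inner loop (K = 5) shown in the figure can be sketched as nested cross-validation index splits. This is only an illustrative sketch, not the procedure from the study: it assumes folds are stratified jointly by case/control status and race/ethnicity group, which the slides do not state explicitly. The helper names `stratified_folds` and `nested_cv_splits` and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(indices, strata, k, seed=0):
    """Split indices into k folds, spreading each stratum evenly across folds."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i in indices:
        by_stratum[strata[i]].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for key in sorted(by_stratum):
        members = by_stratum[key]
        rng.shuffle(members)
        # Round-robin assignment keeps both stratum counts and fold sizes balanced.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds

def nested_cv_splits(strata, outer_k=10, inner_k=5, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    inner_splits is a list of (fit, validation) index lists drawn only from
    the outer training set, so the outer test fold is never used for tuning.
    """
    all_idx = list(range(len(strata)))
    outer = stratified_folds(all_idx, strata, outer_k, seed)
    for o, test in enumerate(outer):
        train = [i for f, fold in enumerate(outer) if f != o for i in fold]
        inner = stratified_folds(train, strata, inner_k, seed + o + 1)
        inner_splits = []
        for v, val in enumerate(inner):
            fit = [i for f, fold in enumerate(inner) if f != v for i in fold]
            inner_splits.append((fit, val))
        yield train, test, inner_splits

# Toy data (made up): 400 patients, 25% cases, four equally sized groups.
GROUPS = ["NHB", "NHW", "H", "Other"]
strata = [(1 if i % 4 == 0 else 0, GROUPS[(i // 4) % 4]) for i in range(400)]
splits = list(nested_cv_splits(strata))
```

In such a setup, hyperparameters would be tuned on the inner splits and performance (including per-group performance) reported on the outer test folds only.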
Proposed new training procedure:
[Figure: panels A, B, C, D]
Integration of Community-Level information with EMRs Enhances ML Model Fairness While Maintaining Moderate Predictive Performance
Motivation: Lung cancer shows health disparities by race and gender; existing risk factors do not fully explain disease incidence
Spatial distributions of lung cancer risk, homicide rate, and racial/ethnic composition of the city of Chicago.
Case study 2: Lung cancer risk prediction
Kim SJ, Kery C, An J, Rineer J, Bobashev G, Matthews AK. Racial/Ethnic disparities in exposure to neighborhood violence and lung cancer risk in Chicago. Soc Sci Med. 2024 Jan;340:116448.
Objectives: Develop ML models to identify multi-level risk factors for lung cancer
Summary of data: 14 variables
Patient data: Age over 40; UI Hospital
| Type | Variable Name | Values | Definition |
| --- | --- | --- | --- |
| Categorical (8) | BMI | 1, 2, 3, 4 | Body Mass Index |
| | Male | 0, 1 | Gender |
| | Neversmoker | 0, 1 | Smoking behavior |
| | White, Black, Asian, Hisp, Other | 0, 1 (each) | Individual-level race/ethnicity; White, Black, and Hisp are used in the model |
| Continuous (6) | age | | Age |
| | homiciderate1519 | | |
| | ppov | | % poverty |
| | pwhite, pblack, phisp | | % White, % Black, % Hispanic |
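The table mixes individual-level variables (for example age, Male, Neversmoker) with neighborhood-level ones (ppov, homiciderate1519, pwhite, pblack, phisp). A minimal sketch of how the two levels might be combined into one feature row per patient is shown below. The join key `tract_id`, the helper `attach_neighborhood`, and all values are hypothetical; the slides do not say how patients were linked to neighborhoods.

```python
def attach_neighborhood(patients, neighborhoods, key="tract_id"):
    """Attach neighborhood-level variables to each patient record.

    patients: list of dicts, each with a (hypothetical) geographic key.
    neighborhoods: dict mapping that key to neighborhood-level variables.
    Patients whose key has no neighborhood entry are dropped.
    """
    merged = []
    for p in patients:
        area = neighborhoods.get(p[key])
        if area is None:
            continue  # no neighborhood match for this patient
        row = {k: v for k, v in p.items() if k != key}
        row.update(area)
        merged.append(row)
    return merged

# Made-up example records; variable names follow the table above.
patients = [
    {"tract_id": "T1", "age": 62, "Male": 1, "Neversmoker": 0},
    {"tract_id": "T2", "age": 55, "Male": 0, "Neversmoker": 1},
    {"tract_id": "T9", "age": 71, "Male": 1, "Neversmoker": 0},
]
neighborhoods = {
    "T1": {"ppov": 31.0, "homiciderate1519": 40.0},
    "T2": {"ppov": 12.5, "homiciderate1519": 8.0},
}
rows = attach_neighborhood(patients, neighborhoods)
```

Dropping unmatched patients is one possible choice; imputing the missing neighborhood values would be another, and the two can affect group composition differently.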
Performance shows racial and gender bias
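One common way to make this kind of bias visible is to compute error rates separately for each race or gender group and compare them. The sketch below reports the true positive rate and false positive rate per group and the largest gap between groups. The slides do not specify which fairness metric was used, so this choice, the helper names `group_rates` and `max_gap`, and the toy data are assumptions.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Return {group: {"tpr": ..., "fpr": ...}} from binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        key = ("tp" if p else "fn") if t else ("fp" if p else "tn")
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "tpr": c["tp"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
        }
    return rates

def max_gap(rates, metric):
    """Largest between-group difference for one metric."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)

# Made-up predictions for two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["group_a"] * 4 + ["group_b"] * 4

rates = group_rates(y_true, y_pred, groups)
tpr_gap = max_gap(rates, "tpr")
fpr_gap = max_gap(rates, "fpr")
```

A model with similar overall accuracy can still show large gaps here, which is why per-group reporting is useful alongside aggregate metrics.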
The risk ranking
Blue: positive
Red: negative
Question: How does this change in the different race/gender groups?
Lessons learned from our studies and future directions
Learned
Moving forward for better ML strategies and procedures
Acknowledgments