Heavy Metal Exposure and Diabetes Risk Prediction Using a Machine Learning Approach
Sagar Shrestha, Felix Twum, Jennifer L. Lemacks, Sermin Aras
April 3rd, 2025
Susan A. Siltanen Graduate Student Research Symposium
The University of Southern Mississippi
Introduction
Risk Factors
Traditional Risk Factors
Environmental Factors
Motivation of the Study
Formative Study - OLS Regression Analysis
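The formative study uses ordinary least squares (OLS). The slide does not list the outcome or predictors, so the sketch below shows only the single-predictor case, where the OLS fit has a closed form: slope b = S_xy / S_xx and intercept a = ȳ - b·x̄. The function name and toy values are hypothetical and are not study data.

```python
from statistics import fmean

def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x: b = S_xy / S_xx, a = mean(y) - b*mean(x)."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variance; the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical toy values (not NHANES data): an exposure level x and an outcome y.
x = [0.5, 1.0, 1.5, 2.0, 2.5]
y = [5.4, 5.6, 6.1, 6.3, 6.8]
intercept, slope = ols_simple(x, y)
```

With several covariates the same least-squares criterion is solved with matrix algebra (or a statistics package) rather than this two-parameter formula.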
Objectives
Literature Review
Data Preprocessing
Dataset
Data Source: NHANES 2011-2018, initially consisting of 51,122 responses and 330 variables
Dataset Size: 13,667 survey responses after filtering
Dataset Histogram
Data Filtering
Data Imbalance
Feature Standardization
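The slides do not name the scaler. Assuming z-score standardization, z = (x - μ) / σ, with μ and σ estimated on the training data only and then reused for held-out data, a minimal pure-Python sketch looks like this. It uses the population standard deviation, as scikit-learn's StandardScaler does; the function names and toy values are hypothetical.

```python
from statistics import fmean, pstdev

def fit_standardizer(rows):
    """Estimate per-feature (mean, population std) from training rows only."""
    params = []
    for column in zip(*rows):
        mu = fmean(column)
        sigma = pstdev(column)
        # A constant feature has sigma == 0; use 1.0 so it maps to 0 instead of dividing by zero.
        params.append((mu, sigma if sigma > 0 else 1.0))
    return params

def transform(rows, params):
    """Apply z = (x - mean) / std column-wise using previously fitted parameters."""
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, params)] for row in rows]

# Hypothetical toy data: two features (e.g. a blood metal level and BMI), three training rows.
train = [[1.0, 22.0], [2.0, 28.0], [3.0, 31.0]]
test = [[2.5, 25.0]]
params = fit_standardizer(train)
z_train = transform(train, params)
z_test = transform(test, params)  # test rows reuse the training mean/std
```

Fitting the parameters on training rows only keeps information from held-out rows from leaking into the preprocessing step.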
Key Variables
Downsampling and Stratified K-Fold Cross-Validation
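A minimal standard-library sketch of the two steps named on this slide: random downsampling of the majority class, then stratified k-fold splitting. It assumes k = 5 and that downsampling is applied once before splitting; the slide does not say whether it was applied before splitting or only inside each training fold. The helper names and toy labels are hypothetical.

```python
import random
from collections import defaultdict

def _indices_by_class(y):
    groups = defaultdict(list)
    for i, label in enumerate(y):
        groups[label].append(i)
    return groups

def downsample(X, y, seed=0):
    """Randomly keep n_min rows per class, where n_min is the size of the smallest class."""
    rng = random.Random(seed)
    groups = _indices_by_class(y)
    n_min = min(len(idx) for idx in groups.values())
    keep = sorted(i for idx in groups.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]

def stratified_k_fold(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which each fold preserves the class proportions of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in _indices_by_class(y).values():
        shuffled = list(idx)
        rng.shuffle(shuffled)
        # Deal each class's indices round-robin so every fold gets a near-equal share.
        for position, i in enumerate(shuffled):
            folds[position % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        yield train, test

# Hypothetical imbalanced toy labels: 30 positives, 70 negatives.
X = [[float(i)] for i in range(100)]
y = [1] * 30 + [0] * 70
X_bal, y_bal = downsample(X, y)
for train_idx, test_idx in stratified_k_fold(y_bal, k=5):
    pass  # fit a model on the train_idx rows, evaluate on the test_idx rows
```

If downsampling were instead done only within each training fold, the test folds would keep the original class imbalance, which gives a more realistic estimate of deployment performance.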
Principal Component Analysis (PCA)
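The slides do not give the number of retained components or the implementation used. As a dependency-free illustration (not the authors' code), PCA can be computed from the eigenvectors of the sample covariance matrix; the sketch below finds them by power iteration with deflation. In the pipeline the input would be the standardized features. Function names, the iteration count, and the toy data are hypothetical; in practice a library routine such as scikit-learn's PCA would normally be used.

```python
import math
import random

def pca(rows, n_components=2, n_iter=500, seed=0):
    """Return (means, components, variances) using power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix C = Xc^T Xc / (n - 1).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(n_components):
        v = [rng.random() + 0.5 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            # One power-iteration step: v <- C v / ||C v||.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v is the variance along v.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        variances.append(lam)
        # Deflate: subtract lam * v v^T so the next pass finds the next-largest direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components, variances

def project(rows, means, components):
    """Scores of each row on each principal component."""
    return [[sum((x - m) * c for x, m, c in zip(row, means, comp)) for comp in components]
            for row in rows]

# Hypothetical toy data: two strongly correlated features.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
means, comps, variances = pca(data, n_components=2)
explained = [v / sum(variances) for v in variances]
scores = project(data, means, comps)
```

Here almost all of the variance falls on the first component, so keeping one component would summarize both correlated features.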
ML Models
CatBoost
LightGBM
Random Forest
Feedforward Neural Network (FNN)
Results for Risk Assessment
Metric    | Random Forest | CatBoost | LightGBM | FNN
Accuracy  | 0.7166        | 0.7285   | 0.7163   | 0.668
Precision | 0.6999        | 0.707    | 0.6985   | 0.6539
Recall    | 0.7596        | 0.7811   | 0.7626   | 0.7159
F1        | 0.7283        | 0.7421   | 0.729    | 0.6832
F2        | 0.7467        | 0.765    | 0.7487   | 0.7024
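As a consistency check on the results table, F1 and F2 can be recomputed from the precision (P) and recall (R) rows with F_β = (1 + β²)·P·R / (β²·P + R). The sketch assumes the reported metrics refer to the positive (diabetes) class. If the table values are averaged across cross-validation folds (the slides do not say), the recomputed scores should agree only to about the third decimal place, so the check uses a tolerance of 0.001. The function names are hypothetical.

```python
# Precision, recall, F1, and F2 as reported in the results table.
REPORTED = {
    "Random Forest": {"precision": 0.6999, "recall": 0.7596, "f1": 0.7283, "f2": 0.7467},
    "CatBoost":      {"precision": 0.707,  "recall": 0.7811, "f1": 0.7421, "f2": 0.765},
    "LightGBM":      {"precision": 0.6985, "recall": 0.7626, "f1": 0.729,  "f2": 0.7487},
    "FNN":           {"precision": 0.6539, "recall": 0.7159, "f1": 0.6832, "f2": 0.7024},
}

def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 weights recall more."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom

def check_table(table, tol=1e-3):
    """Return {model: (recomputed F1, recomputed F2, within_tolerance)}."""
    results = {}
    for model, m in table.items():
        f1 = f_beta(m["precision"], m["recall"], 1.0)
        f2 = f_beta(m["precision"], m["recall"], 2.0)
        ok = abs(f1 - m["f1"]) <= tol and abs(f2 - m["f2"]) <= tol
        results[model] = (round(f1, 4), round(f2, 4), ok)
    return results

if __name__ == "__main__":
    for model, (f1, f2, ok) in check_table(REPORTED).items():
        print(f"{model:13s} F1={f1:.4f} F2={f2:.4f} consistent={ok}")
```

Because recall exceeds precision for every model, F2, which weights recall more heavily, is higher than F1 in each column.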
Feature Importance for CatBoost
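CatBoost reports its own built-in importance scores, and the slides do not say which type was plotted. To illustrate the general idea of feature importance, the sketch below uses a different, model-agnostic technique: permutation importance. It shuffles one feature column at a time and measures the drop in accuracy; a larger drop means the model relies more on that feature. The toy model, data, and function names are hypothetical. In practice the drop would be measured on held-out data.

```python
import random

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one feature column is shuffled (larger drop = more important)."""
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted; the other features stay intact.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy model: predicts 1 when feature 0 exceeds 0.5 and ignores feature 1.
def toy_predict(rows):
    return [1 if r[0] > 0.5 else 0 for r in rows]

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = toy_predict(X)
scores = permutation_importance(toy_predict, X, y)
```

Shuffling the ignored feature leaves accuracy unchanged, so its importance is zero, while shuffling feature 0 drops accuracy to roughly chance level.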
Conclusions
Heavy metal exposure, particularly lead, is strongly associated with diabetes risk.
CatBoost is the most effective model for predicting diabetes from environmental and physiological factors, scoring highest of the four models on every reported metric (accuracy 0.7285, F1 0.7421).
These findings underscore the need for stricter environmental regulations and further research into heavy metal toxicity and its links to disease.
Future Work
Exploring additional environmental and genetic factors.
Implementing deep learning models to improve predictive accuracy.
Conducting longitudinal studies to establish causal relationships.
References
Centers for Disease Control and Prevention. (2024, May 15). A report card: Diabetes in the United States infographic. Diabetes. https://www.cdc.gov/diabetes/communication-resources/diabetes-statistics.html
American Diabetes Association. (2023). Annual Report 2023. Retrieved from https://diabetes.org/sites/default/files/2024-06/ADA_2023_AnnualReport.pdf
Zhao, M., Wan, J., Qin, W., Huang, X., Chen, G., & Zhao, X. (2023). A machine learning-based diagnosis modeling of type 2 diabetes mellitus with environmental metal exposure. Computer Methods and Programs in Biomedicine, 107537. https://doi.org/10.1016/j.cmpb.2023.107537
Gui, Y., Gui, S., Wang, X., Li, Y., Xu, Y., & Zhang, J. (2024). Exploring the relationship between heavy metals and diabetic retinopathy: a machine learning modeling approach. Scientific Reports, 14, 13049.
Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). (2011-2018). National Health and Nutrition Examination Survey (NHANES). Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. https://www.cdc.gov/nchs/nhanes/index.htm
Thanks