Generating Synthetic Molecular Mediator Time Series Data for ML and AI applications: considerations
Gary An, MD, FACS
Department of Surgery
University of Vermont
August 24, 2023
Viral Pandemics Working Group Webinar Seminar Series
◤
What this Talk is about
◤
Synthetic Data: The key to modern ML/AI
◤
“Classical” Approaches Part 1: Statistical Approaches
Statistical synthetic data can be generated if there is enough existing real-world data such that either:
However, for a GAN to work, there must be sufficient training data for it to distinguish between applied noise and the “true” data/invariant component (Ground Truth).
Consequently, this approach has found its greatest success in image analysis, where vast libraries of annotated images have been used both to train initial ANNs and to serve as reference points for GAN-driven synthetic image generation.
◤
“Classical” Approaches Part 2: Physics-based Approaches
Simulation-generated synthetic data from mechanism-based simulation models
In these cases, there is a high degree of confidence in the rules and mechanisms of the simulations, and thus high trust in the fidelity between the synthetically generated data and the “real world” in which the trained systems must operate (acknowledging that in the case of a game, the game itself represents the “real world” for the player).
◤
Synthetic Data in Healthcare/Biomedical Modeling: What works
Statistical/GAN
Physics-Based
◤
Synthetic Molecular Time Series Data: why it is wanted
◤
Why Statistical Methods won’t work
◤
Why Physics-based Methods won’t work
◤
What features must SMT have? Understanding limits of AI
◤
What must be dealt with when generating SMT?
◤
A proposed method for generating SMT
◤
Capturing Clinical Heterogeneity = Parameter Landscape => Model Rule Matrix (MRM)
◤
Model Rule Matrix (MRM)
List of Entities (Molecules) in Model:
List of Rules:
| | Entity 1 | Entity 2 | Entity 3 | Entity 4 | Entity 5 | ... | Entity n |
| Rule 1 | | | | | | | |
| Rule 2 | | | | | | | |
| Rule 3 | | | | | | | |
| Rule 4 | | | | | | | |
| ... | | | | | | | |
| Rule m | | | | | | | |
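The MRM layout above (rules as rows, entities as columns) can be sketched as a plain numeric matrix. This is a minimal illustration, not the IIRABM implementation: it assumes each cell holds a signed coefficient for how strongly an entity contributes to a rule, with zero meaning no contribution. The entity and rule names, the coefficient values, and the helper `rule_inputs` are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical labels mirroring the schematic table; not taken from any real model.
entities = ["Entity 1", "Entity 2", "Entity 3", "Entity 4", "Entity 5"]
rules = ["Rule 1", "Rule 2", "Rule 3", "Rule 4"]

# Rows are rules, columns are entities.
# Assumption: a cell is a signed coefficient; 0.0 means the entity plays no part in the rule.
mrm = np.zeros((len(rules), len(entities)))
mrm[0, 0] = 1.0    # placeholder: Entity 1 promotes Rule 1
mrm[0, 2] = -0.5   # placeholder: Entity 3 inhibits Rule 1
mrm[1, 1] = 0.8    # placeholder: Entity 2 promotes Rule 2


def rule_inputs(matrix: np.ndarray, concentrations: np.ndarray) -> np.ndarray:
    """Return one weighted sum of entity concentrations per rule (hypothetical helper)."""
    return matrix @ concentrations


# Placeholder concentrations, one per entity.
concentrations = np.array([2.0, 1.0, 4.0, 0.0, 0.0])
inputs = rule_inputs(mrm, concentrations)
```

Under this assumed representation, changing the coefficients changes how the same rules respond to the same mediator levels, which is one way to read the mapping from a parameter landscape to the MRM.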
◤
Digital Twin to generate Synthetic Multiplexed Molecular Time Series
◤
Results: Sample MRMs from IIRABM
“Base” IIRABM MRM
“An” Evolved IIRABM MRM
IIRABM-MRM = 17 Mediators (Columns) x 25 Rules (Rows)
Cockrell C and An G: Frontiers in Physiology: Computational Physiology and Medicine. 2021
◤
Results: Range of Ensemble MRMs
2d Heatmap of Value Ranges
3d Depiction of Value Ranges
Cockrell C and An G: Frontiers in Physiology: Computational Physiology and Medicine. 2021
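A per-cell value range across an ensemble of MRMs, of the kind a 2d heatmap would display, can be computed as sketched here. The ensemble is filled with random placeholder numbers purely to make the example runnable; the ensemble size, the value bounds, and all variable names are assumptions, and only the 25 x 17 shape comes from the slides.

```python
import numpy as np

N_RULES, N_MEDIATORS = 25, 17   # MRM shape stated for the IIRABM
N_MEMBERS = 50                  # assumed ensemble size, for illustration only

# Placeholder ensemble: random coefficients standing in for evolved MRMs.
rng = np.random.default_rng(0)
ensemble = rng.uniform(-1.0, 1.0, size=(N_MEMBERS, N_RULES, N_MEDIATORS))

# Collapse the ensemble axis to get per-cell bounds and their spread.
cell_min = ensemble.min(axis=0)
cell_max = ensemble.max(axis=0)
value_range = cell_max - cell_min   # one range per (rule, mediator) cell
```

The `value_range` array has the same shape as a single MRM, so it can be passed directly to any heatmap routine.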
◤
Results: Mechanistically Generated Synthetic Trajectory Spaces
TNFa: Real and Synthetic Data
◤
Results: Mechanistically Generated Synthetic Trajectory Spaces => Link to Organ Scale
GCSF, IL-1 and Lung SOFA Scores
◤
Benefits of using this MSM approach to generate SMT
◤
Next Steps
◤
Finis
◤