COVID-19 Severe Outcome Risk Prediction
Changrong Ji
Dr. Mahesh Shukla
Dr. David Patton
Dr. Xue Yang
Dr. Xingguo Zhang
Antonio Linari
Premdutt Gaur
Vance Degen
Private Machine Learning on Medical Records & Social Data
Topics
Nonprofit Applied R&D
We are:
Data Scientists
Physicians
Engineers
Advocates
Researchers
Privacy Specialists
Game Developers
Venture & Social Capitalists
Projects
The data, technology, and services used in the generation of these research findings were generously supplied pro bono by the COVID-19 Research Database partners
https://covid19researchdatabase.org/
COVID-19 Project Team
Aims
The research aims are addressed in detail in the following working papers, current as of 09/2020. Updated versions will be published as the research progresses:
Data
As of 08/21/2020; new data is added with a one-week delay
Attributions to Data Providers
AnalyticsIQ is a leading predictive data and analytics innovator that leverages a blend of publicly available data and custom algorithms informed by cognitive psychology concepts to describe consumers across three areas: People, Behaviors, and Predictors. AnalyticsIQ is headquartered in Atlanta and was recently named one of Georgia’s Top 10 most innovative companies; its team of data analysts, scientists, and cognitive psychologists has over 100 years of collective analytical experience and expertise.
Electronic health record data, including diagnoses, procedures, labs, vitals, medications, and histories, are sourced from participating members of the Healthjump network.
Machine Learning for Clinical Prognosis
Aim 1
Create machine learning models to predict a patient’s risk of severe clinical outcomes if infected with COVID-19.
These personalized risk scores and associated risk factors analysis can
ML Model Development
Feature Engineering
Embedding in NLP
Lower-dimensional space:
Clinical Concepts Embedding
Top 20 Procedures for COVID Patients
[Figure: surviving value labels 15.7%, 6.5%, 0.4%, 0.01%, 3.1%]
Top 20 Co-occurring Diagnoses with COVID-19
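A ranking like this can be produced by counting, among patients with a COVID-19 diagnosis, how often each other diagnosis code appears. The sketch below is one minimal way to do that and is not taken from the project's code: the patient records are invented toy data, the function name `top_cooccurring` is hypothetical, and the use of ICD-10-CM code U07.1 to identify COVID-19 patients is an assumption about how the cohort was defined.

```python
from collections import Counter

COVID_CODE = "U07.1"  # ICD-10-CM code for COVID-19 (assumed cohort definition)

# Toy, invented records: patient id -> set of diagnosis codes.
patients = {
    "p1": {"U07.1", "I10", "E11.9"},
    "p2": {"U07.1", "I10", "J12.89"},
    "p3": {"I10", "E11.9"},  # no COVID-19 diagnosis, so excluded from the cohort
    "p4": {"U07.1", "J12.89", "I10"},
}


def top_cooccurring(records, index_code, n=20):
    """Return (code, share of index patients) for the n most common co-diagnoses."""
    cohort = [codes for codes in records.values() if index_code in codes]
    if not cohort:
        return []
    counts = Counter(
        code for codes in cohort for code in codes if code != index_code
    )
    return [(code, count / len(cohort)) for code, count in counts.most_common(n)]


ranking = top_cooccurring(patients, COVID_CODE)
for code, share in ranking:
    print(f"{code}: {share:.0%}")
```

Using a set per patient counts each diagnosis at most once per person, so the shares read as "fraction of COVID-19 patients who also carry this diagnosis".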
Baseline Prediction with Claims Data
Goal: Predict hospitalization for COVID-19 patients
Data:
Model:
algorithm, model performance, feature importance, future improvements
Precision & Recall Refresher
                              Actually Hospitalized (16%)    Actually Not Hospitalized (84%)
Predicted Hospitalized        TP: 5026                       FP: 5531
Predicted Not Hospitalized    FN: 775                        TN: 22727
Precision: 48%
Recall: 87%
Minimize
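As a quick check, the 48% and 87% figures follow directly from the TP, FP, FN, and TN counts above using the standard definitions of precision and recall. The short sketch below only redoes that arithmetic; the function name `precision_recall` is illustrative and not from the project.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts taken from the confusion matrix on the slide.
tp, fp, fn, tn = 5026, 5531, 775, 22727

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.0%}")  # 48%
print(f"recall    = {recall:.0%}")     # 87%
```

Read this way, about half of the patients flagged as likely to be hospitalized actually were, while the model caught roughly 87% of those who were hospitalized.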
Hospitalization Results
Classification Report:
Social Determinants and Risk Factors
About 90 attributes of social data from AnalyticsIQ are available for over 34 million patients, of whom over 95,000 are COVID-19 patients.
We examined:
Occupation*
Specific Occupations
Ethnicity
Assimilation into US Culture
Scale: Least likely to Most likely
Future Work: SDOH
Many additional attributes available in the dataset:
BMI, diet, location, profession, access to healthcare, risky behavior, etc.
Future Work: Current Use Case
Future Work: Population Health Dashboard
Future Work: New Use Cases As the Pandemic Progresses
Private AI
Collaborative Learning with Obfuscation, Aggregation & Knowledge Transfer
Data Sharing & Healthcare AI Challenges
Specific relevance to the COVID-19 Research DB projects: the person-level medical records and social data, while de-identified, are still vulnerable to attacks such as data linkage and model inversion that leak private information. The following highlights a set of techniques to mitigate these privacy risks. A series of papers will be published on this topic, starting with: Privacy-Preserving Machine Learning Techniques 2020, Changrong Ji et al.
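To make the data-linkage risk concrete, one common way to gauge it is a k-anonymity check: group records by their quasi-identifiers and see how small the smallest group is. The slides do not name k-anonymity, so this is an illustrative assumption rather than one of the project's techniques; the records, the column names (`zip3`, `birth_year`, `sex`, `dx`), and the function names are all hypothetical.

```python
from collections import Counter

# Toy, invented de-identified records; column names are hypothetical.
records = [
    {"zip3": "303", "birth_year": 1950, "sex": "F", "dx": "U07.1"},
    {"zip3": "303", "birth_year": 1950, "sex": "F", "dx": "I10"},
    {"zip3": "303", "birth_year": 1982, "sex": "M", "dx": "U07.1"},
    {"zip3": "100", "birth_year": 1975, "sex": "F", "dx": "E11.9"},
    {"zip3": "100", "birth_year": 1975, "sex": "F", "dx": "U07.1"},
]
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Size of the smallest group; k == 1 means someone is unique."""
    sizes = group_sizes(rows, keys)
    return min(sizes.values()) if sizes else 0


def unique_rows(rows, keys):
    """Rows whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


k = k_anonymity(records, QUASI_IDENTIFIERS)
at_risk = unique_rows(records, QUASI_IDENTIFIERS)
print(f"k = {k}, uniquely identifiable rows = {len(at_risk)}")
```

A record that is unique on its quasi-identifiers can be matched against an outside dataset holding the same fields plus a name, which is the linkage attack the paragraph above describes.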
Copyright © 2020 Changrong Ji
CLOAK PLATFORM (work in progress)
ANALYST
DATA
THREATS
TRUST
COMPUTE
PRIVACY PRESERVING TECHNIQUES 2020
ARCHITECT
DESIGN TRADE-OFFS
EXAMPLES
BUILDER