A Machine Learning Framework for Predicting Frequent Emergency Department Users Using Claims Data
Summer (Xia) Hu
Margret Bjarnadottir
Sean Barnes
Bruce Golden
University of Maryland, College Park
POMS Conference
May 06, 2016, Orlando, Florida
Background: Frequent Emergency Department (ED) Usage
Patients
Providers
Negative Effects – ED Overcrowding
Insurance Company
(compared with regular primary care visits)
High ED usage puts stress on the ED system as well as on the payer!
Background: Frequent ED Users & ED Jumpers
Frequent ED Users (members with ≥ 4 ED visits in a single year)
Non-frequent ED Users (members with < 4 ED visits in a single year)
ED Jumpers (non-frequent ED users in Year 1 who become frequent ED users in Year 2)
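The definitions above can be written as two small predicates. This is a minimal sketch, assuming yearly ED visit counts are already available per member; the jumper rule (non-frequent in Year 1, frequent in Year 2) is read from the transition table in the descriptive analysis, and the function names are hypothetical.

```python
FREQUENT_THRESHOLD = 4  # frequent ED user: >= 4 ED visits in a single year


def is_frequent(ed_visits: int, threshold: int = FREQUENT_THRESHOLD) -> bool:
    """True if a member's yearly ED visit count meets the frequent-user threshold."""
    return ed_visits >= threshold


def is_jumper(visits_year1: int, visits_year2: int,
              threshold: int = FREQUENT_THRESHOLD) -> bool:
    """Assumed jumper rule: non-frequent in Year 1, frequent in Year 2."""
    return not is_frequent(visits_year1, threshold) and is_frequent(visits_year2, threshold)


if __name__ == "__main__":
    print(is_frequent(4), is_frequent(3))    # True False
    print(is_jumper(2, 5), is_jumper(4, 6))  # True False
```

Keeping the threshold as a parameter makes it easy to revisit the "function of the thresholds" question raised under Future Work.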
Objective
Predict which members will be frequent ED users in the following year, based on their claims records from the previous year.
Method Overview
User segmentation via Clustering
Data & Preprocessing
– Transform claim-based records to patient-based yearly profiles
Claim-based Raw Data (Med, Pharmacy, Lab, MH, Dental) → Claim Aggregation → Feature Extraction → Patient-based Yearly Profiles
Eligible Enrollment (≥ 350 days of enrollment for 2 consecutive years)
Information Uniqueness (does each member have a unique gender and birth year?)
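A minimal sketch of these preprocessing steps, assuming a hypothetical claim layout (one dict per claim with `member_id`, `year`, `source`, `gender`, `birth_year`) and an enrollment table keyed by `(member_id, year)`. It applies the ≥ 350-day enrollment rule for two consecutive years, keeps members whose claims report one gender and one birth year, and counts claims per member-year and source dataset. The slides do not specify the actual schema or the order of these filters; the example years are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

MIN_ENROLLED_DAYS = 350  # ">= 350 days of enrollment" in each of 2 consecutive years


def eligible_members(enrollment: Dict[Tuple[str, int], int], year1: int) -> Set[str]:
    """Members enrolled >= MIN_ENROLLED_DAYS in both year1 and year1 + 1.

    `enrollment` maps (member_id, year) -> days enrolled (hypothetical layout).
    """
    members = {member for member, _ in enrollment}
    return {
        m for m in members
        if enrollment.get((m, year1), 0) >= MIN_ENROLLED_DAYS
        and enrollment.get((m, year1 + 1), 0) >= MIN_ENROLLED_DAYS
    }


def consistent_members(claims: List[dict]) -> Set[str]:
    """Members whose claims all report the same gender and birth year."""
    seen: Dict[str, set] = defaultdict(set)
    for c in claims:
        seen[c["member_id"]].add((c["gender"], c["birth_year"]))
    return {m for m, values in seen.items() if len(values) == 1}


def yearly_profiles(claims: List[dict], keep: Set[str]) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Aggregate claim rows into claim counts per (member, year) and source dataset."""
    profiles: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for c in claims:
        if c["member_id"] in keep:
            profiles[(c["member_id"], c["year"])][c["source"]] += 1
    return {key: dict(counts) for key, counts in profiles.items()}


if __name__ == "__main__":
    claims = [
        {"member_id": "A", "year": 2013, "source": "Med", "gender": "F", "birth_year": 1980},
        {"member_id": "A", "year": 2013, "source": "Pharmacy", "gender": "F", "birth_year": 1980},
        {"member_id": "B", "year": 2013, "source": "Med", "gender": "M", "birth_year": 1975},
        # B reports a different gender here, so B fails the uniqueness check.
        {"member_id": "B", "year": 2014, "source": "Med", "gender": "F", "birth_year": 1975},
    ]
    enrollment = {("A", 2013): 365, ("A", 2014): 360, ("B", 2013): 365, ("B", 2014): 365}
    keep = eligible_members(enrollment, 2013) & consistent_members(claims)
    print(yearly_profiles(claims, keep))  # {('A', 2013): {'Med': 1, 'Pharmacy': 1}}
```

In a real pipeline the per-source counts would be replaced by the richer features listed in the tables that follow.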
Dataset | Feature (from the observation year) |
Profile | Member masked ID |
| Sex, age, birth year |
| Profile year |
| Years of consecutive enrollment |
Dental | Number of dental visits |
| Total number of unique dental providers (Top 20 CCS) |
| Total number of unique dental visits |
Mental Health | Total number of unique MH providers |
| Number of visits per MH disease |
| Number of MH visits divided by number of unique MH providers |
| Indicator of any mental health visits |
Pharmacy | Number of different pharmacies |
| Number of unique medications |
| Total days of medication supply |
| Total days of opioid medication supply |
ED | Number of ED visits |
| ED intensity group |
| Number of different ED complaints (based on CCS category) |
| Number of different ED vendors |
| Number of ED visits divided by number of ED vendors |
| Indicator of any mental health ED visits |
| Number of mental health ED visits |
| Number of ED visits per general diagnosis group (19 variables) |
| Number of ED visits per CCS diagnosis group (287 variables) |
| NYU ED usage probability (9 variables) |
Medical | Number of different chronic diseases (based on CCS category) |
| Number of unique chronic visits |
| Number of visits per chronic disease (100 chronic diseases) |
| Number of outpatient visits |
| Number of inpatient visits |
| Number of primary care visits |
Final Data
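Several ED features in the table are simple aggregates over a member's observation-year ED visits. The sketch below computes six of them; the visit keys (`vendor`, `ccs`, `mental_health`) and the CCS labels in the example are hypothetical, and treating the visits-per-vendor ratio as 0 for members with no ED visits is an assumption, not something the slides state.

```python
from typing import Dict, List


def ed_features(ed_visits: List[dict]) -> Dict[str, float]:
    """A few observation-year ED features for one member.

    Each visit is a dict with hypothetical keys: 'vendor' (provider ID),
    'ccs' (CCS diagnosis category of the visit) and 'mental_health' (bool).
    """
    n_visits = len(ed_visits)
    n_vendors = len({v["vendor"] for v in ed_visits})
    n_mh = sum(1 for v in ed_visits if v["mental_health"])
    return {
        "n_ed_visits": n_visits,
        "n_ed_complaints": len({v["ccs"] for v in ed_visits}),  # distinct CCS categories
        "n_ed_vendors": n_vendors,
        # Assumption: the ratio is 0 for members with no ED visits.
        "visits_per_vendor": n_visits / n_vendors if n_vendors else 0.0,
        "any_mh_ed_visit": int(n_mh > 0),
        "n_mh_ed_visits": n_mh,
    }


if __name__ == "__main__":
    visits = [
        {"vendor": "H1", "ccs": "ccs_a", "mental_health": False},
        {"vendor": "H1", "ccs": "ccs_b", "mental_health": True},
        {"vendor": "H2", "ccs": "ccs_a", "mental_health": False},
    ]
    print(ed_features(visits))
```

The per-group count features (19 general and 287 CCS variables) follow the same pattern, with one counter per group instead of a distinct-count.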
Descriptive Analysis: Influence of Number of ED Visits from Observation Year
Observation Year (rows) / Result Year (columns) | Non-frequent ED User | Frequent ED User |
Non-frequent ED User | 67% | 12% (Jumper) |
Frequent ED User | 12% | 9% |
Total | 79% | 21% |
Among frequent ED users in the Observation Year, 43% (9% out of 21%) remain frequent ED users in the Result Year.
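The transition table and the 43% persistence figure can be reproduced from per-member visit counts. This sketch assumes one `(observation-year visits, result-year visits)` pair per eligible member and the ≥ 4-visit threshold; applied to the rounded cells in the table, persistence is 9% / (9% + 12%) ≈ 43%.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

THRESHOLD = 4  # frequent ED user: >= 4 ED visits in a year


def status(visits: int) -> str:
    return "frequent" if visits >= THRESHOLD else "non-frequent"


def transition_shares(pairs: Iterable[Tuple[int, int]]) -> Dict[Tuple[str, str], float]:
    """Share of members in each (observation-year status, result-year status) cell."""
    counts = Counter((status(y1), status(y2)) for y1, y2 in pairs)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def persistence_rate(shares: Dict[Tuple[str, str], float]) -> float:
    """P(frequent in result year | frequent in observation year).

    Raises ZeroDivisionError if no member is frequent in the observation year.
    """
    stay = shares.get(("frequent", "frequent"), 0.0)
    drop = shares.get(("frequent", "non-frequent"), 0.0)
    return stay / (stay + drop)


if __name__ == "__main__":
    # Rounded cells from the transition table: 9% stay frequent, 12% drop out.
    table_cells = {("frequent", "frequent"): 0.09, ("frequent", "non-frequent"): 0.12}
    print(round(persistence_rate(table_cells), 2))  # 0.43
```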
Predict Frequent ED Users: Supervised Machine Learning
– Logistic regression (with or without regularization), Naïve Bayes, Decision Tree (CART), Boosted Tree (C5.0), Random Forest.
Data
– X (predictors): 439 features from the 1st year for each eligible member
– Y (to predict): frequent or non-frequent ED user in the 2nd year
Data Partition: Training Set (70%), Validation Set (15%), Test Set (15%)
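A 70/15/15 partition like the one above can be sketched as a seeded random split of member IDs. The slides do not say whether the split was stratified by outcome; this version assumes a plain random split at the member level, and `partition` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def partition(member_ids: Sequence[str], seed: int = 0,
              train_frac: float = 0.70, val_frac: float = 0.15
              ) -> Tuple[List[str], List[str], List[str]]:
    """Randomly split member IDs into training, validation and test sets.

    The remainder after training and validation (15% by default) becomes the test set.
    """
    ids = list(member_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


if __name__ == "__main__":
    train, val, test = partition([f"m{i}" for i in range(1000)])
    print(len(train), len(val), len(test))  # 700 150 150
```

Splitting by member rather than by claim keeps all of a member's records in one set, which avoids leakage between training and validation.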
Predict Frequent ED Users
Best Machine Learning Model (C5.0) vs. Baseline Model
Models:
Performance Metric:
– Percentage of correct predictions among the top ED users selected by each model for the 2nd year, measured on the unseen validation set
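One reading of this metric is precision among the top N members ranked by each model: the share of the N highest-scored validation members who are actually frequent ED users in the 2nd year. The sketch below assumes that reading, with scores held in a dict keyed by member ID; under it, the improvement reported in the appendix would be the difference between the best model's and the baseline's values at the same N.

```python
from typing import Dict, Set


def top_n_hit_rate(scores: Dict[str, float], actual_frequent: Set[str], n: int) -> float:
    """Share of the n highest-scored members who are frequent ED users in the 2nd year."""
    top = sorted(scores, key=scores.get, reverse=True)[:n]  # ties keep input order
    return sum(m in actual_frequent for m in top) / len(top)


def improvement(model_scores: Dict[str, float], baseline_scores: Dict[str, float],
                actual_frequent: Set[str], n: int) -> float:
    """Difference in top-n hit rate, model minus baseline (as a fraction)."""
    return (top_n_hit_rate(model_scores, actual_frequent, n)
            - top_n_hit_rate(baseline_scores, actual_frequent, n))


if __name__ == "__main__":
    actual = {"a", "d"}
    model = {"a": 0.9, "b": 0.2, "c": 0.1, "d": 0.7}  # e.g. predicted probabilities
    baseline = {"a": 3, "b": 5, "c": 0, "d": 1}       # e.g. 1st-year ED visit counts
    print(improvement(model, baseline, actual, 2))  # 0.5
```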
Top Influential Features from Machine Learning in Predicting Frequent ED Users
Predict ED Jumpers
Definition | Non Jumper | Jumper |
Observation Year (1st Year) | < 4 ED visits | < 4 ED visits |
Result Year (2nd Year) | < 4 ED visits | ≥ 4 ED visits |
Percentage | 84% | 16% |
Machine Learning Models:
LDA, MDA, QDA, FDA, RDA (linear, mixture, quadratic, flexible, and regularized discriminant analysis)
Baseline Model: use
1) the number of ED claims
2) the number of outpatient claims
from the 1st year as the estimate of next year's number of ED visits.
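If jumpers are non-frequent users in the 1st year, as the transition table suggests, every candidate has fewer than 4 ED visits in that year, so treating the 1st-year ED count as a point estimate and thresholding it at 4 would flag nobody. A minimal sketch therefore assumes the baseline ranks members by one 1st-year count (ED claims or outpatient claims) and flags the top k; the function name and the top-k rule are assumptions, not the authors' stated procedure.

```python
from typing import Dict, List


def baseline_flags(year1_counts: Dict[str, int], k: int) -> List[str]:
    """Flag the k members with the largest 1st-year count as predicted jumpers.

    `year1_counts` maps member ID -> number of ED claims (baseline 1) or
    outpatient claims (baseline 2) in the 1st year. Ties keep input order.
    """
    return sorted(year1_counts, key=lambda m: year1_counts[m], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical year-1 counts for members who are all non-frequent (< 4 ED visits).
    ed_claims = {"a": 1, "b": 3, "c": 3, "d": 0}
    outpatient_claims = {"a": 12, "b": 2, "c": 7, "d": 9}
    print(baseline_flags(ed_claims, 2))          # ['b', 'c']
    print(baseline_flags(outpatient_claims, 2))  # ['a', 'd']
```

Python's `sorted` stays stable with `reverse=True`, so tied members keep their input order.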
ED Jumpers: User Segmentation via Clustering
Summary of ED Jumper Clusters
Cluster | 1 | 2 | 3 | 4 | 5 | 6 |
Population | 1518 | 1901 | 600 | 9956 | 283 | 20 |
Main Character | Healthy males who care about their teeth | Elderly sick people | People with mental health problems | Relatively healthy young people | ED abusers / high medical users | Older alcohol users |
Jumping Level | Medium | Medium | Medium | Medium | Low | High |
Summary | High percentage of males (50%); most frequent dental visits; fewest chronic diseases, fewest days of medication supply, and fewest outpatient claims. | Oldest; highest number of chronic diseases; longest days of medication supply. | Lowest percentage of preventable ED visits; largest number of MH claims. | Youngest; small medication supply; no mental health issues. | Highest percentage of preventable ED visits; largest number of chronic claims. | Fewest dental claims; largest percentage of alcohol-related ED visits; relatively old. |
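The slides do not name the clustering algorithm. As an illustration only, the sketch below standardizes feature columns and runs Lloyd's k-means (k = 6 would match the six clusters above); it is a stand-in technique, not necessarily the one used to produce these clusters.

```python
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each feature column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by 0
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows: List[List[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns (cluster label per row, centroids)."""
    centroids = [list(r) for r in random.Random(seed).sample(rows, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels, centroids
```

On a member-by-feature matrix `X` for the jumpers, this would be called as `kmeans(standardize(X), k=6)`; standardizing first matters because the features mix counts, days of supply, and indicators on very different scales.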
Feature Summary of ED Jumpers: User Segmentation via Clustering
Cluster 1
Cluster 6
Future Work
– How do the coefficients and performance change as a function of the thresholds?
– Use multiple years of claims history to predict future ED usage levels
– How accurately can we predict ED usage with only partial-year enrollment?
– How consistent are frequent ED users from one year to the next?
Summer (Xia) Hu
University of Maryland, College Park
xhu64@umd.edu
Questions and Comments?
Appendix
– Classify the reason for each ED visit by three grouping algorithms:
i) General diagnosis category
ii) NYU ED Visit Severity Algorithm: visit types (preventable ED visits, non-preventable ED visits, others) based on a historical probability distribution
iii) Clinical Classifications Software (CCS)
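The 9 "NYU ED usage probability" features can be sketched as per-member sums of the algorithm's per-visit probabilities. This assumes the nine columns are the algorithm's four severity categories plus its injury, mental health, alcohol, drug, and unclassified categories, and that probabilities are summed rather than averaged; the category names and the diagnosis-to-probability lookup are hypothetical inputs, not the published NYU table.

```python
from typing import Dict, List

# Assumed names for the nine NYU-algorithm outputs (four severity classes + five special groups).
NYU_CATEGORIES = [
    "non_emergent", "emergent_pc_treatable", "emergent_ed_preventable",
    "emergent_ed_not_preventable", "injury", "mental_health",
    "alcohol", "drug", "unclassified",
]


def nyu_features(visit_dx: List[str],
                 lookup: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Sum NYU category probabilities over one member's ED visits.

    `lookup` maps a visit's primary diagnosis code to its historical category
    probabilities (hypothetical input); codes missing from it count as unclassified.
    """
    totals = dict.fromkeys(NYU_CATEGORIES, 0.0)
    for dx in visit_dx:
        for category, p in lookup.get(dx, {"unclassified": 1.0}).items():
            totals[category] += p
    return totals


if __name__ == "__main__":
    lookup = {"DX_A": {"non_emergent": 0.6, "emergent_pc_treatable": 0.4}}  # made-up values
    print(nyu_features(["DX_A", "DX_A", "DX_UNKNOWN"], lookup))
```

Summing keeps the features on a "expected number of visits per category" scale, so they stay comparable with the plain ED visit counts.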
Example of feature extraction
Appendix
Influence of Age on Frequent ED Users
Conclusions:
Age Group | Description |
0 | Most likely to become a frequent ED user (> 40%) |
0-10 | Chance of becoming one decreases with age |
8-13 | Least likely to become one ( 10%) |
13-23 | Chance of becoming one increases with age |
23-65 | Chance of becoming one decreases with age |
65+ | [Insufficient sample size] |
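Age curves like the one summarized above can be computed as the share of frequent ED users at each age, dropping ages with too few members. This is a sketch; the `min_count` cutoff of 30 is a hypothetical choice, since the slides only say the 65+ group had insufficient sample size.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def rate_by_age(members: Iterable[Tuple[int, bool]], min_count: int = 30) -> Dict[int, float]:
    """Share of members at each age who are frequent ED users.

    `members` yields (age, is_frequent) pairs; ages with fewer than `min_count`
    members (a hypothetical cutoff) are dropped as having insufficient sample size.
    """
    n: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for age, frequent in members:
        n[age] += 1
        hits[age] += int(frequent)
    return {age: hits[age] / n[age] for age in sorted(n) if n[age] >= min_count}


if __name__ == "__main__":
    data = [(0, True)] * 12 + [(0, False)] * 18 + [(70, True)] * 3
    print(rate_by_age(data))  # {0: 0.4}
```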
Appendix
Influence of Age on Super ED Users
Conclusions:
Age Group | Description |
0-10 | Chance of becoming one decreases with age |
4-16 | Least likely to become one (~0.1%) |
16-23 | Chance of becoming one increases with age |
23-65 | Chance of becoming one decreases with age |
65+ | [Insufficient sample size] |
Appendix
Machine Learning Result – Frequent ED User
Top N Users Picked by Models | Top Percentage Picked by Models | Improvement in Detection Accuracy (Best Model vs. Baseline) |
250 | 1% | 7.5% |
500 | 2% | 6.8% |
1000 | 4% | 4.4% |
1500 | 6% | 3.2% |
2000 | 8% | 1.7% |
3000 | 12% | 3.8% |
4000 | 16% | 4.1% |
5000 | 20% | 4.4% |
Total Validation Size: 24,660 | 100% | NA |
Appendix
Key Characteristics
Gender: Female
Age: older
Definition | Non Jumper | Jumper | Super Jumper |
Observation Year | | | |
Result Year | | | |
Percentage | 84% | 16% | 0.16% |
| | | |
Mean | 20 | 23 | 31 |
Std | 16 | 16 | 13 |
| | | |
Mean | 0.13 | 0.27 | 0.78 |
Std | 0.77 | 1.16 | 2.56 |
Appendix
ED Jumpers
Appendix
ED Jumpers (continued)