1 of 24

A Machine Learning Framework for Predicting Frequent Emergency Department Users Using Claims Data

Summer (Xia) Hu

Margret Bjarnadottir

Sean Barnes

Bruce Golden

University of Maryland, College Park

POMS Conference

May 06, 2016, Orlando, Florida

2 of 24

Background: Frequent Emergency Department (ED) Usage

Patients

Providers

  • Higher risks
  • Prolonged wait times
  • Higher abandonment rates
  • Higher rates of dissatisfaction
  • Higher rates of medical errors
  • Lower productivity and morale
  • Reduced ability to respond to mass casualty incidents

Negative Effects – ED Overcrowding

Insurance Company

  • Much higher payment (compared with regular primary care (PC) visits)

High ED usage puts stress on the ED system as well as the payer!

3 of 24

Background: Frequent ED Users & ED Jumpers

  • Types of ED Users
    • Frequent ED users: members with ≥ 4 ED visits in a single year
    • Non-frequent ED users: members with < 4 ED visits in a single year

  • Facts from Claims Data
    • Frequent ED users constitute 21% of the member population
    • Yet they account for 78% of ED visits

ED Jumpers

Year 1 Year 2
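The user types on this slide can be sketched as a small labeling routine. This is a minimal illustration, not the authors' code: it assumes per-member yearly ED visit counts are already available, and it assumes a Jumper is a member who is non-frequent in Year 1 and frequent in Year 2, which is how the Year 1 / Year 2 diagram reads. The function names and the example counts are hypothetical; only the threshold of 4 visits comes from the slide.

```python
FREQUENT_THRESHOLD = 4  # from the slide: >= 4 ED visits in a single year


def is_frequent(ed_visits: int, threshold: int = FREQUENT_THRESHOLD) -> bool:
    """Return True when a member counts as a frequent ED user for one year."""
    return ed_visits >= threshold


def is_jumper(year1_visits: int, year2_visits: int) -> bool:
    """Assumed definition: non-frequent in Year 1 and frequent in Year 2."""
    return (not is_frequent(year1_visits)) and is_frequent(year2_visits)


# Hypothetical (Year 1, Year 2) ED visit counts for three members.
members = {"A": (0, 5), "B": (6, 7), "C": (1, 2)}
jumpers = [m for m, (y1, y2) in members.items() if is_jumper(y1, y2)]
```

Here only member A qualifies: B was already frequent in Year 1, and C stays below the threshold in both years.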

4 of 24

Objective

  • Predict potential frequent ED users and ED Jumpers based on their claims records from the previous year.

  • Identify characteristics of frequent users and Jumpers.

5 of 24

Method Overview

  • User Segmentation

User segmentation via Clustering

  • Prediction – Frequent ED Users and ED Jumpers
  • Descriptive Analysis

6 of 24

Data & Preprocessing

  • Raw Data Summary
    • Five datasets: Med, Pharmacy, Lab, MH, Dental
    • Four years (Jan. 2009 – Dec. 2012)

  • Data Processing:

– Transform claim-based records to patient-based yearly profiles

    • Claim-based Raw Data
    • Eligible Enrollment (≥ 350 days of enrollment for 2 consecutive years)
    • Information Uniqueness (unique gender? birth year?)
    • Claim Aggregation
    • Feature Extraction
    • Patient-based Yearly Profiles
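The eligibility rule and the claim-to-profile transformation can be sketched as follows. This is a simplified illustration under stated assumptions: the input layout (enrolled days keyed by member and year, claims as `(member, year, dataset)` tuples) and all names are hypothetical, and the aggregation only counts claims per dataset rather than building the full feature set.

```python
from collections import defaultdict

MIN_ENROLLED_DAYS = 350  # from the slide: >= 350 days of enrollment per year


def eligible_pairs(enrolled_days):
    """Yield (member, year) where the member meets the enrollment rule in
    both `year` and `year + 1`, so `year` can serve as an observation year."""
    for (member, year), days in enrolled_days.items():
        next_days = enrolled_days.get((member, year + 1), 0)
        if days >= MIN_ENROLLED_DAYS and next_days >= MIN_ENROLLED_DAYS:
            yield member, year


def yearly_profiles(claims):
    """Aggregate claim-level rows into claim counts per (member, year) and dataset."""
    profiles = defaultdict(lambda: defaultdict(int))
    for member, year, dataset in claims:
        profiles[(member, year)][dataset] += 1
    return profiles


# Hypothetical inputs: enrolled days per (member, year) and raw claim rows.
enrolled_days = {
    ("m1", 2009): 365, ("m1", 2010): 352,
    ("m2", 2009): 200, ("m2", 2010): 365,
}
claims = [
    ("m1", 2009, "Med"), ("m1", 2009, "Med"),
    ("m1", 2009, "Pharmacy"), ("m2", 2009, "Lab"),
]

pairs = sorted(eligible_pairs(enrolled_days))
profiles = yearly_profiles(claims)
```

In this toy input only member `m1` with observation year 2009 is eligible, because `m2` has too few enrolled days in 2009 and neither member has 2011 enrollment.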

7 of 24

Final Data

  • Results in 164,402 member files, 439 features per year
  • Feature Overview (features from the observation year, by dataset)

Profile

    • Member masked ID
    • Sex, Age, Birth Year
    • Profile year
    • Years of consecutive enrollment

Dental

    • Number of dental visits
    • Total number of unique dental providers (Top 20 CCS)
    • Total number of unique dental visits

Mental Health

    • Total number of unique MH providers
    • Number of visits per MH disease
    • Number of MH visits divided by number of unique MH providers
    • Indicator of any mental health visits

Pharmacy

    • Number of different pharmacies
    • Number of unique medications
    • Total days of medication supply
    • Total days of opioid medication supply

ED

    • Number of ED visits
    • ED intensity group
    • Number of different ED complaints (based on the CCS category)
    • Number of different ED vendors
    • Number of ED visits divided by number of ED vendors
    • Indicator of any mental health ED visits
    • Number of mental health ED visits
    • Number of ED visits per general diagnosis group (19 variables)
    • Number of ED visits per CCS diagnosis group (287 variables)
    • NYU ED usage probability (9 variables)

Medical

    • Number of different chronic diseases (based on the CCS category)
    • Number of unique chronic visits
    • Number of visits per chronic disease (100 chronic diseases)
    • Number of outpatient visits
    • Number of inpatient visits
    • Number of primary care visits

  • Number of Result Year’s ED Visits
    • Most members have very few ED visits.
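A few of the ED features listed on this slide can be derived from visit-level records along these lines. The sketch assumes each visit is a `(vendor, ccs_category, is_mental_health)` tuple; that layout, the feature key names, and the example visits are hypothetical and are not taken from the deck.

```python
def ed_features(visits):
    """Compute a few ED features from (vendor, ccs_category, is_mental_health) tuples."""
    n_visits = len(visits)
    vendors = {vendor for vendor, _, _ in visits}
    complaints = {ccs for _, ccs, _ in visits}
    mh_visits = sum(1 for _, _, is_mh in visits if is_mh)
    return {
        "n_ed_visits": n_visits,
        "n_ed_vendors": len(vendors),
        "n_ed_complaints": len(complaints),
        # Guard against division by zero for members with no ED visits.
        "visits_per_vendor": n_visits / len(vendors) if vendors else 0.0,
        "any_mh_ed_visit": int(mh_visits > 0),
        "n_mh_ed_visits": mh_visits,
    }


# Hypothetical observation-year ED visits for one member.
visits = [
    ("hospital_a", "chest pain", False),
    ("hospital_a", "mood disorders", True),
    ("hospital_b", "chest pain", False),
]
features = ed_features(visits)
```

Returning 0.0 for the visits-per-vendor ratio when a member has no ED visits is a design choice made for this sketch; the deck does not say how that case was handled.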

8 of 24

Descriptive Analysis: Influence of Number of ED Visits from Observation Year

 

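Share of members by ED user type in the Observation Year (rows) and the Result Year (columns):

| Observation Year | Result Year: non-frequent | Result Year: frequent | Total |
|---|---|---|---|
| Non-frequent | 67% | 12% (Jumper) | 79% |
| Frequent | 12% | 9% | 21% |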

  • Linear trend between the number of ED visits in two consecutive years
  • As expected, the number of ED visits in the observation year best explains the result year’s ED visits

Among frequent ED users in the observation year, 43% remain frequent ED users in the result year
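The 43% figure follows from the percentages on this slide if they are read as a two-by-two table of observation-year type against result-year type. The arithmetic below is a sketch under that reading; the dictionary layout and names are illustrative only.

```python
# Shares of all members, read as (observation-year type, result-year type).
share = {
    ("non_frequent", "non_frequent"): 0.67,
    ("non_frequent", "frequent"): 0.12,  # Jumpers
    ("frequent", "non_frequent"): 0.12,
    ("frequent", "frequent"): 0.09,
}

# Frequent users in the observation year: 12% + 9% = 21% of members.
frequent_in_obs = share[("frequent", "non_frequent")] + share[("frequent", "frequent")]
stay_frequent = share[("frequent", "frequent")] / frequent_in_obs

# Non-frequent users in the observation year: 67% + 12% = 79% of members.
non_frequent_in_obs = (
    share[("non_frequent", "non_frequent")] + share[("non_frequent", "frequent")]
)
jump_rate = share[("non_frequent", "frequent")] / non_frequent_in_obs

print(f"{stay_frequent:.0%} of frequent users stay frequent")
print(f"{jump_rate:.0%} of non-frequent users become Jumpers")
```

With these rounded inputs the Jumper rate among non-frequent users comes out near 15%, slightly below the 16% shown on the ED Jumper slide; the gap is presumably due to rounding in the displayed percentages.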

9 of 24

Predict Frequent ED Users: Supervised Machine Learning

  • Data Setup
    • X: 439 features from the 1st year for each eligible member
    • Y: frequent / non-frequent ED user in the 2nd year?
    • Data partition: training set (70%), validation set (15%), test set (15%)
  • Imbalanced binary classification problem: frequent ED users (21%) vs. non-frequent ED users (79%)
  • Supervised Machine Learning
    • Six machine learning algorithms: logistic regression (with and without regularization), Naïve Bayes, decision tree (CART), boosted tree (C5.0), random forest
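The 70% / 15% / 15% partition can be sketched with a seeded shuffle. The deck does not say whether the split was random, stratified by class, or done some other way, so the plain random split, the seed, and the function name below are assumptions for illustration.

```python
import random


def partition(member_ids, train=0.70, validation=0.15, seed=0):
    """Shuffle member IDs and split them into train/validation/test lists."""
    ids = list(member_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * validation)
    # Whatever is left after training and validation becomes the test set.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# 164,402 eligible members, as reported on the Final Data slide.
train_ids, val_ids, test_ids = partition(range(164_402))
```

With 164,402 members, truncating 15% gives a validation set of 24,660, which agrees with the validation size reported in the appendix table.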

10 of 24

Predict Frequent ED Users

Best Machine Learning Model (C5.0) vs. Baseline Model

Models:

    • Baseline: use the number of ED visits in the observation year as the estimate of next year’s number of ED visits
    • Six machine learning models

Performance Metric:

    • Detection accuracy rate: percentage of correct predictions among the 2nd-year top ED users selected by each model on the unseen validation set

  • The best machine learning model improved detection accuracy of frequent ED users by 7.5% (for the top 250 users picked, about 1% of the validation set)
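One way to read the detection accuracy metric is as the share of true frequent ED users among the top N members a model selects. The sketch below implements that reading; it is an interpretation rather than the authors' exact definition, and the function name, scores, and labels are made up for illustration.

```python
def detection_accuracy(scores, is_frequent_next_year, top_n):
    """Share of the top_n highest-scored members who are truly frequent ED users."""
    if not 0 < top_n <= len(scores):
        raise ValueError("top_n must be between 1 and the number of members")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    picked = ranked[:top_n]
    hits = sum(is_frequent_next_year[i] for i in picked)
    return hits / top_n


# Hypothetical validation members.
obs_year_visits = [5, 0, 3, 1, 6, 2]            # baseline score: last year's ED visits
model_scores = [0.9, 0.1, 0.8, 0.2, 0.4, 0.3]   # made-up model probabilities
truth = [1, 0, 1, 0, 0, 0]                      # frequent ED user in the result year?

baseline_acc = detection_accuracy(obs_year_visits, truth, top_n=2)
model_acc = detection_accuracy(model_scores, truth, top_n=2)
```

The baseline ranks members by their observation-year ED visits, so it fits the same function as any model that outputs a score. In this toy data the baseline picks one true frequent user out of two and the model picks two out of two.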

11 of 24

Top Influential Features from Machine Learning in Predicting Frequent ED Users

12 of 24

Predict ED Jumpers

  • ED Jumpers: defined by comparing the Observation Year (1st Year) with the Result Year (2nd Year)

| Group | Percentage |
|---|---|
| Non-Jumper | 84% |
| Jumper | 16% |

  • Prediction
    • Machine learning models: LDA, MDA, QDA, FDA, RDA (discriminant analysis variants)
    • Baseline model: use 1) the number of ED claims and 2) the number of outpatient claims from the 1st year as the estimate of next year’s number of ED visits
    • The best machine learning model improved detection accuracy of frequent ED users by 53%

13 of 24

ED Jumpers: User Segmentation via Clustering

  • Clustering among ED Jumpers
    • Mixture model clustering (based on BIC) on Training set

Summary of ED Jumper Clusters

| Cluster | Population | Main Characteristic | Jumping Level | Summary |
|---|---|---|---|---|
| 1 | 1,518 | Healthy males who care about their teeth | Medium | High percentage of males (50%); visit the dentist most frequently; fewest chronic diseases, days of medication supply, and outpatient claims |
| 2 | 1,901 | Elderly sick people | Medium | Oldest; highest number of chronic diseases; longest days of medication supply |
| 3 | 600 | People with mental health problems | Medium | Lowest percentage of preventable ED visits; largest number of MH claims |
| 4 | 9,956 | Relatively healthy young people | Medium | Youngest; small number of medication supplies; no mental health issues |
| 5 | 283 | ED abusers / high medical users | Low | Highest percentage of preventable ED visits; largest number of chronic claims |
| 6 | 20 | Older alcohol users | High | Fewest dental claims; largest percentage of alcohol-related ED visits; relatively old |
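For reference, BIC-based selection of the number of mixture components is commonly written as below. The symbols are standard notation rather than anything defined in the slides: $\hat{L}_K$ is the maximized likelihood of a $K$-component mixture, $p_K$ is its number of free parameters, and $n$ is the number of ED Jumpers in the training set. The slides do not state which sign convention or software was used, so this is only a sketch of the criterion.

```latex
\mathrm{BIC}(K) = p_K \ln n - 2 \ln \hat{L}_K,
\qquad
\hat{K} = \arg\min_{K} \, \mathrm{BIC}(K)
```

Under this convention the smallest BIC is preferred; some software reports the negated quantity, in which case the largest value is chosen. The summary on this slide reports six clusters.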

14 of 24

Feature Summary of ED Jumpers: User Segmentation via Clustering

Cluster 1

Cluster 6

15 of 24

Future Work

  • Two-year Analysis (Current Focus):
    • Analyze thresholds that define frequent ED users
      – How do the coefficients and performance change as a function of the thresholds?
    • Explore the effect of mental health utilization on ED utilization and other outcome variables

  • Multi-year Analysis (Next Step):
    • Create multi-year models
      – Use multiple years of claims history to predict future ED usage levels
    • Analyze the influence of the allowable enrollment gap on the prediction performance
      – How accurately can we predict ED performance with only partial-year enrollment?
    • Explore the “stickiness” of frequent ED use
      – How consistent are frequent ED users from one year to the next?

16 of 24

Summer (Xia) Hu

University of Maryland, College Park

xhu64@umd.edu

Questions and Comments?

17 of 24

Appendix

Example of feature extraction

– Classify reasons for each ED visit by three grouping algorithms

Three grouping algorithms (applied to each ED visit):

  • i) General diagnosis category
  • ii) NYU ED Visit Severity Algorithm (based on historical probability distribution)
  • iii) Clinical Classifications Software (CCS)

Visit types (grouped as Preventable ED Visits, Non-preventable ED Visits, and Others):

  • Non-Emergent
  • Primary care treatable
  • ED needed yet preventable
  • MH-related
  • Alcohol-related
  • Substance abuse-related
  • Injury
  • Unclassified

18 of 24

Appendix

Influence of Age on Frequent ED Users

Conclusions

| Age Group | Description |
|---|---|
| 0 | Most likely to become a frequent ED user (> 40%) |
| 0-10 | Chance of becoming a frequent ED user decreases with age |
| 8-13 | Least likely to become a frequent ED user (10%) |
| 13-23 | Chance of becoming a frequent ED user increases with age |
| 23-65 | Chance of becoming a frequent ED user decreases with age |
| 65+ | [Insufficient sample size] |

19 of 24

Appendix

Influence of Age on Super ED Users

Conclusions:

| Age Group | Description |
|---|---|
| 0-10 | Chance of becoming a super ED user decreases with age |
| 4-16 | Least likely to become a super ED user (~0.1%) |
| 16-23 | Chance of becoming a super ED user increases with age |
| 23-65 | Chance of becoming a super ED user decreases with age |
| 65+ | [Insufficient sample size] |

20 of 24

Appendix

Machine Learning Result – Frequent ED User

| Top N Users Picked by Models | Top Percentage Picked by Models | Improvement in Detection Accuracy (Best Model vs. Baseline) |
|---|---|---|
| 250 | 1% | 7.5% |
| 500 | 2% | 6.8% |
| 1000 | 4% | 4.4% |
| 1500 | 6% | 3.2% |
| 2000 | 8% | 1.7% |
| 3000 | 12% | 3.8% |
| 4000 | 16% | 4.1% |
| 5000 | 20% | 4.4% |
| Total validation size: 24,660 | 100% | NA |
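The "Top Percentage" column appears to be the top N divided by the validation size, rounded to a whole percent. The short check below reproduces that column from the reported numbers; the variable names are illustrative.

```python
VALIDATION_SIZE = 24_660  # total validation size reported in the table
top_n_values = [250, 500, 1000, 1500, 2000, 3000, 4000, 5000]

# Share of the validation set covered by each top-N selection.
top_share = {n: n / VALIDATION_SIZE for n in top_n_values}

# Rounded to whole percentages for comparison with the slide.
rounded_percent = {n: round(100 * s) for n, s in top_share.items()}
```

The rounded values come out as 1, 2, 4, 6, 8, 12, 16, and 20 percent, matching the table.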

21 of 24

Appendix

Key Characteristics

      • Have 48% more outpatient services (> 7.5)
      • Have on average 18% more ED visits (> 1.15)

      • ED visit complaints relate to:
        • Alcohol, drugs, mental health, headaches, the respiratory system, chest pain, disorders of the teeth and jaw, skin infections
        • Joint disorders, sprains and strains, abdominal pain, injuries, spondylosis, and other back problems

      • Generally sicker, with chronic diseases such as:
        • Thyroid disorders, acute cerebrovascular disease, spondylosis, and other back problems

      • Pay more visits to MH doctors, for mood disorders and schizophrenia
      • Fill larger prescriptions for opioid-related medicines
      • Visit the ED frequently for dental complaints but have low dental service utilization
      • On average, have more than 9 lab tests completed
      • On average, have more than 2 unique medications

Gender: Female

Age: older

  • Machine learning algorithms did not perform very well in detecting jumpers

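| | Non-Jumper | Jumper | Super Jumper |
|---|---|---|---|
| Definition: Observation Year | | | |
| Definition: Result Year | | | |
| Percentage | 84% | 16% | 0.16% |
| Mean | 20 | 23 | 31 |
| Std | 16 | 16 | 13 |
| Mean | 0.13 | 0.27 | 0.78 |
| Std | 0.77 | 1.16 | 2.56 |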

22 of 24

Appendix

ED Jumpers

23 of 24

Appendix

ED Jumpers (continued)

24 of 24

Feature Summary of ED Jumpers: User Segmentation via Clustering