1 of 14

Predictions using the MIMIC-III Dataset

Pallab Paul

2 of 14

What is the MIMIC-III Dataset?

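MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely available database of deidentified health data from patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012

Contains data on more than 40,000 critical care patients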

Includes demographics, vital signs, laboratory tests, medications, and more

3 of 14

Severity Scores

ICU scoring systems were introduced almost 30 years ago with the goal of using physiologic data available at ICU admission to predict individual patient outcomes.
They provide a mechanism to assess ICU performance by comparing actual outcomes in a given population to the outcomes observed in the reference population used to develop the prediction algorithms.
Some of the most popular severity scores include SAPS, SAPS II, SOFA, and OASIS.
The following slides show charts of the severity scores calculated for the patients in our dataset.
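One common way to compare actual outcomes with a reference model's predictions is the standardized mortality ratio (SMR): observed deaths divided by the deaths expected from a score's predicted probabilities. The sketch below is a minimal illustration; the function name and the 0/1 outcome coding are assumptions, not part of the original analysis.

```python
from typing import Sequence


def standardized_mortality_ratio(died: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Observed deaths divided by expected deaths.

    died: 1 if the patient died, else 0 (hypothetical outcome coding).
    predicted_risk: each patient's predicted probability of death from a severity score.
    """
    if len(died) != len(predicted_risk) or not died:
        raise ValueError("need one prediction per patient and at least one patient")
    observed = sum(died)
    expected = sum(predicted_risk)  # expected deaths = sum of predicted probabilities
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# An SMR above 1 means more deaths than the reference model predicts; below 1, fewer.
print(standardized_mortality_ratio([1, 0, 0, 1], [0.5, 0.25, 0.25, 0.5]))
```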

4 of 14

SOFA (Sepsis-related Organ Failure Assessment)

Each of six organ systems (respiratory, coagulation, liver, cardiovascular, central nervous system, and renal) is scored from 0 to 4, so the total SOFA score ranges from 0 to 24. A score of 0 indicates the least severe condition and 24 the most severe, associated on average with a greater than 90% chance of mortality.
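The total is simply the sum of the six per-system subscores. The sketch below assumes the subscores have already been derived from the raw measurements (the clinical thresholds are not shown), and the dictionary key names are hypothetical.

```python
from typing import Mapping

# The six organ systems scored by SOFA; each contributes 0-4 points.
SOFA_SYSTEMS = ("respiration", "coagulation", "liver", "cardiovascular", "cns", "renal")


def sofa_total(subscores: Mapping[str, int]) -> int:
    """Sum six per-system SOFA subscores (each 0-4) into a 0-24 total."""
    missing = set(SOFA_SYSTEMS) - set(subscores)
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    total = 0
    for system in SOFA_SYSTEMS:
        points = subscores[system]
        if not 0 <= points <= 4:
            raise ValueError(f"{system} subscore must be 0-4, got {points}")
        total += points
    return total


example = {"respiration": 2, "coagulation": 1, "liver": 0,
           "cardiovascular": 3, "cns": 1, "renal": 2}
print(sofa_total(example))  # 9
```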

5 of 14

SAPS (Simplified Acute Physiology Score)

SAPS is a simplified version of the Acute Physiology Score (APS) from APACHE; the variables chosen were those present for 90% of patients in the initial survey used to develop the APS (Knaus, 1981). The higher the SAPS score, the more severe the patient's condition.

6 of 14

SAPS II (Simplified Acute Physiology Score II)

SAPS II scores range from 0 to 163, where a score of 0 corresponds to a predicted mortality of essentially 0% and a score of 163 to essentially 100%.
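The score is turned into a probability with a logistic equation. The sketch below uses the coefficients commonly reported for the original SAPS II model (Le Gall et al., 1993); treat them as an assumption to verify against the original publication before relying on them.

```python
import math


def sapsii_predicted_mortality(score: int) -> float:
    """Predicted in-hospital mortality for a SAPS II score between 0 and 163.

    Assumed logistic model (coefficients as commonly reported for SAPS II):
    logit = -7.7631 + 0.0737 * score + 0.9971 * ln(score + 1)
    """
    if not 0 <= score <= 163:
        raise ValueError("SAPS II scores range from 0 to 163")
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


for s in (0, 40, 163):
    print(s, round(sapsii_predicted_mortality(s), 4))
```

At the extremes the curve reproduces the behavior described above: near 0% at a score of 0 and near 100% at 163.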

7 of 14

OASIS (Oxford Acute Severity of Illness Score)

OASIS was designed to have an extremely low burden for data collection and quality control, requiring only 10 features, and not requiring laboratory measurements, diagnosis or comorbidity information.

8 of 14

Selecting a Cohort

Before performing tests, we need to establish a cohort: a predefined group of people who meet specific criteria.
This cohort includes only adults (patients aged 15 or older at the time of ICU admission) and only each patient's first admission, to avoid confusion with readmissions.
To aid with cohort selection, I used a series of SQL queries to filter patients based on these criteria.

9 of 14

Selecting a Cohort

We start by retrieving each patient's date of birth and ICU admission dates.

Next, we find each patient's first admission date so that each patient has a single record, and therefore a single age.
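A minimal sketch of these two steps, using Python's built-in `sqlite3` so it runs anywhere. The two tables are tiny stand-ins for MIMIC-III's PATIENTS and ICUSTAYS tables with only a few columns kept; the rows are made up for illustration and are not real data.

```python
import sqlite3

# In-memory stand-ins for MIMIC-III's PATIENTS and ICUSTAYS tables (synthetic rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (subject_id INTEGER PRIMARY KEY, gender TEXT, dob TEXT);
CREATE TABLE icustays (icustay_id INTEGER PRIMARY KEY, subject_id INTEGER, intime TEXT);
INSERT INTO patients VALUES (1, 'M', '2050-03-01'), (2, 'F', '2080-07-15');
INSERT INTO icustays VALUES
  (10, 1, '2110-05-02 08:00:00'),
  (11, 1, '2112-01-20 14:30:00'),  -- a readmission, ignored below
  (20, 2, '2101-09-09 22:10:00');
""")

# For each patient, keep the date of birth and only the earliest ICU admission time.
FIRST_ADMISSION_SQL = """
SELECT p.subject_id, p.gender, p.dob, MIN(i.intime) AS first_intime
FROM patients AS p
JOIN icustays AS i ON i.subject_id = p.subject_id
GROUP BY p.subject_id, p.gender, p.dob
ORDER BY p.subject_id;
"""

rows = conn.execute(FIRST_ADMISSION_SQL).fetchall()
for row in rows:
    print(row)
```

The same `GROUP BY` pattern works in PostgreSQL; there, `DISTINCT ON (subject_id) ... ORDER BY subject_id, intime` is an alternative that also returns the other columns of the first stay.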

10 of 14

Selecting a Cohort

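We then compute each patient's age as the difference between their date of birth and the date of their first admission, and place patients into three age groups: neonatal and pediatric (under 15 years of age), adult (15–89 years), and over 89 years. The last group is kept separate because MIMIC-III shifts the dates of birth of patients older than 89 to protect their privacy, so their exact ages cannot be recovered.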
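The age calculation and grouping can be sketched in plain Python. The 365.25-day year is a simplifying assumption, and the function names and group labels are hypothetical.

```python
from datetime import datetime


def age_in_years(dob: datetime, admit: datetime) -> float:
    """Approximate age at admission, assuming an average year of 365.25 days."""
    return (admit - dob).total_seconds() / (365.25 * 24 * 3600)


def age_group(age: float) -> str:
    """Bucket an age into the three groups described on this slide."""
    if age < 15:
        return "neonatal/pediatric"
    if age < 90:  # 15-89 in whole years
        return "adult"
    # MIMIC-III shifts the date of birth of patients older than 89,
    # so their computed ages are inflated and only tell us they are over 89.
    return ">89"


print(age_group(age_in_years(datetime(2050, 3, 1), datetime(2110, 5, 2))))  # adult
```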

11 of 14

Selecting a Cohort

Finally, we use these categories to select the patients we want in our cohort for further tests. For the final cohort, I will use the male and female adults.
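A small sketch of this final filter, assuming per-patient records that already carry an age group from the previous step. The records below are hypothetical; MIMIC-III itself codes gender as `M` or `F`.

```python
from collections import Counter

# Hypothetical per-patient records, standing in for the output of the earlier steps.
patients = [
    {"subject_id": 1, "gender": "M", "age_group": "adult"},
    {"subject_id": 2, "gender": "F", "age_group": "adult"},
    {"subject_id": 3, "gender": "F", "age_group": "neonatal/pediatric"},
    {"subject_id": 4, "gender": "M", "age_group": ">89"},
]

# Keep only adults, then split the cohort by gender.
cohort = [p for p in patients if p["age_group"] == "adult"]
by_gender = {g: [p["subject_id"] for p in cohort if p["gender"] == g] for g in ("M", "F")}

print(by_gender)  # {'M': [1], 'F': [2]}
print(Counter(p["gender"] for p in cohort))
```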

12 of 14

Tools Being Used

RDBMS - PostgreSQL

Database Tool - pgAdmin 4

Hardware - Intel® Optimized AWS EC2 Instance

Languages - Python, SQL

Visualization Tools - Jupyter Notebook, Tableau

13 of 14

Future Plans

Use Super Learner models to assist with mortality rate predictions and provide other related information

Super Learner is a supervised learning algorithm designed to find the optimal combination of a set of prediction algorithms.
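To make the idea concrete, here is a deliberately simplified, pure-Python sketch of the Super Learner approach under stated assumptions: two toy candidate learners (the hypothetical `mean_learner` and `split_learner`), 5-fold cross-validation to obtain out-of-fold predictions, and a grid search over convex weights standing in for the constrained optimizer a full implementation would use. The data are synthetic.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def mean_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Baseline candidate: always predict the training event rate."""
    rate = sum(ys) / len(ys)
    return lambda x: rate


def split_learner(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Candidate: predict the event rate on either side of the training median of x."""
    cut = sorted(xs)[len(xs) // 2]
    above = [y for x, y in zip(xs, ys) if x >= cut]
    below = [y for x, y in zip(xs, ys) if x < cut]
    rate_above = sum(above) / len(above) if above else 0.0
    rate_below = sum(below) / len(below) if below else 0.0
    return lambda x: rate_above if x >= cut else rate_below


def out_of_fold_predictions(learners: Sequence[Learner], xs: Sequence[float],
                            ys: Sequence[float], k: int = 5) -> List[List[float]]:
    """Row i holds each learner's prediction for sample i from a model that never saw it."""
    n = len(xs)
    preds = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, learn in enumerate(learners):
            predict = learn(train_x, train_y)
            for i in test:
                preds[i][j] = predict(xs[i])
    return preds


def super_learner(learners: Sequence[Learner], xs: Sequence[float], ys: Sequence[float],
                  k: int = 5, steps: int = 20) -> Tuple[List[float], Predictor]:
    """Pick the convex weights minimizing cross-validated squared error, then refit on all data."""
    assert len(learners) == 2, "this sketch only searches weights for two learners"
    preds = out_of_fold_predictions(learners, xs, ys, k)

    def cv_risk(w: float) -> float:
        return sum((w * p[0] + (1 - w) * p[1] - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

    best_w = min((s / steps for s in range(steps + 1)), key=cv_risk)
    weights = [best_w, 1 - best_w]
    fitted = [learn(xs, ys) for learn in learners]

    def predict(x: float) -> float:
        return sum(w * f(x) for w, f in zip(weights, fitted))

    return weights, predict


# Synthetic data: the outcome is 1 exactly when the single feature is >= 50.
xs = [float(x) for x in range(100)]
ys = [1.0 if x >= 50 else 0.0 for x in xs]
weights, predict = super_learner([mean_learner, split_learner], xs, ys)
print(weights)  # [0.0, 1.0]
```

Because the split learner almost matches the synthetic outcome, the cross-validated weighting puts all of its weight on that candidate; with real mortality data the weights would typically be spread across several learners.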
