1 of 16

BDA 2021

Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping

Akshara Prabhakar*, Shidharth Srinivasan*, Sowmya Kamath

Healthcare Analytics & Language Engineering (HALE) Lab, Department of Information Technology,

National Institute of Technology Karnataka, Surathkal

Presented by

Shidharth S

56th Annual Conference on Information Sciences and Systems

(CISS 2022)

Organized by

Princeton University

In Online mode

March 9-11, 2022

2 of 16

Outline

Introduction
Problem Statement
Previous Work
Modules
Class Imbalance
Experiments and Results
Conclusion
Future Work
References

3 of 16

Introduction

Electronic Health Records (EHRs) are a set of clinical data related to an individual patient’s medical history, storing vital information pertaining to patients’ primary care and healing processes.
Unstructured data makes up a significant part of EHRs and provides a rich source of patient-specific information such as doctors' notes, nurses' notes, radiology reports, and discharge summaries.
Intensive Care Units (ICUs) are limited-resource, expensive environments where agile and accurate decision-making is crucial.

4 of 16

Introduction

A phenotypic abnormality in medical settings is a deviation from normal human physiology, morphology, or behavior.
Discovering patient phenotypes helps determine how individual patients would respond to certain drugs and how they might react to different interventions.

5 of 16

Problem Statement

Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping

Contributions

Transformer-based architectures for phenotyping
Exploring attention mechanisms with various note types
Extensive analysis on MIMIC-III

6 of 16

Previous Work

Work	Data source	Approach
Gehrmann et al.	Discharge summary notes	Convolutional neural network to identify patient phenotypes
ws-CNN, Yang et al.	Discharge summary notes	CNN with three different filter sizes with a combination of word and sentence level embeddings
ClinicalBERT, Huang et al.	All notes from MIMIC-III	BERT pre-trained on clinical notes, then fine-tuned to predict hospital readmission at various time points
ClinicalBERT based fmean, Mulyar et al.	N2C2 2008	Splits each clinical document into chunks and combines the information in the resulting sequence of CLS tokens in various ways; taking the mean gave the best performance
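
The chunk-and-average idea in the last table row can be illustrated with a minimal sketch. It is not the implementation of Mulyar et al.: `toy_encode_chunk` is a hypothetical stand-in for a ClinicalBERT encoder that would return one CLS vector per chunk, and the chunk size of 4 tokens is chosen only to keep the example small.

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def toy_encode_chunk(chunk):
    """Hypothetical stand-in for an encoder's CLS vector (assumption, not a real model).

    Returns a 2-dimensional vector: [number of tokens, number of distinct tokens].
    """
    return [float(len(chunk)), float(len(set(chunk)))]


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


tokens = "patient admitted with chest pain and shortness of breath".split()
chunks = chunk_tokens(tokens, chunk_size=4)

# One vector per chunk, then a single document-level vector via the mean.
chunk_vectors = [toy_encode_chunk(c) for c in chunks]
doc_vector = mean_pool(chunk_vectors)
```

In a real pipeline the document vector would be passed to a classification layer; that part is omitted here.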

8 of 16

Modules

Encoder Module

BERT Encoder
ClinicalBERT Encoder

Cross & Self Attention Module

9 of 16

Class Imbalance

We add class weights to handle the high class imbalance. Let a and b be the class labels and P_a, P_b their occurrence counts. Each class is weighted by the relative frequency of the other class, so the rarer class receives the larger weight:

class weight_a = P_b/(P_b + P_a)

class weight_b = P_a/(P_b + P_a)
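
As a rough illustration, the weights above can be computed as follows. The counts (900 and 100) are made up, and applying the weights inside a binary cross-entropy loss is an assumption for this sketch; the slides do not state how the weights enter the loss.

```python
import math


def class_weights(p_a, p_b):
    """Return (weight_a, weight_b) where each class is weighted by the
    relative frequency of the other class, as in the formulas above."""
    total = p_a + p_b
    if total == 0:
        raise ValueError("at least one example is required")
    return p_b / total, p_a / total


def weighted_bce(y_true, y_prob, w_pos, w_neg):
    """Mean binary cross-entropy with per-class weights (assumed usage).

    y_true holds 0/1 labels, y_prob the predicted probability of label 1.
    """
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical counts: class a (negative) has 900 examples, class b (positive) has 100.
w_a, w_b = class_weights(900, 100)
loss = weighted_bce([1, 0], [0.5, 0.5], w_pos=w_b, w_neg=w_a)
```

With these counts the rare class b gets the larger weight, so errors on it contribute more to the loss.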

10 of 16

Experiments and Results

11 of 16

Experiments and Results

Most existing works use only a subset of the patients in the MIMIC-III dataset, termed "frequent flyers" (patients with >= 3 ICU visits in a year), as introduced by Gehrmann et al. This subset contains the discharge summaries of 1,610 patients. We refer to it as D_sub.

12 of 16

Experiments and Results

13 of 16

Conclusion

Built on a model pre-trained on clinical notes to obtain clinical encodings
Encodings are passed through a self/cross-attention layer and the outputs of this attention layer are combined with the encodings generated before to reinforce past learning
Frequency-based weighting to handle imbalance
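
The attention step summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' architecture: it uses scaled dot-product attention without learned projections, and it assumes the attention output is combined with the original encodings by element-wise addition (a residual connection). The slides only say the two are "combined". All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention with no learned projections.

    queries and keys_values are lists of vectors of the same dimension d;
    keys_values serves as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out


def attend_with_residual(queries, keys_values):
    """Add the attention output back onto the original encodings (assumed combination)."""
    attended = attend(queries, keys_values)
    return [[a + q for a, q in zip(a_row, q_row)]
            for a_row, q_row in zip(attended, queries)]


# Toy encodings for two tokens of one note type.
enc = [[1.0, 0.0], [0.0, 1.0]]

# Self-attention: the encodings attend to themselves.
self_out = attend_with_residual(enc, enc)

# Cross-attention: the encodings attend to encodings of another note type.
other = [[0.5, 0.5]]
cross_out = attend_with_residual(enc, other)
```

Self-attention and cross-attention differ here only in which encodings supply the keys and values.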

14 of 16

Future Work

Utilizing additional information from the structured data such as ICD-9 codes

15 of 16

References

K. Huang, J. Altosaar, and R. Ranganath, “ClinicalBERT: Modeling clinical notes and predicting hospital readmission,” 2020.
A. Mulyar, E. Schumacher, M. Rouhizadeh, and M. Dredze, “Phenotyping of clinical notes with improved document classification models using contextualized neural language models,” ArXiv, vol. abs/1910.13664, 2019.
S. Gehrmann, F. Dernoncourt et al., “Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives,” PLOS ONE, vol. 13, pp. 1–19, Feb. 2018.
Z. Yang, M. Dehmer, O. P. Yli-Harja, and F. Emmert-Streib, “Combining deep learning with token selection for patient phenotyping from electronic health records,” Scientific Reports, vol. 10, 2020.
