1 of 16

BDA 2021

Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping

Akshara Prabhakar*, Shidharth Srinivasan*, Sowmya Kamath

Healthcare Analytics & Language Engineering (HALE) Lab, Department of Information Technology,

National Institute of Technology Karnataka, Surathkal

Presented by

Shidharth S

56th Annual Conference on Information Sciences and Systems

(CISS 2022)

Organized by

Princeton University

In Online mode

March 9-11, 2022

2 of 16

Outline

  • Introduction
  • Problem Statement
  • Previous Work
  • Modules
  • Class Imbalance
  • Experiments and Results
  • Conclusion
  • Future Work
  • References

3 of 16

Introduction

  • Electronic Health Records (EHRs) are a set of clinical data related to an individual patient’s medical history, storing vital information pertaining to patients’ primary care and healing processes.
  • Unstructured data makes up a significant part of EHRs and provides a rich source of patient-specific information, such as doctors' notes, nurses' notes, radiology reports, and discharge summaries.
  • Intensive Care Units (ICUs) are limited-resource, expensive environments where agile and accurate decision-making is crucial.

4 of 16

Introduction

  • A phenotypic abnormality in medical settings is a deviation from normal human physiology, morphology, or behavior.
  • Discovering patient phenotypes helps determine how individual patients would respond to certain drugs and how they might react to different interventions.

5 of 16

Problem Statement

Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping

Contributions

  • Transformer-based architectures for phenotyping
  • Exploring attention mechanisms with various note types
  • Extensive analysis on MIMIC-III

6 of 16

Previous Work

| Work | Data source | Approach |
|---|---|---|
| Gehrmann et al. | Discharge summary notes | Convolutional neural network to identify patient phenotypes |
| ws-CNN, Yang et al. | Discharge summary notes | CNN with three different filter sizes and a combination of word-level and sentence-level embeddings |
| ClinicalBERT, Huang et al. | All notes from MIMIC-III | Pre-trains BERT on clinical notes and fine-tunes the network to predict hospital readmission at various time points |
| ClinicalBERT-based fmean, Mulyar et al. | N2C2 2008 | Divides the entire clinical document into chunks and combines the important information from them in various ways using the sequence of [CLS] tokens; the best performance is obtained when their mean is taken |
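The chunk-and-average approach attributed to Mulyar et al. above can be illustrated with a minimal sketch. This is not their implementation: `toy_encoder` is a hypothetical stand-in for a ClinicalBERT forward pass that would return one [CLS] vector per chunk, and the chunk size of 512 tokens is an assumption.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], chunk_size: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors (one vector per chunk)."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_document(
    tokens: Sequence[str],
    encode_chunk: Callable[[List[str]], List[float]],
    chunk_size: int = 512,
) -> List[float]:
    """Encode each chunk separately, then average the per-chunk vectors."""
    chunks = split_into_chunks(tokens, chunk_size)
    return mean_pool([encode_chunk(chunk) for chunk in chunks])


def toy_encoder(chunk: List[str]) -> List[float]:
    # Hypothetical stand-in for a real encoder's [CLS] vector:
    # [number of tokens, number of distinct tokens] in the chunk.
    return [float(len(chunk)), float(len(set(chunk)))]


# Usage: a five-token "note" split into chunks of two tokens each.
document_vector = encode_document(["a", "b", "c", "d", "e"], toy_encoder, chunk_size=2)
```

Averaging keeps the document vector the same size regardless of note length, which is what lets a fixed-size classifier head sit on top of notes longer than the encoder's input limit.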

7 of 16

8 of 16

Modules

Encoder Module

  • BERT Encoder
  • ClinicalBERT Encoder

Cross & Self Attention Module

9 of 16

Class Imbalance

  • We add class weights to handle the high class imbalance. Let $a$, $b$ be the class labels and $P_a$, $P_b$ their occurrences. Then the weights used are:

$\text{class weight}_a = \frac{P_b}{P_b + P_a}$

$\text{class weight}_b = \frac{P_a}{P_b + P_a}$
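As a small worked example of the weighting rule above, the sketch below computes both weights from the two class counts. The counts (900 and 100) are made up for illustration. The slides do not say how the weights enter the loss, so the weighted binary cross-entropy in `weighted_bce` is an assumption, not the authors' stated method.

```python
import math


def class_weights(count_a, count_b):
    """Each class is weighted by the relative frequency of the other class."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one example is required")
    return count_b / total, count_a / total


def weighted_bce(labels, probs, weight_a, weight_b, eps=1e-12):
    """Mean weighted binary cross-entropy (assumed loss form).

    Label 0 denotes class a and label 1 denotes class b;
    probs holds the predicted probability of class b.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -weight_b * math.log(p)
        else:
            total += -weight_a * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical counts: 900 examples of class a, 100 of class b.
w_a, w_b = class_weights(900, 100)  # 0.1 and 0.9: the rarer class b gets the larger weight
loss = weighted_bce([1, 0], [0.5, 0.5], w_a, w_b)
```

The two weights always sum to 1, and a mistake on the rare class costs proportionally more than a mistake on the common one.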

10 of 16

Experiments and Results

11 of 16

Experiments and Results

Most existing works use only a subset of the patients in the MIMIC-III dataset, termed "frequent flyers" (patients with >= 3 ICU visits in a year), as introduced by Gehrmann et al. It contains the discharge summaries of 1,610 patients. We refer to this subset as D_sub.
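The selection criterion can be sketched as a simple filter. This is an illustration only: the `(patient_id, admit_date)` record layout is hypothetical rather than the MIMIC-III schema, and reading "in a year" as any rolling 365-day window is an assumption.

```python
from datetime import date, timedelta


def is_frequent_flyer(admit_dates, min_visits=3, window_days=365):
    """True if at least min_visits admissions fall within some window of window_days."""
    dates = sorted(admit_dates)
    window = timedelta(days=window_days)
    # Slide over every run of min_visits consecutive admissions.
    for i in range(len(dates) - min_visits + 1):
        if dates[i + min_visits - 1] - dates[i] <= window:
            return True
    return False


def select_frequent_flyers(icu_stays, min_visits=3, window_days=365):
    """Return the set of patient ids meeting the criterion.

    icu_stays is an iterable of (patient_id, admit_date) pairs (hypothetical layout).
    """
    by_patient = {}
    for patient_id, admit_date in icu_stays:
        by_patient.setdefault(patient_id, []).append(admit_date)
    return {
        patient_id
        for patient_id, dates in by_patient.items()
        if is_frequent_flyer(dates, min_visits, window_days)
    }


# Made-up stays: patient 1 has three visits inside one year, patient 2 does not.
stays = [
    (1, date(2020, 1, 1)), (1, date(2020, 3, 1)), (1, date(2020, 12, 1)),
    (2, date(2019, 1, 1)), (2, date(2020, 6, 1)), (2, date(2021, 12, 1)),
]
subset = select_frequent_flyers(stays)
```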

12 of 16

Experiments and Results

13 of 16

Conclusion

  • Built on a model pre-trained on clinical notes to obtain clinical encodings
  • Encodings are passed through a self/cross-attention layer and the outputs of this attention layer are combined with the encodings generated before to reinforce past learning
  • Frequency-based weighting to handle imbalance
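The attention step summarized above can be sketched in a few lines. This is a simplified illustration, not the presented model: it omits learned query/key/value projections, and combining the attention output with the original encodings by element-wise addition is an assumption, since the slides do not say whether the two are added or concatenated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def attend_and_combine(encodings, context=None):
    """Self-attention when context is None, cross-attention otherwise.

    The attention output is added back to the input encodings (assumed combination).
    """
    source = encodings if context is None else context
    attended = attention(encodings, source, source)
    return [[e + a for e, a in zip(enc, att)] for enc, att in zip(encodings, attended)]


# Usage with toy 2-dimensional encodings of two notes.
notes = [[1.0, 0.0], [0.0, 1.0]]
self_out = attend_and_combine(notes)                         # self-attention
cross_out = attend_and_combine(notes, context=[[2.0, 2.0]])  # cross-attention
```

Adding the attention output back to the input keeps the original encoding available to later layers, which matches the stated goal of reinforcing what the encoder already learned.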

14 of 16

Future Work

  • Utilizing additional information from the structured data such as ICD-9 codes

15 of 16

References

  1. K. Huang, J. Altosaar, and R. Ranganath, “ClinicalBERT: Modeling clinical notes and predicting hospital readmission,” 2020.
  2. A. Mulyar, E. Schumacher, M. Rouhizadeh, and M. Dredze, “Phenotyping of clinical notes with improved document classification models using contextualized neural language models,” ArXiv, vol. abs/1910.13664, 2019.
  3. S. Gehrmann, F. Dernoncourt et al., “Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives,” PLOS ONE, vol. 13, pp. 1–19, 02 2018.
  4. Z. Yang, M. Dehmer, O. P. Yli-Harja, and F. Emmert-Streib, “Combining deep learning with token selection for patient phenotyping from electronic health records,” Scientific Reports, vol. 10, 2020.

16 of 16

Thank you

Questions?
