1 of 29

Fighting Bias with Bias

Challenges and Opportunities for Artificial Intelligence in Healthcare

Keith Harrigian

Johns Hopkins University

2 of 29

About Me

  • PhD Candidate in Computer Science at Johns Hopkins University
  • Research Areas
    • Natural Language Processing (NLP) for healthcare
    • Robustness, domain adaptation, and generalization
  • Other Pursuits
    • Data science at Netflix, Unforged, Warner Media, and True Fit
    • Behavioral neuroscience research (goal-oriented human movement)

3 of 29

“In the realm of healthcare, artificial intelligence serves as a powerful antidote to bias, paving the way for a future where every individual receives unbiased and equal treatment.”

– ChatGPT

4 of 29

Transformative AI is Here: Now What?

Rapid Progress of AI

  • Improved modeling architectures
  • Improved computational resources

Proceed With Caution

  • Endless opportunities to leverage AI in the fight against healthcare disparities
  • Awareness of limitations matters

“Can LLMs like GPT-4 outperform traditional AI tools in dementia diagnosis? Maybe, but not today.” Wang et al. arXiv. 2023.

5 of 29

Agenda

Review: AI, Bias, and Healthcare

Case Study: Characterizing Stigmatizing Language in Medical Records

Open Dialogue: Bringing AI to the Alzheimer’s Association

6 of 29

Review

AI, Bias, and Healthcare

7 of 29

Terminology

Statistical Bias

Systematic error in the outcome of a study due to dataset curation or modeling decisions.

Examples

  • A dataset is not representative of the population it intends to study
  • A model does not properly characterize the behavior of its target population

Social Bias

Human-held prejudices and predispositions regarding groups, attributes, or circumstances.

Examples

  • A dataset of transplant decisions made using knowledge of a patient’s income or race
  • A language model that disproportionately associates high-paying jobs with men

8 of 29

Sources of Bias

  • “Garbage In, Garbage Out”
  • Bias in AI is inevitable
    • Dynamic standards
    • New domains
  • An awareness of system shortcomings goes a long way

“Biases in AI Systems.” Srinivasan and Chander. Communications of the ACM. 2021.

9 of 29

Distribution Shift

What Happens

  • Data distributions can change between training a model and deploying it
  • Types of shift
    • Prior Shift: p(y) ≠ p′(y)
    • Covariate Shift: p(x) ≠ p′(x)
    • Concept Shift: p(y | x) ≠ p′(y | x)

Possible Solutions

  • Domain adaptation (requires target data)
  • Domain generalization (may compromise within-domain performance)

Example: Language models trained on out-of-distribution data require adaptation to their target distribution.

“An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping.” Harrigian et al. Under Review. 2023.
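As a concrete, hypothetical illustration of these shift types (not part of the cited work), the sketch below shows one simple way to check a deployment sample against a training sample for prior and covariate shift; the helper names and usage are assumptions.

```python
import numpy as np
from scipy import stats

def check_prior_shift(y_train, y_deploy):
    """Chi-squared test comparing label distributions p(y) vs. p'(y)."""
    labels = np.union1d(y_train, y_deploy)
    train_counts = np.array([np.sum(y_train == lbl) for lbl in labels])
    deploy_counts = np.array([np.sum(y_deploy == lbl) for lbl in labels])
    # Expected deployment counts if the training label distribution still held
    expected = train_counts / train_counts.sum() * deploy_counts.sum()
    return stats.chisquare(deploy_counts, f_exp=expected)

def check_covariate_shift(x_train, x_deploy):
    """Kolmogorov-Smirnov test comparing a single feature's p(x) vs. p'(x)."""
    return stats.ks_2samp(x_train, x_deploy)

# A low p-value from either test suggests the corresponding shift is present.
```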

10 of 29

Distribution Shift

Example: Words that were frequently used by individuals with depression began to be used by the general population after the onset of COVID-19 to reflect pandemic-specific phenomena.

Term: Panic
  • 2019 neighborhood (Emotion, i.e., Fear): rage, meltdown, anxiety, anger, barrage, migraine, phobia, outrage, manic, rush, asthma
  • 2020 neighborhood (Panic Buying, Misinformation): hysteria, chaos, fear, misinformation, confusion, frenzy, paranoia, mayhem, insanity, fearmongering

Term: Cuts
  • 2019 neighborhood (Physical): cut, jumps, runs, cutting, pulls, moves, bounces, falls, turns, burns, drags, dips, breaks, bursts, rips, goes, bumps
  • 2020 neighborhood (Economic): cut, cutting, subsidies, budgets, deductions, revenues, checks, payments, breaks, deals, figures, loans, deposits, gains

Term: Isolated
  • 2019 neighborhood (Feeling Detached): unpleasant, unstable, detached, unsafe, populated, invasive, unknown, confined, endangered, absent, vulnerable
  • 2020 neighborhood (Quarantine): quarantined, isolating, separated, enclosed, insulated, infectious, confined, active, populated, autonomous, vulnerable, detached

Term: Strain
  • 2019 neighborhood (Discomfort/Pressure): inflammation, deficiency, dose, stress, pressure, calcium, medication, concentration, tissue, nausea, receptors, doses
  • 2020 neighborhood (Virus): disease, illness, infections, symptom, mutation, virus, outbreak, pneumonia, infection, strains, influenza, epidemic

Term: Vulnerable
  • 2019 neighborhood (Emotion): susceptible, dangerous, prone, unstable, aggressive, hostile, disruptive, detrimental, receptive, fragile, damaging
  • 2020 neighborhood (At-risk Populations): susceptible, dangerous, immunocompromised, infectious, isolating, elderly, disadvantaged, contagious, tolerant, likely, isolated

“The Problem of Semantic Shift in Longitudinal Monitoring of Social Media.” Harrigian et al. WebSci. 2022.

11 of 29

Group Imbalance

What Happens

  • Traditional machine learning models are trained to minimize average predictive error within their training dataset
  • If a training dataset is made up of multiple groups, the model is encouraged to do better on the larger groups

Possible Solutions

  • Distributionally Robust Optimization
  • Multi-Task Learning
  • Resampling

Example: A logistic regression classifier trained with standard empirical risk minimization (ERM) compromises minority-group performance in favor of increasing majority-group performance, as sketched below.
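A minimal synthetic sketch of this effect, using resampling-style reweighting as the mitigation; the data, the 10% minority split, and the group-dependent label rule are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
g = (rng.random(1000) < 0.1).astype(int)              # 10% minority group
# The label depends on feature 1 with *opposite* sign in the minority group
y = (X[:, 0] + np.where(g == 1, -X[:, 1], X[:, 1]) > 0).astype(int)

# ERM: every example weighted equally, so the majority group dominates the loss
erm = LogisticRegression().fit(X, y)

# Mitigation: reweight examples so each group contributes equally to the loss
weights = np.where(g == 1, (g == 0).sum() / (g == 1).sum(), 1.0)
reweighted = LogisticRegression().fit(X, y, sample_weight=weights)

for name, model in [("ERM", erm), ("Reweighted", reweighted)]:
    for grp in (0, 1):
        acc = model.score(X[g == grp], y[g == grp])
        print(f"{name} accuracy, group {grp}: {acc:.2f}")
```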

12 of 29

Spurious Correlations

What Happens

  • Machine learning models tend to prefer simpler solutions over more complex alternatives
  • Non-causal correlations can be used erroneously as “shortcuts”

Possible Solutions

  • Causally-informed Models
  • Adversarial Learning

Example: Models will learn non-causal relationships between spurious (unstable) attributes and outcomes.

[Figure: causal diagram relating a spurious attribute A (hospital bed), features X (vitals), and outcome Y (mortality).]
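A small synthetic sketch of the shortcut problem (the 95% train-time correlation is an assumption for illustration): the spurious attribute a predicts y almost perfectly during training, so the model leans on it and degrades once the correlation breaks at test time.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    """y is caused by x; attribute a merely co-occurs with y at rate `spurious_corr`."""
    x = rng.normal(size=n)
    y = (x + 0.5 * rng.normal(size=n) > 0).astype(int)
    a = np.where(rng.random(n) < spurious_corr, y, 1 - y)  # no causal link to y
    return np.column_stack([x, a]), y

X_train, y_train = make_data(5000, spurious_corr=0.95)  # shortcut looks reliable
X_test, y_test = make_data(5000, spurious_corr=0.50)    # shortcut breaks

model = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy :", model.score(X_test, y_test))
print("weights [x, a]:", model.coef_[0])  # large weight on `a` = learned shortcut
```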

13 of 29

The State of AI Bias Research

Defensive Tactics

  • Measure, identify, and protect against social and statistical bias in algorithmic healthcare tools

Offensive Tactics

  • Measure, identify, and address instances of social bias in our healthcare system

Goal: Improved Health Equity

14 of 29

Case Study

Characterizing Stigmatizing Language in Medical Records

15 of 29

Collaborators

Aya Zirikly

Brant Chee

Yahan Li

Mark Dredze

Anne R. Links

Alya Ahamad

Somnath Saha

Mary Catherine Beach

16 of 29

Problem Context

Black patients are significantly more likely than white patients to experience discrimination in the healthcare system (12.3% vs. 2.3%)

Patients who experience discrimination have:

  • Lower levels of adherence to treatment plans
  • Lower trust in healthcare providers
  • Increased likelihood of delaying care or avoiding screening for chronic conditions

Healthcare providers who read notes containing stigmatizing language are more likely to formulate a less aggressive treatment plan

The 21st Century Cures Act mandates that EHRs be readily available to all patients

17 of 29

Stigmatizing Language

Stigmatizing language assigns negative labels, stereotypes, and judgment to certain groups of people.

Often recognized in discussions of mental health and addiction

  • “Addict”
  • “Substance Abuse”
  • “Crazy”
  • “Junkie”

More generally, stigmatizing language reflects an implicit bias

  • Often expressed unconsciously
  • In the EHR, more commonly covert

18 of 29

Stigmatizing Language Taxonomy

Disbelief

  • Definition: Insinuates doubt about a patient’s stated testimony.
  • Examples: “adamant he doesn’t smoke”; “claims to see a therapist”

Difficult

  • Definition: Describes the patient’s perspective as inflexible, difficult, or entrenched, typically with respect to their intentions.
  • Examples: “insists on being admitted”; “adamantly opposed to limiting fruit intake”

Exclude

  • Definition: The word or phrase is not used to characterize the patient or describe the patient’s behavior; it may refer to a medical condition, a treatment, or another person or context.
  • Examples: “patient’s friend insisted she go to the hospital”; “test claims submitted to insurance”

Task: Credibility and Obstinacy

19 of 29

Stigmatizing Language Taxonomy

Negative

  • Definition: Patient is not following, is unlikely to follow, or is questionably following medical advice.
  • Examples: “adherence to therapeutic medication is unclear”; “mother declines vaccines”; “struggles with medication and follow-up compliance”

Neutral

  • Definition: Not used to indicate whether the patient is following or rejecting medical advice; often generically describes a future or hypothetical plan.
  • Examples: “discussed medication compliance”; “school refuses to provide adequate accommodations”; “feels that her parents’ health has declined”

Positive

  • Definition: Patient is following medical advice.
  • Examples: “continues to be compliant with aspirin regimen”; “reports excellent adherence”

Task: Compliance

20 of 29

Stigmatizing Language Taxonomy

Negative

  • Definition: Patient’s demeanor is cast in a negative light; insinuates the patient is not being forthright.
  • Examples: “concern for secondary gain”; “unwilling to meet with case manager”

Neutral

  • Definition: Negation of negative descriptors; insinuates the patient was expected to have a negative demeanor.
  • Examples: “not combative or belligerent”; “dad seems angry with patient at times”

Positive

  • Definition: Patient’s demeanor or behavior is described in a positive light; patient is easy to interact with.
  • Examples: “lovely 80 year old woman”; “well-groomed and holds good eye contact”

Exclude

  • Definition: Patient self-description or a description of another individual.
  • Examples: “does not want providers to think she’s malingering”; “reports feeling angry”

Task: Descriptors

21 of 29

Overview of System Structure

Pipeline: Clinical Notes → Anchor Extraction → Machine Learning Classifier → Stigma Labels

Example

  • Note: “Despite my best advice, the patient remains adamant about leaving the hospital today. Social services is aware of the situation.”
  • Anchor keyword: “adamant”
  • Extracted context: “my best advice the patient remains adamant about leaving the hospital today social”
  • Candidate labels: Disbelief, Difficult, Exclude
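A minimal sketch of the anchor-extraction step; the anchor list and window size here are assumptions (the real system uses a curated keyword taxonomy).

```python
import re

ANCHORS = ["adamant", "claims", "insists", "compliance", "combative"]  # hypothetical subset

def extract_anchors(note, window=6):
    """Return (anchor, context) pairs: each matched keyword plus `window` tokens per side."""
    tokens = re.findall(r"[a-z']+", note.lower())
    spans = []
    for i, tok in enumerate(tokens):
        if any(tok.startswith(anchor) for anchor in ANCHORS):
            context = tokens[max(0, i - window): i + window + 1]
            spans.append((tok, " ".join(context)))
    return spans

note = ("Despite my best advice, the patient remains adamant about leaving "
        "the hospital today. Social services is aware of the situation.")
print(extract_anchors(note))
# [('adamant', 'my best advice the patient remains adamant about leaving the hospital today social')]
```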

22 of 29

Data

Johns Hopkins University (Private)

English-language progress notes

Five clinical specialties are represented: internal medicine, emergency medicine, pediatrics, OB-GYN, and general surgery (Baltimore, MD)

5,201 labeled instances

MIMIC-IV (Public)

De-identified, English discharge notes

Patients admitted to the emergency department or an intensive care unit at Beth Israel Deaconess Medical Center (Boston, MA)

5,043 labeled instances

23 of 29

Model Performance and Keyword Grounding Limitation

Figure 1: Model accuracy on the Credibility task. BERT models maximize performance at the cost of interpretability.

Figure 2: Projection of embeddings for a subset of keywords (classes: Exclude, Negative, Neutral). Labels cluster globally, but keywords cluster locally.

24 of 29

Domain Transfer Performance

What happened?

  • MIMIC frequently contains references to a patient’s family, not the patient (ICU-related shift)
  • MIMIC contains more psychiatry exams in which the patient describes their mental wellbeing in a negative manner
  • The distribution of labels conditioned on each keyword changed between datasets

Figure 3: Macro F1 score when training and testing on different distributions. There is a consistent loss in performance when transferring between datasets.
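The final point above, conditional label shift, can be checked directly by comparing p(label | keyword) across datasets. A minimal sketch with hypothetical labeled instances standing in for the two corpora:

```python
from collections import Counter, defaultdict

def label_dist_by_keyword(examples):
    """Estimate p(label | keyword) from (keyword, label) pairs."""
    counts = defaultdict(Counter)
    for keyword, label in examples:
        counts[keyword][label] += 1
    return {k: {lbl: n / sum(c.values()) for lbl, n in c.items()}
            for k, c in counts.items()}

# Hypothetical instances standing in for the JHU and MIMIC-IV datasets
jhu   = [("adamant", "Difficult"), ("adamant", "Disbelief"), ("claims", "Disbelief")]
mimic = [("adamant", "Exclude"), ("adamant", "Difficult"), ("claims", "Exclude")]

for name, data in [("JHU", jhu), ("MIMIC", mimic)]:
    print(name, label_dist_by_keyword(data))
```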

25 of 29

Recap of Biases

System Design

  • Reliance on keywords to ground model predictions limits generalization to rare and/or new forms of stigmatizing language

Sample Selection

  • The patient population (demographics, clinical specialty, etc.) on which the model was trained dictates how it will perform at test time

26 of 29

The Opportunities Ahead

  • Document prevalence of stigmatizing language amongst different patient populations
  • How do changes in medical education curricula regarding bias manifest in clinical notes?
  • Provider-specific “report cards” to surface implicit bias
  • “Autocorrect” for the EHR and doctor-to-patient messaging systems
  • Augmented training objectives for clinical language models

“… there is a suspicion that the patient is not adhering to their medication regimen consistently.”

“Characterization of Stigmatizing Language in Medical Records.” Harrigian et al. ACL. 2023.

27 of 29

Open Dialogue

Bringing AI to the Alzheimer’s Association

28 of 29

Areas of Discussion

  • Interpretability: What is the model doing?
  • Benchmarking: Will this model work for our population?
  • Data Sharing: How do we safely facilitate research?
  • Adversarial Data Analysis: Is our data biased?
  • Regulation: What does the future look like?
  • Emerging Research: What’s on the association’s docket?

29 of 29

Thank you

Email: kharrigian@jhu.edu

Learn More: kharrigian.github.io