1 of 35

Deep-Learning Analysis of Longitudinal Alzheimer’s Data

2 of 35

Today:

  1. Current Cognitive Status Classification
  2. Future Cognitive Status Prediction
  3. Influences of Sampling
  4. Model Attention Activation Analysis

4 of 35

In-Office Neural Markers are Highly Sensitive for Control/aMCI Conversion

Ewers, M., Brendel, M., Rizk-Jackson, A., Rominger, A., Bartenstein, P., Schuff, N., & Weiner, M. W. (2014). Reduced FDG-PET brain metabolism and executive function predict clinical progression in elderly healthy subjects. NeuroImage: Clinical, 4, 45–52. https://doi.org/10.1016/j.nicl.2013.10.018

FDG-PET Only

67.4% specificity (80% sens.)

FDG-PET + Trail Making B

80.7% specificity (80% sens.)

Just one extra feature!

5 of 35

NACC Dataset

  • Longitudinal neurodegenerative disease dataset
  • 45,100 patients, 180,000+ samples
  • Includes clinical diagnosis of cognitive status at time of visit

Fertile ground for automated time-series analysis

6 of 35

Clinician-Annotated Cognitive Status at First Time of Visit

NACC Neuropsychological Battery

  • MoCA
  • MMSE (and sub scores)
  • Boston Naming Test
  • Multilingual Naming Test
  • and others

In total: 103 numerical features

Importantly: no CDR!

7 of 35

Task 1: Predict Clinical Cognitive Status at Time-of-Visit

NACC Neuropsychological Battery

Control/MCI/Dementia

For each sample:

Classification

8 of 35

NACC Dataset Offers Even Better Results with a Few Features!

Lin, M., Gong, P., Yang, T., Ye, J., Albin, R. L., & Dodge, H. H. (2018). Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment. Alzheimer Disease & Associated Disorders, 32(1), 18–27. https://doi.org/10.1097/WAD.0000000000000228

SVM + 15 NACC Clinical Features

71.0% accuracy

15 features? Why?

9 of 35

Most NACC Samples Are Missing Many of the Available Neuropsychological Features

  • Over half of the available features are not reported (“-4”) across the battery
  • Which features are missing is inconsistent
  • Difficult to train a widely applicable model
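A minimal sketch of how the "-4" not-reported code could be turned into an explicit missingness mask, assuming each sample is a plain dict from feature name to value. The helper names `missing_mask` and `missing_fraction` and the toy values are illustrative assumptions, not part of any NACC tooling.

```python
NOT_REPORTED = -4  # sentinel for "not reported", as noted on the slide


def missing_mask(sample, features):
    """Return True for each feature that is absent or coded as not reported."""
    return [sample.get(name, NOT_REPORTED) == NOT_REPORTED for name in features]


def missing_fraction(samples, features):
    """Fraction of all (sample, feature) cells that are not reported."""
    masks = [missing_mask(s, features) for s in samples]
    total = sum(len(m) for m in masks)
    return sum(sum(m) for m in masks) / total if total else 0.0


# Toy records with made-up values (hypothetical, not real NACC data).
features = ["NACCMMSE", "NACCMOCA", "BOSTON", "MINTTOTS"]
samples = [
    {"NACCMMSE": 28, "NACCMOCA": -4, "BOSTON": 27, "MINTTOTS": -4},
    {"NACCMMSE": -4, "NACCMOCA": 24, "BOSTON": -4, "MINTTOTS": 30},
]
```

A mask like this can be passed alongside the feature values so a model knows which entries to ignore.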

10 of 35

Cognitive Data is Not Uniformly Collected

For instance:

The presence of one of:

Implies the likely absence of all of:

  • MMSE
  • Story Unit Recall (NACCMMSE)
  • Logical Memory IIA (LOGIMEM)
  • Benson Figure (UDSBENTC, UDSBENTD)
  • F-Word Generation (UDSVERFC, UDSVERFN, UDSVERNF)
  • L-Word Generation (UDSVERLC, UDSVERLR, UDSVERLN)
  • F+L-Word Generation (UDSVERTN, UDSVERTE, UDSVERTI)
  • MoCA (NACCMOCA, MOCA*)
  • Craft Story (CRAFT*)
  • Number Span Test (DIG*)
  • Multilingual Naming Test (MINT*)

This is bad.

11 of 35

Proposal: Relationship Between Available Features Can Be Learned

Kim, J., Nguyen, D., Min, S., Cho, S., Lee, M., Lee, H., & Hong, S. (2022). Pure transformers are powerful graph learners. Advances in Neural Information Processing Systems, 35, 14582-14595.

Goal: Represent the graphical relationship in the available features via Transformer.

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P., & Bengio, Y. (2018). Graph Attention Networks. Proceedings of ICLR.

12 of 35

Mask attention values to represent missing data

Epochs: 55

Optimizer: AdamW

LR: 1e-4

Weight Decay: 1e-5
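One common way to mask attention for missing inputs is to set the score of every missing feature to negative infinity before the softmax, so it receives zero weight. The sketch below shows that idea for a single query in plain Python. The slides do not specify the exact masking scheme, so the additive negative-infinity mask and the function name `masked_attention_weights` are assumptions, and the training settings listed on this slide are not exercised here.

```python
import math


def masked_attention_weights(query, keys, is_missing):
    """Scaled dot-product attention weights with missing features masked out.

    query: list of floats (one query vector)
    keys: list of key vectors, one per feature
    is_missing: list of bools, True where the feature was not reported
    """
    neg_inf = float("-inf")
    scale = math.sqrt(len(query))
    scores = []
    for key, missing in zip(keys, is_missing):
        if missing:
            scores.append(neg_inf)  # masked keys end up with zero weight
        else:
            scores.append(sum(q * k for q, k in zip(query, key)) / scale)

    finite = [s for s in scores if s != neg_inf]
    if not finite:
        return [0.0] * len(scores)  # every feature missing: attend to nothing

    peak = max(finite)  # subtract the max for numerical stability
    exps = [0.0 if s == neg_inf else math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The weights over the reported features still sum to one, so the model redistributes attention rather than attending to placeholder values.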

13 of 35

Yes: Classical Transformer Performs Well in Current-Visit Cognitive Status Prediction

Even with sparse inputs with many, many missing features!

10-fold cross-validation

Roughly 90 minutes per run on an RTX 2080
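For reference, a 10-fold cross-validation split can be sketched as follows. The function name `k_fold_indices` and the fixed seed are illustrative assumptions. The slides do not say whether folds were drawn per sample or per patient, so this sketch simply splits by sample index.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each sample appears in exactly one test fold, and the reported metric is typically the average over the k held-out folds.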

Can we do better?

14 of 35

NACC Neuropsychological Battery

  • MoCA
  • MMSE (and sub scores)
  • Boston Naming Test
  • Multilingual Naming Test
  • and others

In total: 103 numerical features

Doesn’t really capture risk factors

NACC Patient History Features

  • Familial history of AD
  • Reported drug use
  • Tobacco and alcohol use
  • Heart and Circulatory Health
  • Comorbidities (Diabetes, Obesity, …)
  • Neurological Conditions (OCD, Anxiety, Bipolar, …)
  • Physical Exam

In total: 69 numerical features

Note: no functional behavior

15 of 35

Including health features further improves model performance

10-fold cross-validation

Roughly 90 minutes per run on an RTX 2080

17 of 35

Today:

  1. Current Cognitive Status Classification
  2. Future Cognitive Status Prediction
  3. Influences of Sampling
  4. Model Attention Activation Analysis

19 of 35

Lots of Longitudinal Data in the NACC!

Any Two Samples Are Usually a Year or More Apart!

20 of 35

NACC Dataset Offers Even Better Results with a Few Features!

James, C., Ranson, J. M., Everson, R., & Llewellyn, D. J. (2021). Performance of Machine Learning Algorithms for Predicting Progression to Dementia in Memory Clinic Patients. JAMA Network Open, 4(12), e2136553. https://doi.org/10.1001/jamanetworkopen.2021.36553

Gradient Boosting + 258 NACC Clinical Features

92% accuracy for dementia within 2 years

“Of these variables, 239 (93%) were missing for at least 1 participant, and all participants had at least 1 variable missing.”

21 of 35

Task 2: 1-3 Year Prognosis Prediction

NACC Neuropsychological Battery

Control/MCI/Dementia status in 1-3 years

For each sample:

Classification

22 of 35

Task 2: 1-3 Year Prognosis Prediction

NACC Neuropsychological Battery

Control/MCI/Dementia status in 1-3 years

For each sample:

Classification

Number of sample pairs exactly 1-3 years apart: 9164 samples

A lot less data than current cognitive status

Just fine-tune the previous model!
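A minimal sketch of how visit pairs 1-3 years apart might be assembled from longitudinal records. The tuple layout, the function name `prognosis_pairs`, and the inclusive bounds are assumptions; the slides do not describe how the pairs were actually built.

```python
from datetime import date


def prognosis_pairs(visits, min_years=1.0, max_years=3.0):
    """Pair each visit with later visits of the same patient 1-3 years on.

    visits: list of (patient_id, visit_date, status) tuples.
    Returns (patient_id, earlier_visit_date, status_at_later_visit) tuples.
    """
    by_patient = {}
    for patient_id, visit_date, status in visits:
        by_patient.setdefault(patient_id, []).append((visit_date, status))

    pairs = []
    for patient_id, rows in by_patient.items():
        rows.sort(key=lambda row: row[0])  # chronological order
        for i, (start, _) in enumerate(rows):
            for end, future_status in rows[i + 1:]:
                gap_years = (end - start).days / 365.25
                if min_years <= gap_years <= max_years:
                    pairs.append((patient_id, start, future_status))
    return pairs


# Toy visits with made-up dates and labels (hypothetical, not NACC data).
visits = [
    ("p1", date(2015, 1, 10), "Control"),
    ("p1", date(2016, 3, 1), "MCI"),
    ("p1", date(2019, 6, 1), "Dementia"),
    ("p2", date(2018, 5, 5), "MCI"),
]
```

In the toy data only the first two visits of `p1` fall inside the window; patients with a single visit contribute no pairs, which is one reason the prognosis set is much smaller than the current-visit set.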

23 of 35

Directly Fine-Tuning Current-Visit Model for 1-3 Year Prognosis Task Yields Promising Results

Current Visit (Base Model)

1-3 Year Prognosis (Fine Tuning)

10-fold cross-validation

Roughly 90 minutes per run on an RTX 2080

24 of 35

Today:

  1. Current Cognitive Status Classification
  2. Future Cognitive Status Prediction
  3. Influences of Sampling
  4. Model Attention Activation Analysis

26 of 35

Current Cognitive Status Prediction:

Combined Feature Set Consistently Better

Future Cognitive Status Prediction:

While current status prediction benefits from health info, the effect is much less pronounced in future prediction.

27 of 35

Today:

  1. Current Cognitive Status Classification
  2. Future Cognitive Status Prediction
  3. Influences of Sampling
  4. Model Attention Activation Analysis

29 of 35

Alammar, J. (2018). The Illustrated Transformer. Retrieved October 5, 2023, from http://jalammar.github.io/illustrated-transformer/

What can these connections tell us?

30 of 35

Top-most attended-to features:

(count of occurrence in top-10 most attended-to features)

???

MEMTIME: Time elapsed since Logical Memory IA

  Feature     Count
  MEMTIME     215
  TRAILALI    202
  QUITSMOK    192
  UDSVERTN    177
  TRAILBLI    170
  BOSTON      168
  WAIS        167
  NACCMMSE    154
  MINTTOTS    148
  MINTTOTW    141
  TRAILB      132
  CRAFTDTI    125
  TRAILA      118
  NACCMOCA    118
  MOCATOTS    112
  ANIMALS     73
  HRATE       62
  UDSBENTC    54
  CRAFTVRS    20
  LOGIMEM     7
  CRAFTDVR    2
  MEMUNITS    2
  UDSVERFC    1
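The counting behind this slide, how often each feature lands among the ten most attended-to, can be sketched as below. It assumes attention has already been reduced to one score per feature per sample; how the original analysis aggregated its attention activations is not stated, and the function name `top_k_feature_counts` is illustrative.

```python
from collections import Counter


def top_k_feature_counts(attention_rows, feature_names, k=10):
    """Count how often each feature ranks among the k most attended-to.

    attention_rows: one list of per-feature attention scores per sample.
    """
    counts = Counter()
    for scores in attention_rows:
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        counts.update(feature_names[i] for i in ranked[:k])
    return counts
```

For example, with three features and `k=2`, a feature that scores highest in every sample is counted once per sample, while a feature that never reaches the top two is absent from the result.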

31 of 35

Montgomery, V., Harris, K., Stabler, A., & Lu, L. H. (2017). Effects of Delay Duration on the WMS Logical Memory Performance of Older Adults with Probable Alzheimer’s Disease, Probable Vascular Dementia, and Normal Cognition. Archives of Clinical Neuropsychology, 32(3), 375–380. https://doi.org/10.1093/arclin/acx005

32 of 35

Model correctly recognized, and attended to, internal data variation

33 of 35

Today:

  1. Current Cognitive Status Classification
  2. Future Cognitive Status Prediction
  3. Influences of Sampling
  4. Model Attention Activation Analysis

34 of 35

Today:

  1. Masked transformer encoders can achieve state-of-the-art automated cognitive status prediction on the NACC neuropsychological battery despite sparse feature signals
  2. Fine-tuning trained current-visit diagnostic models on longitudinal data can yield good results for 1-3 year prognosis prediction
  3. While patient history and lifestyle information significantly aids current-visit diagnostics, its inclusion does not dramatically change the accuracy of 1-3 year prognosis prediction
  4. Analysis of first-layer attention activations yields insights regarding features to investigate

35 of 35

Deep-Learning Analysis of Longitudinal Alzheimer’s Data

Thank you.