1 of 24

Session 2: Longitudinal Trajectory Analysis

2 of 24

Open a browser and visit:

https://healthdatascience.awsapps.com/start/

Enter the email you used to register for the workshop as your username (e.g. myemail@mydomain.com).

Check your email for a verification code, then follow the instructions to create your account password. After that you should be signed in.

  • If you don't see the verification email, check your junk folder

On your dashboard, under Applications, click the Amazon SageMaker Studio link to open SageMaker.

3 of 24

Review: The Arivale Dataset

Price et al, Nat Biotechnol, 2018

4 of 24

Review: Frailty Index is a Fraction of Health Deficits

[Figure panels: Female, Male; x-axis: Baseline Age]

  • Self-Report FI (35 items)
    • Disease (15 items)
    • Activity (9 items)
    • Satisfaction (6 items)
    • Medication (3 items)
    • Digestion (2 items)
  • Lab FI (34 items)
    • Blood test items (29 items)
    • Blood pressure items (5 items)
  • Combined FI (69 items)
    • The combination of the above two
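
As a minimal sketch of "fraction of health deficits", the function below divides the summed deficit scores by the number of assessed items. The 0 to 1 scoring per item, the handling of unassessed items, and the example participant are illustrative assumptions, not the workshop's actual scoring rules.

```python
from typing import Optional, Sequence


def frailty_index(deficits: Sequence[Optional[float]]) -> Optional[float]:
    """Fraction of assessed items that are deficits.

    Each item is scored from 0 (no deficit) to 1 (full deficit).
    None marks an item that was not assessed; as an assumption of this
    sketch, such items are left out of the denominator.
    """
    scored = [d for d in deficits if d is not None]
    if not scored:
        return None  # nothing assessed, so the index is undefined
    return sum(scored) / len(scored)


# Hypothetical participant: 35 self-report items, 7 of them scored as deficits.
self_report = [1.0] * 7 + [0.0] * 28
print(frailty_index(self_report))  # 7 / 35 = 0.2
```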

5 of 24

Session 2.1: DIABLO Analysis

  • Session 1
    • Trends in data (PCA)
    • Multi-omic correlations
    • Identified clusters
    • Cluster eigenvalues
  • Session 2.1
    • Trends in the outcome
    • Multi-omic model
    • Cross-validation

[Figure labels: PC1; DIABLO (PLS-DA)]

6 of 24

DIABLO Overview

  1. Data preparation
    1. Outcome: self-reported frailty index
    2. Baseline data only
    3. Proteomics, metabolomics, and lab tests
  2. sPLS-DA
    • Proteomics
    • Metabolomics
    • Lab tests
  3. Block sPLS-DA (all three ‘omics)
    • Sparsity parameter optimization
    • Model fitting
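
To make the sparsity idea in step 2 concrete, here is a minimal numpy sketch of a single sparse PLS-DA component: the class labels are one-hot encoded, and the feature loadings are soft-thresholded so that only `keep` features stay non-zero. This is a simplified illustration and not the DIABLO implementation used in the notebook; the function names, the centering-only preprocessing, and the synthetic data are all assumptions.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink every entry of a toward zero by lam; small entries become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def spls_da_component(X, labels, keep, n_iter=100, tol=1e-8):
    """First sparse PLS-DA component with `keep` non-zero feature loadings.

    Requires keep < number of features. Returns (loadings, sample scores).
    """
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot outcome
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc  # cross-product matrix, features x classes
    u_svd, _, vt = np.linalg.svd(M, full_matrices=False)
    u, v = u_svd[:, 0], vt[0]
    for _ in range(n_iter):
        a = M @ v
        lam = np.sort(np.abs(a))[-(keep + 1)]  # (keep + 1)-th largest magnitude
        u_new = soft_threshold(a, lam)
        u_new /= np.linalg.norm(u_new)
        v = M.T @ u_new
        v /= np.linalg.norm(v)
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, Xc @ u


# Synthetic data: 3 classes, 50 features, only the first 5 carry class signal.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 50))
X[:, :5] += labels[:, None] * 1.5
loadings, scores = spls_da_component(X, labels, keep=5)
print(np.flatnonzero(loadings))  # indices of the selected features
```

In a tuning step, `keep` would be the quantity chosen by cross-validation, separately for each data block.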

7 of 24

Frailty Index (FI): fraction of health deficits

  • Self-Report FI (35 items) – Baseline questionnaire (once)
    • Disease (15 items)
    • Activity (9 items)
    • Satisfaction (6 items)
    • Medication (3 items)
    • Digestion (2 items)
  • Lab FI (34 items) – Longitudinal, every 6 months
    • Blood test items (29 items)
    • Blood pressure items (5 items)
  • Combined FI (69 items) – Session 1 (baseline labs)
    • The combination of the above two
  • Comparison (Spearman’s rank correlation of quintiles)
    • Self × Lab: 0.367
    • Self × Combined: 0.730
    • Lab × Combined: 0.843
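
The comparison above can be sketched as follows: bin each index into quintiles, then compute Spearman's rank correlation between the quintile labels. The data here are synthetic and the column names `self_fi` and `lab_fi` are hypothetical, so the printed value will not match the slide. Spearman's correlation is computed as Pearson's correlation on average ranks, which handles the heavy ties that quintile labels produce.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for two frailty indices that share some signal.
shared = rng.normal(size=n)
fi = pd.DataFrame({
    "self_fi": shared + rng.normal(size=n),
    "lab_fi": shared + rng.normal(size=n),
})

# Quintile labels 1..5 (Q5 = highest fifth of each index).
quintiles = fi.apply(lambda col: pd.qcut(col, 5, labels=False) + 1)

# Spearman's rho = Pearson correlation of the (tie-averaged) ranks.
rho = quintiles["self_fi"].rank().corr(quintiles["lab_fi"].rank())
print(f"Spearman rho of quintiles: {rho:.3f}")
```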

8 of 24

Frailty Index (FI): fraction of health deficits

  • Self-Report FI (35 items) – Baseline questionnaire (once) – Session 2.1
    • Disease (15 items)
    • Activity (9 items)
    • Satisfaction (6 items)
    • Medication (3 items)
    • Digestion (2 items)
  • Lab FI (34 items) – Longitudinal, every 6 months – Session 2.2
    • Blood test items (29 items)
    • Blood pressure items (5 items)
  • Combined FI (69 items) – Session 1 (baseline labs)
    • The combination of the above two
  • Comparison (Spearman’s rank correlation of quintiles)
    • Self × Lab: 0.367
    • Self × Combined: 0.730
    • Lab × Combined: 0.843

To the Notebook!

9 of 24

PCA of Baseline ‘Omics Data

Panels: Proteomics (274), Metabolomics (703), Clinical Tests (47)

Color and shape: Frailty Index quintile (e.g. Q5 is the most frail 20%)
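
A PCA plot of this kind can be sketched with a plain SVD: z-score each feature, project onto the leading components, and label each participant by frailty-index quintile for coloring. The data below are random stand-ins; reading 274 as the number of proteomic features is an assumption, as are the function and variable names.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each feature
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 274))        # synthetic block, 274 features
fi = rng.uniform(0.0, 0.5, size=200)   # synthetic frailty index

# Quintile label 1..5 per participant (Q5 = most frail 20%).
quintile = np.digitize(fi, np.quantile(fi, [0.2, 0.4, 0.6, 0.8])) + 1

scores, explained = pca_scores(X)
print(scores.shape, explained)
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `quintile`, gives the layout described above.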

10 of 24

sPLS-DA of Baseline ‘Omics Data

To the Notebook!

11 of 24

sPLS-DA of Baseline ‘Omics Data

Panels: Proteomics (274), Metabolomics (703), Clinical Tests (47)

12 of 24

Block sPLS-DA

To the Notebook!

13 of 24

Block sPLS-DA

14 of 24

Visualizing DIABLO model components

15 of 24

DIABLO Model: Component 1

Proteins          Lab Tests         Metabolites
FABP4   0.930     INSULIN  0.689    hydroxyasparagine            0.522
LEP     0.347     HOMA-IR  0.533    N-stearoyl-sphinganine       0.438
                  LPIR     0.385    cortolone glucuronide        0.366
                                    N-stearoyl-sphingosine       0.311
                                    5-methylthioadenosine        0.275
                                    1-carboxyethylphenylalanine  0.253

16 of 24

DIABLO Model: Component 2

Proteins            Lab Tests              Metabolites
AGRP       -0.763   BUN/CREAT     0.561    DHEA-S                    -0.557
NT-proBNP   0.413   POTASSIUM    -0.318    androstenediol(3b,17b)S2  -0.452
NPPB        0.354   ALBUMIN      -0.453    pregnenediol-S            -0.292
GDF15       0.242   GFR, MDRD    -0.246    pregnenetriol-S2          -0.284
                    PROTEIN      -0.232    pregnenediol-S2           -0.259
                    CREATININE   -0.228    androstenediol(3a,17a)S   -0.253
                    VIT D, 25-OH  0.227
                    EPA           0.124

17 of 24

Session 2.2: Longitudinal Data Analysis

18 of 24

Session 2.2: Longitudinal Data Analysis

  • Measurements repeated over time are often correlated, not independent
  • Appropriate analytical methods depend on…
    • Number of repetitions (as few as 2, as many as thousands or millions)
    • Data type (categorical, counts, money, continuous) and missingness
    • Synchronous (e.g. stock prices, EEG) vs asynchronous (e.g. blood tests) data collection
    • Handling of collection failures (e.g. re-collection of one failed tube out of 5 in a blood draw)
    • Systems biology (‘omics) data can be expected to have a variety of data collection issues!
  • Potential goals
    • Predict future values
    • Identify unknown correlations among measured analytes (systems analysis)
    • Determine differences between groups (e.g. treated vs untreated patients; same treatment of different patient subgroups)
    • Identify candidate causal relationships (causes must precede effects in time)
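
The first bullet, that repeated measurements are correlated rather than independent, can be illustrated with a small simulation. In this sketch each subject has a stable personal level plus visit-to-visit noise; the equal variances (which imply a within-subject correlation near 0.5) and the sample size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, n_visits = 2000, 2

# Each subject has a stable personal level shared by all of their visits...
subject_effect = rng.normal(0.0, 1.0, size=(n_subjects, 1))
# ...plus independent visit-to-visit variation.
noise = rng.normal(0.0, 1.0, size=(n_subjects, n_visits))
y = subject_effect + noise

# Correlation between visit 1 and visit 2 of the same subject.
within = np.corrcoef(y[:, 0], y[:, 1])[0, 1]

# Shuffling visit 2 across subjects breaks the pairing.
between = np.corrcoef(y[:, 0], rng.permutation(y[:, 1]))[0, 1]

print(f"same subject: {within:.2f}, shuffled subjects: {between:.2f}")
```

Treating the paired visits as independent observations would ignore the same-subject correlation, which is what the grouping term in a mixed-effects model accounts for.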

19 of 24

Generalized Linear Mixed-Effects Models (GLMMs)

  • Generalized Linear Model (GLM): g(Y) ~ A + B + … + 𝝐
    • Response Y may be binary, categorical, or continuous
    • Link function g(x) maps the expected value of Y to a continuous value
      • Must be invertible: Y = g⁻¹(g(Y))
      • g(x) = x (identity), = 1/x (inverse), = log(x) (log), = log(x / (1 − x)) (logit), = Φ⁻¹(x) (probit, where Φ(z) = Pr{ N(0,1) < z })
    • Fixed effects ( A + B + … )
      • Explicit parameters
    • Error 𝝐 distribution may be any distribution in the Exponential family
      • Gaussian, Poisson, Binomial (including Bernoulli), Negative Binomial (with fixed dispersion), Exponential, Gamma
  • GLMM: g(Y) ~ A + B + … + (C + D + … | Grouping) + 𝝐
    • Random effects ( C + D + … | Grouping )
      • Shared within a group, independent across groups
      • Implicit parameters (no point estimates; average over a distribution)
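
As a quick check of the invertibility requirement, the sketch below pairs each common link function with its inverse, using the standard definitions of logit and probit, and confirms the round trip on one example mean. The `LINKS` dictionary is an illustrative structure, not part of any modelling library.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, N(0, 1)

# name -> (link g, inverse link g_inv)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "log": (math.log, math.exp),
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}

mu = 0.25  # example mean; valid for every link listed here
for name, (g, g_inv) in LINKS.items():
    eta = g(mu)
    print(f"{name:8s} g(mu) = {eta:+.4f}   g_inv(g(mu)) = {g_inv(eta):.4f}")
```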

20 of 24

Frailty Index increases with age

FI ~ t1 + t2 + t3 + Age + PC1 + PC2 + PC3 + PC4 + (1 + t1 | client)
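
Written out for observation j of client i, the formula above can be read as the model below. This assumes an identity link with Gaussian errors (the slide does not state the family) and the usual lme4-style reading of (1 + t1 | client) as a correlated random intercept and random t1 slope per client. The symbols β, b, Σ, and σ are notation introduced here.

```latex
\begin{aligned}
\mathrm{FI}_{ij} &= \beta_0 + \beta_1 t_{1,ij} + \beta_2 t_{2,ij} + \beta_3 t_{3,ij}
  + \beta_4 \mathrm{Age}_{ij} + \sum_{k=1}^{4} \beta_{4+k}\, \mathrm{PC}_{k,ij} \\
  &\quad + b_{0i} + b_{1i}\, t_{1,ij} + \epsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\epsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

The β terms are the fixed effects, estimated explicitly; the b terms are the per-client random effects, summarized only through their covariance Σ.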

[Annotations under the formula: "spline", "AIC". Figure sample sizes: 674 F, 425 M]

21 of 24

Frailty Index can be lowered by lifestyle changes

[Figure panels: t1, t2, t3; sample sizes: 674 F, 425 M]

FI ~ t1 + t2 + t3 + Age + PC1 + PC2 + PC3 + PC4 + (1 + t1 | client)

[Annotations under the formula: "spline", "AIC"]

22 of 24

Polygenic Risk Score for Longevity (dl90_cs_PRS)

Longevity PRS and Sex

23 of 24

Frailty, Age, and Longevity

Frailty and Longevity PRS

Frailty by Sex and Age

[Figure panels: F, M; counts shown: 472, 343, 202, 109]

24 of 24

BREAK