1 of 28

PhenoAge & PCAge

Saket Choudhary

saketc@iitb.ac.in

Computational Multi-omics of Ageing

DH 603

Lecture 07 || Wednesday, 2nd April 2025

2 of 28

PhenoAge

3 of 28

PhenoAge: Predictor for ageing and lifespan

4 of 28

From last class: GrimAge uses DNA methylation-based surrogates of plasma proteins and smoking pack-years

5 of 28

PhenoAge Steps

  • Step 1: Use clinical data from the National Health and Nutrition Examination Survey (NHANES) to estimate “phenotypic age”
  • Step 2: Fit an elastic-net regression model to predict PhenoAge from CpG markers
  • Step 3: Understand the biology of the 513 CpGs
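
Step 2 can be illustrated with a minimal sketch of elastic-net regression fitted by coordinate descent. The data below are synthetic (20 hypothetical "CpGs", two of which carry signal), and the penalty settings `alpha` and `l1_ratio` are arbitrary choices for illustration, not the values used for PhenoAge.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes the columns of X and the vector y are already centered."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio))
    return w

# Synthetic stand-in for a methylation matrix: 200 samples x 20 "CpGs"
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
X -= X.mean(axis=0)
true_w = np.zeros(20)
true_w[[3, 7]] = [3.0, -2.0]            # only two CpGs carry signal
y = X @ true_w + rng.normal(scale=0.1, size=200)
y -= y.mean()

w_hat = elastic_net(X, y)
top_two = set(np.argsort(np.abs(w_hat))[-2:].tolist())
print("Largest coefficients at CpG indices:", sorted(top_two))
```

The L1 part of the penalty pushes the coefficients of uninformative CpGs to exactly zero, which is how a clock ends up with a sparse set of markers.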

6 of 28

What is NHANES?

https://www.cdc.gov/nchs/nhanes/index.html

7 of 28

What is NHANES

  • Nationally representative sample with biomarker data and over 23 years of mortality follow-up
  • Fit a regression model to select the variables associated with the phenotypic score (regress the hazard of mortality on 42 clinical biomarkers)

8 of 28

PhenoAge can predict mortality

9 of 28

PhenoAge can also predict chronological age

10 of 28

PCAge

11 of 28

PCAge

12 of 28

Intraclass correlation coefficient as a measure of reproducibility
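
As a rough illustration of the ICC, the sketch below computes the one-way random-effects form, ICC(1,1), for replicate measurements. Which ICC variant the slide's figure uses is an assumption here, and the beta values are made up.

```python
import numpy as np

def icc_oneway(measurements):
    """ICC(1,1) for an array of shape (n_samples, k_replicates)."""
    data = np.asarray(measurements, dtype=float)
    n, k = data.shape
    sample_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Mean square between samples and mean square within samples
    ms_between = k * ((sample_means - grand_mean) ** 2).sum() / (n - 1)
    ms_within = ((data - sample_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical methylation beta values: 3 samples, 2 technical replicates each
reliable_cpg = [[0.10, 0.11], [0.50, 0.49], [0.90, 0.91]]  # replicates agree
noisy_cpg = [[0.50, 0.40], [0.45, 0.55], [0.52, 0.47]]     # noise exceeds signal

print("reliable CpG ICC:", icc_oneway(reliable_cpg))
print("noisy CpG ICC:", icc_oneway(noisy_cpg))
```

An ICC near 1 means most of the variance comes from real differences between samples; an ICC near or below 0 means replicate noise dominates.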

13 of 28

CpG signal shows low reproducibility

14 of 28

Default clocks lack reproducibility

15 of 28

PCAge applies PCA before running elastic-net regression

16 of 28

Epigenetic clocks trained from principal components are highly reliable

17 of 28

1D Principal Component Analysis (PCA)

Goal: Reduce the dimensionality of the data by projecting it onto a lower-dimensional space such that the reconstruction error is minimized

18 of 28

Minimizing reconstruction error = Maximizing variance

[Figure labels: original point, projected point, reconstruction error, variance]

How can we determine the direction of maximal variance?
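
One way to see the equivalence numerically is to scan candidate directions and compare the two criteria. The sketch below uses a synthetic 2-D cloud and a one-degree grid of angles (both arbitrary choices); for every direction, projected variance plus mean squared reconstruction error equals the total variance.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic correlated 2-D cloud, centered at the origin
points = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
points -= points.mean(axis=0)
total_variance = (points ** 2).sum(axis=1).mean()

def split_variance(angle):
    """Projected variance and mean squared reconstruction error for one direction."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    coords = points @ direction                  # 1-D coordinates along the line
    reconstructed = np.outer(coords, direction)  # projected points back in 2-D
    variance = (coords ** 2).mean()
    error = ((points - reconstructed) ** 2).sum(axis=1).mean()
    return variance, error

angles = np.linspace(0.0, np.pi, 181)            # one-degree steps
results = np.array([split_variance(a) for a in angles])
best_by_variance = angles[results[:, 0].argmax()]
best_by_error = angles[results[:, 1].argmin()]

print("variance + error is constant:", np.allclose(results.sum(axis=1), total_variance))
print("best angles (radians):", best_by_variance, best_by_error)
```

Because the sum is fixed, the direction that maximizes variance is the same one that minimizes reconstruction error.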

19 of 28

What does covariance matrix tell you about the data?

Data aligned with the axes: covariance is diagonal

Data oblique with respect to the axes: covariance is not diagonal

Gaussian cloud
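
A small numerical check of this, using a synthetic Gaussian cloud and an arbitrary 30-degree rotation: the off-diagonal covariance is near zero when the cloud is aligned with the axes and clearly non-zero once it is rotated.

```python
import numpy as np

rng = np.random.default_rng(2)
# Axis-aligned Gaussian cloud: independent x and y with different spreads
aligned = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])

# The same cloud rotated by 30 degrees, so it lies oblique to the axes
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
oblique = aligned @ rotation.T

cov_aligned = np.cov(aligned, rowvar=False)
cov_oblique = np.cov(oblique, rowvar=False)

print("aligned:\n", cov_aligned.round(2))
print("oblique:\n", cov_oblique.round(2))
```

The total variance (the trace) is unchanged by the rotation; only how it is split between diagonal and off-diagonal entries changes.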

20 of 28

Covariance matrix captures the general extent of data

Different distributions with same covariance matrix

21 of 28

PCA rotates the axes to diagonalize the covariance matrix
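
The statement can be checked on synthetic data: expressing the points in the basis of the covariance matrix's eigenvectors yields a diagonal covariance. The mixing matrix below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(1000, 2)) @ np.array([[2.0, 0.0], [1.5, 0.6]])
data -= data.mean(axis=0)

covariance = np.cov(data, rowvar=False)
# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

rotated = data @ eigenvectors            # coordinates in the rotated axes
cov_rotated = np.cov(rotated, rowvar=False)

print("original covariance:\n", covariance.round(3))
print("covariance after rotation:\n", cov_rotated.round(3))
```

The diagonal entries after rotation are the eigenvalues, that is, the variances along the principal axes.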

22 of 28

Singular value decomposition

  • Input: A matrix
  • Output: a set of numbers called singular values and two collections of vectors: a set of right singular vectors and another set of left singular vectors.
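
In NumPy, the input and outputs look like the sketch below (the matrix entries are arbitrary). Note that `np.linalg.svd` returns the right singular vectors as the rows of `Vt`, i.e. it returns V^T rather than V.

```python
import numpy as np

M = np.array([[3.0, 1.0, 2.0],
              [1.0, 4.0, 0.0]])              # input: any real matrix (here 2 x 3)

# Outputs: left singular vectors (columns of U), singular values,
# and right singular vectors (rows of Vt)
U, singular_values, Vt = np.linalg.svd(M)

# Rebuild M from the three pieces to confirm M = U Sigma V^T
Sigma = np.zeros(M.shape)
Sigma[:len(singular_values), :len(singular_values)] = np.diag(singular_values)
rebuilt = U @ Sigma @ Vt

print("shapes:", U.shape, singular_values.shape, Vt.shape)
print("singular values:", singular_values)
```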

23 of 28

Geometry of Singular value decomposition

  • A given matrix M transforms the unit circle (the set of all unit vectors) into an ellipse
  • With M = U Σ V^T, this can be imagined as
    • 1. Performing a rotation of the unit vectors by V^T
    • 2. Scaling these vectors by scaling factors (singular values of the matrix M)
    • 3. Performing another rotation by U
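
The three steps can be traced numerically. This sketch uses an arbitrary 2 x 2 matrix and applies the factors of its SVD one at a time to points on the unit circle; the result matches applying M directly. (The orthogonal factors returned by the SVD may include a reflection as well as a rotation.)

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [0.5, 1.5]])
U, s, Vt = np.linalg.svd(M)

angles = np.linspace(0.0, 2 * np.pi, 100)
circle = np.vstack([np.cos(angles), np.sin(angles)])  # unit vectors as columns

step1 = Vt @ circle          # 1. rotate (or reflect): lengths stay 1
step2 = np.diag(s) @ step1   # 2. stretch each axis by a singular value
step3 = U @ step2            # 3. rotate (or reflect) again

ellipse = M @ circle         # applying M in one go

print("three steps reproduce M:", np.allclose(step3, ellipse))
```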

24 of 28

Singular value decomposition

  • The matrix M is rectangular.
  • There are three non-zero singular values.
  • The rank of M is 3.
  • U U^T = I and V V^T = I, where I is the identity matrix
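
These properties can be reproduced on a hypothetical example. The 4 x 5 matrix below is not the one from the slide; it is built as a product of 4 x 3 and 3 x 5 factors so that its rank is 3, and the tolerance for "non-zero" is an arbitrary numerical cutoff.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical rectangular matrix with rank 3 by construction
M = rng.normal(size=(4, 3)) @ rng.normal(size=(3, 5))

U, s, Vt = np.linalg.svd(M)      # full SVD: U is 4 x 4, Vt is 5 x 5
V = Vt.T

# Count singular values that are non-zero up to floating-point noise
n_nonzero = int((s > 1e-10 * s.max()).sum())

print("singular values:", s)
print("non-zero singular values:", n_nonzero)
print("U U^T = I:", np.allclose(U @ U.T, np.eye(4)))
print("V V^T = I:", np.allclose(V @ V.T, np.eye(5)))
```

The number of non-zero singular values equals the rank of the matrix.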

25 of 28

What are singular values?

  • An m × n matrix M can be thought of as mapping a vector x from R^n to R^m.
  • A unit sphere in R^n is mapped to an ellipsoid in R^m.
  • The non-zero singular values of M are the lengths of the semi-axes of the ellipsoid.
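
A quick numerical check, using an arbitrary 3 x 2 matrix that maps R^2 into R^3: the longest and shortest images of unit vectors have lengths equal to the largest and smallest singular values.

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])       # maps R^2 into R^3
singular_values = np.linalg.svd(M, compute_uv=False)

# Unit vectors covering every direction in R^2 (up to sign)
angles = np.linspace(0.0, np.pi, 2001)
unit_vectors = np.vstack([np.cos(angles), np.sin(angles)])
image_lengths = np.linalg.norm(M @ unit_vectors, axis=0)

longest = image_lengths.max()    # semi-major axis of the ellipse
shortest = image_lengths.min()   # semi-minor axis of the ellipse

print("singular values:", singular_values)
print("longest and shortest image:", longest, shortest)
```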

26 of 28

Principal Component Analysis - The recipe

  • Start with a data matrix M.
  • Center M by subtracting the column means (each column is a feature)
  • Perform SVD of the centered M → M = U Σ V^T
    • U and V have orthonormal columns
    • Σ is a diagonal matrix of singular values
    • The columns of V are the eigenvectors of M^T M, which is proportional to the covariance matrix, so V diagonalizes it.
  • Truncate V to its first k columns to obtain V_k (and likewise U_k and Σ_k)
    • M_k = U_k Σ_k V_k^T is the best rank-k approximation of M in the least-squares sense.
  • “Project” the original matrix M onto V_k: M V_k
    • This projection has two properties:
      • It maximises the variance of the projected points
      • It results in the minimum reconstruction error if the original matrix is to be reconstructed
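
The recipe can be sketched in a few lines of NumPy. The data are synthetic (six features driven by two hidden factors plus a little noise), and the function name `pca_svd` is a hypothetical helper for this illustration.

```python
import numpy as np

def pca_svd(data, k):
    """PCA via SVD. Returns the scores M V_k, the components V_k, the rank-k
    approximation M_k, the centered data, and all singular values."""
    centered = data - data.mean(axis=0)                    # each column is a feature
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    V_k = Vt[:k].T                                         # first k right singular vectors
    scores = centered @ V_k                                # projection M V_k
    approximation = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # M_k = U_k Sigma_k V_k^T
    return scores, V_k, approximation, centered, s

# Synthetic data: 100 samples, 6 features driven by 2 hidden factors
rng = np.random.default_rng(5)
latent = rng.normal(size=(100, 2))
data = latent @ rng.normal(size=(2, 6)) + 0.05 * rng.normal(size=(100, 6))

scores, V_k, approximation, centered, s = pca_svd(data, k=2)

explained = (s[:2] ** 2).sum() / (s ** 2).sum()
reconstruction_error = ((centered - approximation) ** 2).sum()

print("fraction of variance kept by 2 PCs:", explained)
print("reconstruction error:", reconstruction_error)
print("sum of discarded squared singular values:", (s[2:] ** 2).sum())
```

The reconstruction error equals the sum of the squared singular values that were discarded, which is why keeping the largest ones gives the smallest error.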

27 of 28

PCA: the optimization

[Figure labels: PC1, PC2]

28 of 28

Questions?