1 of 45

Hae Kyung Im, PhD

Predicting Disease Risk

Polygenic Risk Scores

April 18, 2022

2 of 45

Polygenic Additive Model

5 of 45

Estimating Parameters of Polygenic Additive Model

Can we fit all SNPs at the same time?

Why can’t we estimate betas by least squares?

Too many parameters and too few observations
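
A minimal sketch of why least squares breaks down here, assuming toy sizes (50 individuals, 200 SNPs) and simulated genotypes that are not from the slides. With more betas than observations the design matrix is rank deficient, so there is no unique least squares solution, and the fit can match even a pure-noise phenotype.

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_snps = 50, 200  # toy sizes (assumed): far more SNPs than people

# Toy genotype matrix: allele counts 0/1/2 per individual and SNP
X = rng.binomial(2, 0.3, size=(n_individuals, n_snps)).astype(float)
y = rng.normal(size=n_individuals)  # phenotype that is pure noise

# rank(X) cannot exceed n_individuals, so X'X (n_snps x n_snps) is singular
rank = np.linalg.matrix_rank(X)
print(f"rank of X = {rank}, number of betas = {n_snps}")

# lstsq returns one of infinitely many solutions (the minimum-norm one)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("largest residual:", np.abs(y - X @ beta).max())
```

A near-zero residual on a noise phenotype is a sign of overfitting, not of a good model.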

6 of 45

Rule of Thumb

10 data points per parameter

In a pinch, at least 5 data points per parameter
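
As a quick illustration of the rule of thumb, the sketch below computes how many parameters a study could support. The function name and the study size of 10,000 are hypothetical.

```python
def max_parameters(n_observations: int, points_per_parameter: int = 10) -> int:
    """Largest number of parameters the rule of thumb allows."""
    return n_observations // points_per_parameter

n = 10_000  # hypothetical study size
print(max_parameters(n))     # 10 data points per parameter -> 1000
print(max_parameters(n, 5))  # "in a pinch" version -> 2000
```

Even the relaxed version allows far fewer parameters than the number of SNPs on a typical genotyping array.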

7 of 45

Solution: Random Effects Reduce the Number of Parameters

Can we fit all SNPs at the same time?

Why can’t we estimate betas by least squares?

Too many parameters and too few observations

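Solution

Assume the βs are random effects drawn from a common distribution, and estimate just the parameters of that distribution (for example, its variance) instead of each β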

8 of 45

Mixed Effects Modeling

The βs are random

9 of 45

Connection to EMMAX Used To Account for Population Structure?

Recall EMMAX, a mixed effects approach to adjust for population structure and relatedness

    • $Y = x_{\text{test}} \cdot \beta_{\text{test}} + u + \varepsilon$
    • $u \sim N(0, \sigma^2 \cdot K)$
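
A rough sketch of what the random term u looks like in practice, assuming simulated genotypes and one common choice of relatedness matrix, K = ZZ'/m built from standardized genotypes. This choice of K, the sizes, and the value of σ² are assumptions for illustration and not necessarily what EMMAX computes.

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_snps = 100, 500  # toy sizes (assumed)
G = rng.binomial(2, 0.3, size=(n_individuals, n_snps)).astype(float)

# Standardize each SNP; drop monomorphic SNPs to avoid dividing by zero
sd = G.std(axis=0)
keep = sd > 0
Z = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
m = Z.shape[1]

# Relatedness matrix (assumed form): average product of standardized genotypes
K = Z @ Z.T / m

# Draw u ~ N(0, sigma2 * K): if b ~ N(0, I), then sqrt(sigma2 / m) * Z b
# has covariance sigma2 * Z Z' / m = sigma2 * K
sigma2 = 0.5  # assumed variance component
u = np.sqrt(sigma2 / m) * (Z @ rng.standard_normal(m))
```

Writing u as Z times many small random SNP effects is the link to the polygenic model: a random effect with covariance proportional to K is equivalent to giving every SNP a small random β.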

10 of 45

Review: norm of vectors

11 of 45

L2 Norm of Vector
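
For reference, a small sketch of the L2 (Euclidean) norm, shown next to the L1 norm for comparison. The example vector is arbitrary.

```python
import math

def l2_norm(v):
    """L2 norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for x in v))

def l1_norm(v):
    """L1 norm: sum of the absolute values of the entries."""
    return sum(abs(x) for x in v)

v = [3.0, -4.0]
print(l2_norm(v))  # 5.0
print(l1_norm(v))  # 7.0
```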

12 of 45

In a GWAS we minimize the L2 norm of the error term
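
Written out, this is the usual least squares criterion. The notation below is an assumption for illustration: $Y_i$ is the phenotype of individual $i$, $X_i$ the genotype at the tested SNP, and $n$ the number of individuals. Minimizing the L2 norm of the error is the same as minimizing its square.

```latex
\hat{\beta}
  = \arg\min_{\beta} \, \lVert Y - X\beta \rVert_2^2
  = \arg\min_{\beta} \, \sum_{i=1}^{n} \left( Y_i - X_i \beta \right)^2
```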

14 of 45

Which one do you think is the L1 norm?

15 of 45

Prediction of Complex Traits

16 of 45

Simple Polygenic Risk Score

Nature 2009

Just use GWAS effect sizes
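
A minimal sketch of the simple score: for each individual, add up allele counts weighted by the GWAS effect sizes. The genotypes and effect sizes below are made up for illustration.

```python
import numpy as np

# Toy inputs (all values are made up for illustration)
genotypes = np.array([[0, 1, 2],
                      [1, 1, 0],
                      [2, 0, 1]], dtype=float)  # rows: individuals, columns: SNPs
gwas_betas = np.array([0.10, -0.05, 0.20])      # per-allele GWAS effect sizes

# Score per individual: sum over SNPs of (effect size x allele count)
prs = genotypes @ gwas_betas
print(prs)
```

In practice the effect allele in the GWAS has to match the allele counted in the genotype matrix, which this sketch takes for granted.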

17 of 45

Tricks to Deal with Too Many Parameters

  1. Model βs as random effects
  2. Use GWAS effects (fit one SNP at a time, ignoring the rest)
  3. Penalized likelihood is another way to deal with too many parameters
    • also referred to as regularization
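
Trick 2 can be sketched as a loop-free single-SNP regression: each SNP gets its own slope, ignoring all the others. The simulated genotypes, the effect size of 0.5, and the function name are assumptions for illustration.

```python
import numpy as np

def marginal_effects(X, y):
    """Slope of y ~ intercept + SNP, fitted separately for each SNP column."""
    Xc = X - X.mean(axis=0)  # centering absorbs the intercept
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

rng = np.random.default_rng(2)
X = rng.binomial(2, 0.3, size=(500, 20)).astype(float)  # toy genotypes
true_beta = np.zeros(20)
true_beta[0] = 0.5                                       # one causal SNP (assumed)
y = X @ true_beta + rng.normal(size=500)

beta_hat = marginal_effects(X, y)
print(beta_hat.round(2))
```

Each fit has only two parameters (intercept and slope), so the too-many-parameters problem never arises, at the cost of ignoring correlation between SNPs.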

18 of 45

Best Linear Unbiased Prediction (BLUP)/Ridge

Penalized regression

Ridge
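
A sketch of ridge regression under assumed toy data, using the closed-form solution of the penalized least squares problem. The penalty weight `lam` is a tuning parameter; in the BLUP view it plays the role of the ratio of error variance to per-SNP effect variance.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 via the closed-form solution."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(3)
X = rng.binomial(2, 0.3, size=(50, 200)).astype(float)  # more SNPs than people
X -= X.mean(axis=0)
y = rng.normal(size=50)
y -= y.mean()

beta_small = ridge_fit(X, y, lam=1.0)    # light shrinkage
beta_large = ridge_fit(X, y, lam=100.0)  # heavy shrinkage
print(np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

Adding `lam` to the diagonal makes the matrix invertible even with more SNPs than individuals, and a larger `lam` shrinks the effect sizes toward zero.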

19 of 45

LASSO/Elastic Net Prediction

Penalized regression

LASSO

Elastic Net
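
The three penalized regressions differ only in the penalty added to the least squares criterion. The parametrization below, with penalty weight $\lambda \ge 0$ and mixing weight $\alpha \in [0, 1]$, is one common convention and is an assumption here; software packages scale these terms differently.

```latex
\begin{aligned}
\text{Ridge:} \quad
  & \min_{\beta} \; \lVert Y - X\beta \rVert_2^2 + \lambda \sum_{j} \beta_j^2 \\
\text{LASSO:} \quad
  & \min_{\beta} \; \lVert Y - X\beta \rVert_2^2 + \lambda \sum_{j} \lvert \beta_j \rvert \\
\text{Elastic Net:} \quad
  & \min_{\beta} \; \lVert Y - X\beta \rVert_2^2
    + \lambda \Big( \alpha \sum_{j} \lvert \beta_j \rvert
    + (1 - \alpha) \sum_{j} \beta_j^2 \Big)
\end{aligned}
```

Ridge uses the squared L2 norm of β, LASSO uses the L1 norm, and the elastic net mixes the two.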

20 of 45

Whole Genome Prediction Approaches

21 of 45

Advantages of Simple Polygenic Scores

Main advantage: easy to obtain or calculate, and scalable

GWAS results are publicly available

In contrast, multivariate approaches (ridge, elastic net, BSLMM) need individual-level data, although some fine-mapping methods allow inferring multivariate regression results from summary statistics

22 of 45

Polygenic Scores Can Be Improved Using LD Information

  • Pruning and thresholding (PRSice)
  • lassosum (Mak et al.)
  • LDpred (Vilhjálmsson et al.)
  • RSS (Zhu & Stephens)
  • SBayesR (Lloyd-Jones et al.)
  • PRS-CS (Ge et al.)
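
To make the first approach in the list concrete, here is a greedy sketch of pruning and thresholding: keep SNPs that pass a p-value threshold, most significant first, and skip any SNP in high LD with one already kept. This is not PRSice's actual implementation; the function name, thresholds, p-values, and LD matrix are hypothetical.

```python
def prune_and_threshold(p_values, ld_r2, p_threshold=5e-8, r2_threshold=0.1):
    """Return indices of SNPs kept after thresholding on p and pruning on LD."""
    candidates = [j for j, p in enumerate(p_values) if p <= p_threshold]
    candidates.sort(key=lambda j: p_values[j])  # most significant first
    kept = []
    for j in candidates:
        # Keep SNP j only if it is in low LD with every SNP kept so far
        if all(ld_r2[j][k] < r2_threshold for k in kept):
            kept.append(j)
    return sorted(kept)

# Toy example: SNPs 0 and 1 are in strong LD; SNP 3 is not significant
p_values = [1e-10, 1e-9, 1e-12, 0.2]
ld_r2 = [[1.0, 0.9, 0.0, 0.0],
         [0.9, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
print(prune_and_threshold(p_values, ld_r2))  # [0, 2]
```

The score is then built from the kept SNPs only, using their GWAS effect sizes. The LD matrix would come from reference data.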

Zhu, X., & Stephens, M. (2017). Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Annals of Applied Statistics.

Vilhjálmsson et al. (2015). Modeling linkage disequilibrium increases accuracy of polygenic risk scores. American Journal of Human Genetics.

Mak et al. (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology, 41(6), 469–480.

Lloyd-Jones et al. (2019). Improved polygenic prediction by Bayesian multiple regression on summary statistics. bioRxiv.

Ge, T., Chen, C.-Y., Ni, Y., et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications, 10, 1776. https://doi.org/10.1038/s41467-019-09718-5

23 of 45

Importance of Having Good LD Reference Data

All the methods listed on the previous slide rely on having good LD reference data.

As sample sizes increase, methods that use summary statistics to infer results similar to those from individual-level data become critical.

- Summary statistics from GWAS are being widely shared.

- LD reference data from the same studies are not; this needs to change.

24 of 45

What about Deep Learning?

"In all, over the range of traits evaluated in this study, CNN performance was competitive to linear models, but we did not find any case where DL outperformed the linear model by a sizable margin."

25 of 45

DNA Sequencing to Gene Expression

26 of 45

Clinical Utility of Genetic Predictions

27 of 45

Genomic Prediction of Height in UK Biobank

Lello et al (2018). Accurate Genomic Prediction of Human Height. Genetics

28 of 45

Prevalence of Coronary Artery Disease Increases with PRS

Khera et al (2018) Nature Genetics

29 of 45

Prevalence of Type 2 Diabetes Increases with PRS

Mahajan et al (2019) Nature Genetics

30 of 45

Breast Cancer: PRS is Predictive of Risk

Mavaddat N et al Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. Am J Hum Genet. 2019 Jan 3;104(1):21-34. doi: 10.1016/j.ajhg.2018.11.002. Epub 2018 Dec 13. PMID: 30554720; PMCID: PMC6323553.

31 of 45

PRS + Family History Improves Risk Prediction

Kachuri, L., Graff, R.E., Smith-Byrne, K. et al. Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction. Nat Commun 11, 6084 (2020). https://doi.org/10.1038/s41467-020-19600-4

Fig. 3: Predicted 5-year absolute risk trajectories across strata defined by family history and percentiles of the polygenic risk score (PRS) distribution.

32 of 45

33 of 45

Do PRS work for everyone?

34 of 45

Portability of Prediction Across Ancestries

Martin et al 2019 NG https://www.nature.com/articles/s41588-019-0379-x

35 of 45

Ancestry Composition of Current GWAS

Martin et al 2019 NG https://www.nature.com/articles/s41588-019-0379-x

36 of 45

Allele Frequency and LD Differ Across Ancestries

Martin et al 2019 NG https://www.nature.com/articles/s41588-019-0379-x

37 of 45

PRS Does Not Transfer Well Across Populations

Martin et al 2019 NG https://www.nature.com/articles/s41588-019-0379-x

38 of 45

Investments in more diverse samples are being made

39 of 45

Bioethical Issues with Embryo Screening

40 of 45

41 of 45

Embryo Selection Active Area of Research

43 of 45

Orchid Offers Preconception Testing

https://twitter.com/alexia/status/1380236427485667335

Are they helping eliminate disease or implementing eugenics?

You can check out Lior Pachter's opinionated take on this subject

https://liorpachter.wordpress.com/2021/04/12/the-amoral-nonsense-of-orchids-embryo-selection/

44 of 45

Effect on Schizophrenia Risk Selection

45 of 45

Fig. 1: Hazard ratios (HR) per one standard deviation (SD) increase in the standardized polygenic risk score (PRS).