Hae Kyung Im, PhD
Predicting Disease Risk
Polygenic Risk Scores
April 18, 2022
Polygenic Additive Model
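As a reference point, the polygenic additive model is commonly written as follows. The notation is our assumption: $Y_i$ is the phenotype of individual $i$, $X_{ik}$ the allele count at SNP $k$, $\beta_k$ that SNP's effect size, $M$ the number of SNPs, and $N$ the number of individuals.

```latex
Y_i = \sum_{k=1}^{M} X_{ik}\,\beta_k + \epsilon_i, \qquad i = 1, \dots, N
\qquad\Longleftrightarrow\qquad
Y = X\beta + \epsilon
```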
Estimating Parameters of Polygenic Additive Model
Can we fit all SNPs at the same time?
Why can’t we estimate betas by least squares?
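Too many parameters and too few observations: with more SNPs than individuals, least squares has infinitely many exact solutions, so the betas cannot be identified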
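A small simulation makes the problem concrete. The sketch below uses simulated genotypes and a pure-noise phenotype (both assumptions for illustration, and the function name is hypothetical). With more SNPs than individuals the genotype matrix is rank deficient, and least squares reproduces the noise exactly: it overfits rather than estimates.

```python
import numpy as np

def least_squares_with_more_snps_than_people(n=50, m=200, seed=0):
    """Fit ordinary least squares to all m SNPs at once when m > n and report what goes wrong."""
    rng = np.random.default_rng(seed)
    # Simulated 0/1/2 allele counts (allele frequency 0.3 is an arbitrary choice)
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    y = rng.standard_normal(n)            # pure noise: no SNP has a real effect
    # rank(X^T X) = rank(X) <= n < m, so X^T X cannot be inverted
    rank = np.linalg.matrix_rank(X)
    # lstsq returns the minimum-norm solution among infinitely many exact fits
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    return rank, rss

rank, rss = least_squares_with_more_snps_than_people()
print(rank, rss)   # rank is at most 50 and the residual sum of squares is essentially zero
```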
Rule of Thumb
10 data points per parameter
In a pinch, at least 5 data points per parameter
Solution: Random Effects Reduce the Number of Parameters
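Assume $\beta_k \sim N(0, \sigma^2_\beta)$ and estimate just the variance $\sigma^2_\beta$ (one parameter instead of one per SNP)
Solution
Mixed Effects Modeling
The SNP effects $\beta_k$ are random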
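Writing the model as $Y = X\beta + \epsilon$ (notation assumed here), treating the effects as random collapses the $M$ effect sizes into one variance component, so the covariance of $Y$ involves only two unknowns:

```latex
Y = X\beta + \epsilon, \qquad
\beta \sim N(0, \sigma^2_\beta I_M), \qquad
\epsilon \sim N(0, \sigma^2_\epsilon I_N)
\;\Longrightarrow\;
\operatorname{Var}(Y) = \sigma^2_\beta\, X X^\top + \sigma^2_\epsilon I_N
```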
Connection to EMMAX, Used to Account for Population Structure?
Recall EMMAX, the mixed effects approach used to adjust for population structure and relatedness
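EMMAX fits each SNP as a fixed effect plus a random polygenic term $u \sim N(0, \sigma^2_g K)$, where $K$ is a relatedness matrix. If genotypes are standardized into a matrix $Z$, the random-effects covariance $\sigma^2_\beta ZZ^\top$ equals $(M\sigma^2_\beta)K$ with $K = ZZ^\top/M$, which is the link between the two models. A minimal numpy sketch of that $K$; the standardization shown is one common convention and not necessarily the kinship estimator EMMAX itself uses, and the function name is hypothetical:

```python
import numpy as np

def genetic_relatedness_matrix(genotypes):
    """K = Z Z^T / M from an N x M matrix of 0/1/2 allele counts.

    Each SNP is centered by 2p and scaled by sqrt(2p(1-p)), with p its sample allele
    frequency; this is one common convention, not the only one.
    """
    G = np.asarray(genotypes, dtype=float)
    p = G.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)          # monomorphic SNPs have zero variance; drop them
    G, p = G[:, keep], p[keep]
    Z = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return Z @ Z.T / G.shape[1]           # average over the remaining M SNPs

rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(20, 500))  # 20 simulated individuals, 500 SNPs
K = genetic_relatedness_matrix(G)         # a 20 x 20 symmetric matrix
```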
Review: Norm of Vectors
L2 Norm of a Vector
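For a vector $v = (v_1, \dots, v_n)$, the L2 (Euclidean) norm is:

```latex
\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}
```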
In a GWAS we minimize the L2 norm of the error term
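Concretely, a GWAS runs one regression per SNP, and each regression minimizes the squared L2 norm of its own residual vector. A minimal sketch with a per-SNP intercept and no covariates (real pipelines also adjust for covariates such as ancestry principal components); the function name is hypothetical:

```python
import numpy as np

def marginal_gwas_betas(X, y):
    """One simple regression per SNP k: beta_k minimizes ||y - a_k - X[:, k] * beta_k||_2^2.

    X is individuals x SNPs (allele counts), y the phenotype; no covariates are included.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)       # centering handles each SNP's intercept a_k
    yc = y - y.mean()
    return (Xc.T @ yc) / (Xc ** 2).sum(axis=0)

X = [[0, 1], [1, 2], [2, 0], [1, 1]]
y = [1.0, 2.0, 3.0, 2.0]
print(marginal_gwas_betas(X, y))   # betas 1.0 and -0.5
```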
Which one do you think is the L1 norm?
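For comparison, the L1 norm sums absolute values, while the L2 norm is the Euclidean length. A short pure-Python sketch (function names are hypothetical):

```python
import math

def l1_norm(v):
    """Sum of absolute values; the LASSO penalty uses this norm."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Euclidean length; ridge regression penalizes its square."""
    return math.sqrt(sum(x * x for x in v))

for v in [(1.0, 0.0), (0.5, 0.5), (3.0, -4.0)]:
    print(v, l1_norm(v), l2_norm(v))
# (1.0, 0.0): both norms equal 1
# (0.5, 0.5): L1 = 1.0 but L2 is about 0.707
# (3.0, -4.0): L1 = 7.0, L2 = 5.0
```

Points with L1 norm 1 form a diamond, while points with L2 norm 1 form a circle; (0.5, 0.5) lies on the first but inside the second.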
Prediction of Complex Traits
Simple Polygenic Risk Score
Nature 2009
Just use GWAS effect sizes as weights: sum each person's effect-allele counts, weighted by the GWAS betas
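A minimal sketch of such a score for one individual, assuming the SNPs have already been pruned for LD and that allele counts are coded against the GWAS effect allele. The function name and the p-value cutoff `p_threshold` are hypothetical; scores are often computed over several cutoffs:

```python
def simple_prs(allele_counts, gwas_betas, gwas_pvalues, p_threshold=5e-8):
    """Sum of effect-allele counts weighted by GWAS effect sizes,
    keeping only SNPs whose GWAS p-value is below the threshold."""
    return sum(
        count * beta
        for count, beta, p in zip(allele_counts, gwas_betas, gwas_pvalues)
        if p < p_threshold
    )

counts = [2, 1, 0, 1]                 # hypothetical individual
betas = [0.10, -0.05, 0.20, 0.03]     # hypothetical GWAS effect sizes
pvals = [1e-9, 1e-3, 1e-10, 0.2]
print(simple_prs(counts, betas, pvals))                    # uses SNPs 1 and 3 only
print(simple_prs(counts, betas, pvals, p_threshold=0.05))  # also includes SNP 2
```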
Tricks to Deal with Too Many Parameters
Best Linear Unbiased Prediction (BLUP)/Ridge
Penalized regression
Ridge
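Ridge regression adds a squared L2 penalty to least squares. With random effects $\beta \sim N(0, \sigma^2_\beta I)$ and noise variance $\sigma^2_\epsilon$, the BLUP of $\beta$ is the ridge solution with $\lambda = \sigma^2_\epsilon/\sigma^2_\beta$. A numpy sketch, assuming centered genotypes and phenotype and known variance components (in practice these are estimated, for example by REML); the function names, simulated data, and variance values are hypothetical:

```python
import numpy as np

def ridge_betas(X, y, lam):
    """argmin_b ||y - X b||_2^2 + lam * ||b||_2^2 = (X^T X + lam I)^{-1} X^T y.

    Assumes X (individuals x SNPs) and y are already centered, so no intercept is fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[1]
    # lam > 0 makes X^T X + lam I invertible even when SNPs outnumber individuals
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

def blup_betas(X, y, sigma2_beta, sigma2_eps):
    """BLUP of random effects beta ~ N(0, sigma2_beta I): ridge with lam = sigma2_eps / sigma2_beta."""
    return ridge_betas(X, y, sigma2_eps / sigma2_beta)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))        # 50 individuals, 200 standardized SNPs (simulated)
X -= X.mean(axis=0)
y = X[:, :5] @ np.ones(5) + rng.standard_normal(50)
y -= y.mean()
beta_blup = blup_betas(X, y, sigma2_beta=0.025, sigma2_eps=1.0)
```

When SNPs far outnumber individuals, the equivalent form $X^\top(XX^\top + \lambda I)^{-1}y$ needs only an $N \times N$ solve instead of an $M \times M$ one.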
LASSO/Elastic Net Prediction
Penalized regression
LASSO
Elastic Net
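The LASSO replaces the ridge penalty with an L1 penalty, which sets many coefficients exactly to zero; the elastic net mixes the two. A sketch of cyclic coordinate descent, assuming a glmnet-style parameterization (penalty weight `lam`, mixing weight `alpha`) and centered data. It illustrates the update rule rather than reproducing any particular package, and the function names are hypothetical:

```python
import numpy as np

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0): the step that sets small coefficients exactly to zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha=1.0, n_iter=200):
    """Cyclic coordinate descent for
        (1 / (2n)) * ||y - X b||_2^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    alpha = 1 gives the LASSO, alpha = 0 gives ridge.
    Assumes X and y are centered and X has no all-zero columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape
    b = np.zeros(m)
    col_ms = (X ** 2).sum(axis=0) / n   # (1/n) x_j^T x_j for each SNP
    r = y.copy()                        # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(m):
            r += X[:, j] * b[j]         # partial residual: remove SNP j's current fit
            z = X[:, j] @ r / n
            b[j] = soft_threshold(z, lam * alpha) / (col_ms[j] + lam * (1.0 - alpha))
            r -= X[:, j] * b[j]         # put SNP j back with its updated coefficient
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X -= X.mean(axis=0)
true_b = np.zeros(20)
true_b[:2] = [3.0, -2.0]                # only two SNPs have nonzero effects (simulated)
y = X @ true_b + 0.5 * rng.standard_normal(100)
y -= y.mean()
b_lasso = elastic_net(X, y, lam=0.1, alpha=1.0)   # most of the 18 null SNPs end up exactly 0
```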
Whole Genome Prediction Approaches
Advantages of Simple Polygenic Scores
Main advantage: easy to obtain or calculate, and scalable
GWAS results are publicly available
vs. multivariate approaches (ridge, elastic net, BSLMM), which need individual-level data
although some fine-mapping methods allow inferring multivariate regression results from summary statistics
Polygenic Scores Can Be Improved Using LD Information
Zhu, X., & Stephens, M. (2017). Bayesian large-scale multiple regression with summary statistics from genome-wide association studies. Annals of Applied Statistics.
Vilhjálmsson, B. J., et al. (2015). Modeling linkage disequilibrium increases accuracy of polygenic risk scores. American Journal of Human Genetics.
Mak, T. S. H., et al. (2017). Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology, 41(6), 469–480.
Lloyd-Jones, L. R., et al. (2019). Improved polygenic prediction by Bayesian multiple regression on summary statistics. bioRxiv.
Ge, T., Chen, C.-Y., Ni, Y., et al. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications, 10, 1776. https://doi.org/10.1038/s41467-019-09718-5
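The common idea is that marginal GWAS effects mix a SNP's own effect with the effects of SNPs in LD with it. Under an infinitesimal prior $\beta \sim N(0, (h^2/M) I)$ with standardized genotypes, the posterior mean of the joint effects is $(R + \tfrac{M}{N h^2} I)^{-1}\tilde\beta$, where $R$ is the LD correlation matrix, $\tilde\beta$ the marginal effects, and $N$ the GWAS sample size. As we understand it, this is the approximation behind the LDpred-inf variant in Vilhjálmsson et al. (2015); the sketch below (function name hypothetical) assumes $h^2$ is known and $R$ comes from a reference panel matching the GWAS ancestry:

```python
import numpy as np

def ld_adjusted_betas(marginal_betas, ld_corr, n_gwas, h2, n_snps):
    """Posterior mean of joint effects under an infinitesimal prior beta ~ N(0, h2 / M * I):
        (R + M / (N * h2) * I)^{-1} @ marginal_betas
    marginal_betas: GWAS effects on standardized genotypes; ld_corr: LD correlation matrix R;
    n_gwas: GWAS sample size N; n_snps: number of SNPs M.
    """
    R = np.asarray(ld_corr, dtype=float)
    shrink = n_snps / (n_gwas * h2)
    return np.linalg.solve(R + shrink * np.eye(R.shape[0]), np.asarray(marginal_betas, dtype=float))

# Two SNPs in perfect LD with the same marginal effect: a naive score counts the signal twice,
# while the adjustment splits it between them (about 0.1 each).
adj = ld_adjusted_betas([0.2, 0.2], [[1.0, 1.0], [1.0, 1.0]], n_gwas=100_000, h2=0.5, n_snps=2)
```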
Importance of Having Good LD Reference Data
All the methods listed on the previous slide rely on having good LD reference data.
With increasing sample sizes, methods that use summary statistics to recover results similar to those from individual-level data are critical.
- Summary statistics from GWAS are being widely shared.
- LD reference data from the same study are not; this needs to change.
What about Deep Learning?
"In all, over the range of traits evaluated in this study, CNN performance was competitive to linear models, but we did not find any case where DL outperformed the linear model by a sizable margin."
DNA Sequence to Gene Expression
Clinical Utility of Genetic Predictions
Genomic Prediction of Height in UK Biobank
Lello et al. (2018). Accurate Genomic Prediction of Human Height. Genetics.
Prevalence of Coronary Artery Disease Increases with PRS
Khera et al. (2018). Nature Genetics.
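These figures stratify people by PRS percentile and report disease prevalence within each stratum. A sketch of that tabulation on simulated data; the liability-threshold simulation is an assumption for illustration, not data from these studies, and the function name is hypothetical:

```python
import numpy as np

def prevalence_by_prs_quantile(prs, is_case, n_bins=10):
    """Case fraction within each PRS quantile bin (bin 0 = lowest scores)."""
    prs = np.asarray(prs, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    edges = np.quantile(prs, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.searchsorted(edges[1:-1], prs, side="right")  # bin index 0 .. n_bins - 1
    return np.array([is_case[bins == b].mean() for b in range(n_bins)])

rng = np.random.default_rng(0)
prs = rng.standard_normal(20_000)
liability = 0.6 * prs + 0.8 * rng.standard_normal(20_000)
is_case = liability > 1.5                  # roughly 7% of people are cases
by_decile = prevalence_by_prs_quantile(prs, is_case)   # rises from bottom to top decile
```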
Prevalence of Type 2 Diabetes Increases with PRS
Mahajan et al. (2019). Nature Genetics.
Breast Cancer: PRS is Predictive of Risk
Mavaddat N et al Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. Am J Hum Genet. 2019 Jan 3;104(1):21-34. doi: 10.1016/j.ajhg.2018.11.002. Epub 2018 Dec 13. PMID: 30554720; PMCID: PMC6323553.
PRS + Family History Improves Risk Prediction
Kachuri, L., Graff, R.E., Smith-Byrne, K. et al. Pan-cancer analysis demonstrates that integrating polygenic risk scores with modifiable risk factors improves risk prediction. Nat Commun 11, 6084 (2020). https://doi.org/10.1038/s41467-020-19600-4
Fig. 3: Predicted 5-year absolute risk trajectories across strata defined by family history and percentiles of the polygenic risk score (PRS) distribution.
Do PRS work for everyone?
Portability of Prediction Across Ancestries
Martin et al. (2019), Nature Genetics. https://www.nature.com/articles/s41588-019-0379-x
Ancestry Composition of Current GWAS
Martin et al. (2019), Nature Genetics. https://www.nature.com/articles/s41588-019-0379-x
Allele Frequency and LD Differ Across Ancestries
Martin et al. (2019), Nature Genetics. https://www.nature.com/articles/s41588-019-0379-x
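One consequence: even in an idealized case where causal variants and effect sizes were shared across populations, differences in allele frequency alone shift the average PRS, because the mean allele count at SNP $k$ is $2p_k$. A sketch with hypothetical frequencies and effects (function name also hypothetical):

```python
def expected_prs(effect_allele_freqs, betas):
    """Population mean of sum_k X_k * beta_k, using mean allele count E[X_k] = 2 * p_k."""
    return sum(2 * p * b for p, b in zip(effect_allele_freqs, betas))

betas = [0.1, 0.1, -0.2]                 # same effects assumed in both populations
pop_a = [0.5, 0.2, 0.1]                  # hypothetical effect-allele frequencies
pop_b = [0.2, 0.6, 0.3]
print(expected_prs(pop_a, betas), expected_prs(pop_b, betas))   # about 0.10 vs 0.04
```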
PRS Does Not Transfer Well Across Populations
Martin et al. (2019), Nature Genetics. https://www.nature.com/articles/s41588-019-0379-x
Investments in more diverse samples are being made
Bioethical Issues with Embryo Screening
Embryo Selection: An Active Area of Research
Orchid Offers Preconception Testing
https://twitter.com/alexia/status/1380236427485667335
Are they helping eliminate disease or implementing eugenics?
You can check out Lior Pachter's opinionated take on this subject
https://liorpachter.wordpress.com/2021/04/12/the-amoral-nonsense-of-orchids-embryo-selection/
Effect of Selection on Schizophrenia Risk
Fig. 1: Hazard ratios (HR) per one standard deviation (SD) increase in the standardized polygenic risk score (PRS).
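The caption reports hazard ratios per 1 SD of a standardized PRS. A sketch of how such a ratio translates to people at different PRS percentiles, assuming a normally distributed standardized PRS and proportional hazards with a log-linear PRS effect; the function names are hypothetical and the HR value is not read off the figure:

```python
from statistics import NormalDist, fmean, pstdev

def standardize(scores):
    """Convert raw PRS values to z-scores (mean 0, SD 1) within the sample."""
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]

def relative_hazard_at_percentile(hr_per_sd, percentile):
    """Hazard relative to someone at the PRS median (z = 0): HR(z) = hr_per_sd ** z."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return hr_per_sd ** z

hr = 1.5   # hypothetical HR per SD
print(relative_hazard_at_percentile(hr, 95))   # roughly 1.95 (= 1.5 ** 1.645)
```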