Hae Kyung Im, PhD
Principal Component Analysis and Population Structure
April 11, 2022
Genotype Matrix, a Treasure Trove
Principal Components Reveal Demographic History
J. Novembre et al., “Genes mirror geography within Europe,” Nature, vol. 456, no. 7218, pp. 98–101, Aug. 2008.
Could Population Structure Bias GWAS Results?
Spurious Association Due to Population Structure
[Figure: case and control groups]
[Figure: case and control panels. Labels: maf=50%, maf=25% (shown twice); maf=40%, maf=35%]
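One way to read the frequencies on this slide is as a mixture calculation. The sketch below assumes two sub-populations with minor allele frequencies of 50% and 25% (taken from the slide) and assumes that cases and controls sample them in different proportions. The 60/40 and 40/60 proportions are hypothetical, chosen only because they reproduce the 40% and 35% values. No SNP effect on disease is involved, yet the pooled frequencies differ between cases and controls.

```python
# Minor allele frequencies in two sub-populations (values from the slide).
maf_pop_a = 0.50
maf_pop_b = 0.25


def pooled_maf(frac_a: float) -> float:
    """MAF of a group in which a fraction `frac_a` of individuals comes from A."""
    return frac_a * maf_pop_a + (1.0 - frac_a) * maf_pop_b


# Assumed (hypothetical) ancestry proportions in each group.
maf_cases = pooled_maf(0.60)     # 60% of cases from sub-population A
maf_controls = pooled_maf(0.40)  # 40% of controls from sub-population A

print(f"cases: {maf_cases:.2f}, controls: {maf_controls:.2f}")
```

Within each sub-population the allele frequency is identical in cases and controls, so the difference in the pooled frequencies comes entirely from the unequal ancestry proportions.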
How to Correct for Population Structure?
1. Correcting with genomic control (Devlin and Roeder 1999)
2. Inferring the latent sub-populations (Pritchard et al. 2000)
   Fit association in each population separately and combine
3. Adjusting for principal components (Patterson 2006, Novembre 2008, Price et al. 2010)
4. Mixed effects modeling (EMMAX, Kang et al. 2010)
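As an illustration of the first approach, genomic control is commonly described as dividing every 1-degree-of-freedom chi-square association statistic by an inflation factor lambda, estimated as the median observed statistic over the median expected under the null. The sketch below follows that description. The input statistics are made up, and flooring lambda at 1 is a common convention assumed here rather than something stated on the slide.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df chi-square statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the statistics
    return lam, [s / lam for s in chi2_stats]


# Hypothetical association statistics from an inflated GWAS.
observed = [0.1, 0.5, 0.9, 1.4, 2.0, 3.1, 6.5]
lam, corrected = genomic_control(observed)

print(f"lambda = {lam:.2f}")
```

After correction, the median of the statistics matches the null median, which is the defining property of this adjustment.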
Principal Component Analysis
Principal Component Analysis (SVD)
DATA (n x M) = u1 x d1 x v'1 + u2 x d2 x v'2 + ...
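The decomposition above can be checked numerically. This is a minimal sketch using NumPy on a small random matrix that stands in for the n x M data matrix. The matrix size and random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))  # hypothetical n x M data matrix

# Thin SVD: columns of U are u_k, d holds the singular values d_k,
# and the rows of Vt are v_k'.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Rebuild X as a sum of rank-one terms u_k * d_k * v_k'.
X_rebuilt = sum(d[k] * np.outer(U[:, k], Vt[k, :]) for k in range(len(d)))

# The first term alone is the best rank-one approximation of X.
rank1 = d[0] * np.outer(U[:, 0], Vt[0, :])

print(np.allclose(X, X_rebuilt))
```

The singular values come back sorted from largest to smallest, so the leading terms of the sum carry most of the variation in the data.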
Geometric Interpretation of Singular Value Decomposition
[Figure: geometric interpretation of the SVD; panels labeled X and D]
Example Population Structure
HapMap Project
An international project to create a haplotype map of the human genome
1000 Genomes Project
Auton, A., Altshuler, D. M., Durbin, R. M., Chakravarti, A., Clark, A. G., Donnelly, P., et al. (2015). A global reference for human genetic variation. Nature, 526(7571), 68–74. http://doi.org/10.1038/nature15393
HapMap Phase 3 Populations
Population Structure in HapMap
https://hakyimlab.github.io/hgen471/L6-population-structure.html
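To show how principal components can pick up population structure, the sketch below runs PCA on simulated genotypes rather than on the HapMap data. The two populations, their allele frequencies, the sample sizes and the seed are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps = 50, 500

# Hypothetical allele frequencies: population B drifts away from population A.
freq_a = rng.uniform(0.1, 0.9, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=n_snps), 0.05, 0.95)

# Genotypes coded as 0/1/2 copies of the allele, one row per individual.
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
labels = np.repeat([0, 1], n_per_pop)

# PCA via SVD of the column-centered genotype matrix.
geno -= geno.mean(axis=0)
U, d, Vt = np.linalg.svd(geno, full_matrices=False)
pc1 = U[:, 0] * d[0]

# Distance between the population means along PC1.
gap = abs(pc1[labels == 0].mean() - pc1[labels == 1].mean())
print(f"gap between population means on PC1: {gap:.2f}")
```

In this simulation the first principal component separates the two groups even though the population labels are never given to the PCA.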
PCA in UK Biobank
GWAS in Multi-ancestry Samples
Example: Growth Phenotype by Population
H. K. Im et al., “Mixed effects modeling of proliferation rates in cell-based models: consequence for pharmacogenomics and cancer,” PLoS Genetics, 2012.
https://hakyimlab.github.io/hgen471/L6-population-structure.html
Population Differences Lead to Inflation of Small P-values
https://hakyimlab.github.io/hgen471/L6-population-structure.html
What happens if we add principal components as covariates in the regression?
Growth GWAS Adjusted with PCs
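The effect of adding a principal component as a covariate can be sketched on simulated data. This is not the growth phenotype analysis from the lecture. The genotypes, the phenotype, the helper `snp_slope` and all parameter values are hypothetical. The phenotype depends only on population membership, so any SNP association is spurious.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_pop, n_snps = 200, 500

# Hypothetical allele frequencies in two populations.
freq_a = rng.uniform(0.2, 0.8, size=n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, size=n_snps), 0.05, 0.95)
freq_a[0], freq_b[0] = 0.2, 0.7  # SNP 0 differs strongly between populations

geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
]).astype(float)
pop = np.repeat([0.0, 1.0], n_per_pop)

# Phenotype differs by population only; no SNP has a causal effect.
pheno = 2.0 * pop + rng.normal(size=2 * n_per_pop)

# First principal component of the centered genotype matrix.
centered = geno - geno.mean(axis=0)
U, d, Vt = np.linalg.svd(centered, full_matrices=False)
pc1 = U[:, 0] * d[0]


def snp_slope(snp, covariates):
    """Least-squares slope of the phenotype on a SNP, given extra covariates."""
    design = np.column_stack([np.ones_like(snp), snp] + covariates)
    coef, *_ = np.linalg.lstsq(design, pheno, rcond=None)
    return coef[1]


slope_unadjusted = snp_slope(geno[:, 0], [])
slope_adjusted = snp_slope(geno[:, 0], [pc1])

print(f"unadjusted: {slope_unadjusted:.2f}, PC-adjusted: {slope_adjusted:.2f}")
```

In this setup the unadjusted slope is clearly nonzero, while the slope after adjusting for the first principal component shrinks toward zero, because the component absorbs the population difference.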
Heritability
Types of Heritability
Review
Matrix Algebra
Addition
Scalar Multiplication
Transposition
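The three operations listed above can be tried out in NumPy. The matrices here are arbitrary examples chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

total = A + B       # addition: element by element, shapes must match
scaled = 3 * A      # scalar multiplication: every entry is multiplied by 3
transposed = A.T    # transposition: rows become columns

print(total)
print(scaled)
print(transposed)
```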
Matrix Multiplication
Image derived from Matrix multiplication diagram.svg by User:Bilou, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15175268
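As a small worked example of matrix multiplication, the sketch below computes each entry of the product as the dot product of a row of the first matrix with a column of the second, then compares the result with NumPy's `@` operator. The matrices and the helper name `matmul_by_hand` are illustrative choices.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # 3 x 2


def matmul_by_hand(left, right):
    """Entry (i, j) is the dot product of row i of left and column j of right."""
    n_rows, inner = left.shape
    inner_right, n_cols = right.shape
    assert inner == inner_right, "inner dimensions must match"
    out = np.zeros((n_rows, n_cols), dtype=left.dtype)
    for i in range(n_rows):
        for j in range(n_cols):
            out[i, j] = sum(left[i, k] * right[k, j] for k in range(inner))
    return out


C = matmul_by_hand(A, B)
print(C)
print(np.array_equal(C, A @ B))
```

A 2 x 3 matrix times a 3 x 2 matrix gives a 2 x 2 result. Reversing the order gives a 3 x 3 matrix, so the product is not commutative in general.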
Matrix Form of System of Linear Equations
https://en.wikipedia.org/wiki/Matrix_(mathematics)
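A system of linear equations can be written as A x = b and solved numerically. The system below (2x + y = 5 and x - y = 1) is a made-up example, not one taken from the slide.

```python
import numpy as np

# 2x + y = 5
#  x - y = 1
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])   # coefficient matrix
b = np.array([5.0, 1.0])      # right-hand side

solution = np.linalg.solve(A, b)  # solves A x = b for x
print(solution)
```

Multiplying A by the solution returns b, which confirms that the matrix form and the original equations describe the same system.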
Derive Linear Regression Solution with Matrix Notation
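A sketch of the standard least-squares derivation follows. The notation is an assumption, since the slide content is not available in the text: y is the n-vector of outcomes, X the n x p design matrix, beta the coefficient vector, and X'X is assumed to be invertible.

```latex
\begin{align*}
y &= X\beta + \varepsilon \\
\mathrm{RSS}(\beta) &= (y - X\beta)^{\top}(y - X\beta) \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta} &= -2\,X^{\top}(y - X\beta) = 0 \\
X^{\top}X\,\hat{\beta} &= X^{\top}y \\
\hat{\beta} &= (X^{\top}X)^{-1}X^{\top}y
\end{align*}
```

The fourth line is usually called the normal equations. The last line follows by multiplying both sides by the inverse of X'X.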
Hardy-Weinberg Equilibrium
https://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122/
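Under Hardy-Weinberg equilibrium, a biallelic locus with allele frequencies p and q = 1 - p has expected genotype frequencies p^2, 2pq and q^2. The sketch below computes them for a hypothetical allele frequency of 0.7. The function name and the allele labels A and a are illustrative.

```python
def hwe_genotype_freqs(p: float) -> dict:
    """Expected genotype frequencies at a biallelic locus under HWE."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


freqs = hwe_genotype_freqs(0.7)  # hypothetical frequency of allele A
for genotype, freq in freqs.items():
    print(f"{genotype}: {freq:.2f}")
```

The three frequencies always sum to 1, because p^2 + 2pq + q^2 = (p + q)^2.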