Upfront Imputation Improves Read Alignment
Taher Mun
Joint Lab Meeting 11/22/2019
Reference bias
Reference bias affects mapping accuracy
30x 100bp reads simulated from chr21 of 5 1KG samples
Ref bias = (total # reads with REF allele) / (total # reads with ALT allele)
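A minimal sketch of the ratio defined above, assuming per-site counts of REF- and ALT-supporting reads have already been tallied from the alignments. The function name and the example counts are hypothetical, not data from the talk. At het sites an unbiased aligner would be expected to give a ratio near 1.0; values above 1.0 suggest a pull toward the REF allele.

```python
def reference_bias(allele_counts):
    """Ratio of REF-supporting to ALT-supporting reads, summed over sites.

    allele_counts: iterable of (ref_reads, alt_reads) pairs, one per site.
    """
    counts = list(allele_counts)  # allow generators; we iterate twice
    total_ref = sum(ref for ref, _ in counts)
    total_alt = sum(alt for _, alt in counts)
    if total_alt == 0:
        raise ValueError("no ALT-supporting reads; ratio is undefined")
    return total_ref / total_alt


# Hypothetical counts at three het sites (illustrative only)
sites = [(12, 8), (15, 15), (9, 7)]
bias = reference_bias(sites)
```

Here 36 REF reads against 30 ALT reads gives a ratio of 1.2.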
Reference bias affects alignments over het sites
50x paired-end reads (chr21) from NA12878 provided by GIAB (Zook et al., 2016, doi: 10.1038/sdata.2016.25)
Reference bias affects low-coverage variant calling
Bobo et al., 2018, https://doi.org/10.1101/066043
How can we improve alignment in the face of reference bias?
Solution 1: change the reference
A personalized reference is the “best-case scenario”
30x 100bp reads simulated from chr21 of 5 1KG samples
Ref bias = (total # reads with REF allele) / (total # reads with ALT allele)
Solution 2: build a personalized reference
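As a rough illustration of what building a personalized reference can involve, the sketch below substitutes an individual's ALT alleles into the reference sequence. It is a simplification under stated assumptions: SNPs only (no indels), a single haplotype, 0-based positions, and a hypothetical `personalize` helper that is not taken from the talk.

```python
def personalize(reference, snps):
    """Return a copy of `reference` with ALT alleles substituted at SNP sites.

    snps: iterable of (pos, ref_allele, alt_allele); pos is 0-based and
    alleles are single bases (assumption: indels are not handled here).
    """
    seq = list(reference)
    for pos, ref, alt in snps:
        # Guard against applying a variant to the wrong coordinate
        if seq[pos] != ref:
            raise ValueError(f"REF mismatch at {pos}: {seq[pos]} != {ref}")
        seq[pos] = alt
    return "".join(seq)


# Toy sequence and variants (illustrative only)
personal = personalize("ACGTACGT", [(1, "C", "T"), (6, "G", "A")])
```

Reads would then be aligned to `personal` rather than to the standard reference.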
Proposed Solution: Create a personalized reference with sparse data
Goals:
The Li-Stephens model helps impute unknown genotypes
Transition prob = recombination rate
Emission prob = error rate
Li and Stephens, 2003
1000 Genomes Project
Imputation is the dominant step
Future work
Acknowledgements
Langmead Lab
Dr. Ben Langmead
Dr. Brad Solomon
Christopher Wilks
Daniel Baker
Charlotte Darby
Nae-Chyun Chen
Rone Charles