pangolin --usher
Angie Hinrichs
StaPH-B - Nov. 19, 2021
Outline
Ultrafast Sample placement on Existing tRees (UShER)
https://github.com/yatisht/usher/ Yatish Turakhia, UCSD
UCSC’s Big Trees
>5M: GISAID, GenBank, COG-UK
Not publicly shareable 😒
>2.5M: GenBank, COG-UK
(colored by Pango lineage)
UCSC’s Big Trees
Browse the public tree with Taxonium
Theo Sanderson
Francis Crick Institute / Wellcome Sanger Institute
Pango lineages
What defines a Pango lineage?
...
India/GJ-ICMR-NIV-INSACOG-GSEQ-3045/2021,B.1.617.2
India/PY-SEQ_294_S22_R1_001/2021,B.1.617.2
Malaysia/IMR_682164/2021,B.1.617.2
Japan/IC-1175/2021,B.1.617.2
USA/TX-CDC-ASC210037740/2021,B.1.617.2
England/WSFT-25C6539/2021,B.1.1.7
USA/MI-UM-10039543606/2021,AY.3
USA/KS-KHEL-1922/2021,AY.3
USA/KS-KHEL-1923/2021,AY.3
USA/MO-MSPHL-002099/2021,AY.3
USA/MO-MSPHL-002132/2021,AY.3
...
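The rows above have two comma-separated fields: a sequence name, then its designated lineage. As a minimal sketch (the function name `group_by_lineage` is hypothetical, and only a few rows from the slide are used), grouping the rows by lineage shows each lineage as the set of sequences designated to it:

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the slide; the real designation list is far longer.
DESIGNATIONS = """\
India/GJ-ICMR-NIV-INSACOG-GSEQ-3045/2021,B.1.617.2
Japan/IC-1175/2021,B.1.617.2
England/WSFT-25C6539/2021,B.1.1.7
USA/KS-KHEL-1922/2021,AY.3
USA/KS-KHEL-1923/2021,AY.3
"""


def group_by_lineage(text):
    """Map each lineage to the list of sequence names designated to it."""
    groups = defaultdict(list)
    for name, lineage in csv.reader(io.StringIO(text)):
        groups[lineage].append(name)
    return dict(groups)


groups = group_by_lineage(DESIGNATIONS)
for lineage, names in sorted(groups.items()):
    print(lineage, len(names))
```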
A Brief History of Pangolin
How does pangoLEARN work?
Figure 2, Áine O’Toole, Emily Scher, et al., Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool, Virus Evolution, Volume 7, Issue 2, November 2021, veab064, https://doi.org/10.1093/ve/veab064
How does pangoLEARN work?
Training:
SARS-CoV-2 genome sequences (aligned to genome, masked)
Binary vectors (0 = ref or N, 1 = alt)
pango-designation/lineages.csv
pangoLEARN training
Decision tree model (diagram labels: 1 1 1 0 0 0)
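The encoding rule on this slide (0 = ref or N, 1 = alt) can be sketched in a few lines. This follows the slide's description only and is not taken from pangoLEARN's code; the function name and the toy sequences are hypothetical:

```python
def to_binary_vector(reference, sample):
    """Encode an aligned sample: 0 if the base matches the reference or is N, else 1."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        0 if s in (r, "N") else 1
        for r, s in zip(reference.upper(), sample.upper())
    ]


reference = "ACGTACGT"  # hypothetical toy reference, not real genome sequence
sample = "ACCTNCGA"     # hypothetical aligned sample with one N and two alt bases
vector = to_binary_vector(reference, sample)
print(vector)  # [0, 0, 1, 0, 0, 0, 0, 1]
```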
How does pangoLEARN work?
Running pangolin:
User SARS-CoV-2 sequences
Align to reference, mask
Binary vectors
Decision tree model (diagram labels: 1 1 1 0 0 0)
Assigned lineage
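To illustrate how a decision tree turns a binary vector into a lineage call, here is a toy sketch. The tree structure, the tested positions, and the mapping to lineage names are all made up for illustration; the real model is learned from the designated sequences and is far larger:

```python
# Hypothetical toy tree: each internal node tests one vector position
# ("site") and branches on its value (0 or 1); leaves are lineage names.
TOY_TREE = {
    "site": 2,
    0: {"site": 7, 0: "A", 1: "B"},
    1: {"site": 5, 0: "B.1.1.7", 1: "B.1.617.2"},
}


def assign_lineage(tree, vector):
    """Walk from the root to a leaf, following the branch for each tested site."""
    node = tree
    while isinstance(node, dict):
        node = node[vector[node["site"]]]
    return node


call = assign_lineage(TOY_TREE, [0, 0, 1, 0, 0, 0, 0, 1])
print(call)  # B.1.1.7
```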
How does pangolin --usher work?
Making lineage-annotated tree:
pango-designation/lineages.csv
matUtils annotate
UCSC big tree
matUtils reroot,
matUtils mask -m…,
matUtils extract -r...,
matUtils mask -s...
(diagram: tree annotated with lineages A, B, B.1.1.7, B.1.617.2, AY.4)
How does pangolin --usher work?
Running pangolin:
User SARS-CoV-2 sequences
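One way to picture the lookup step: once UShER has placed a sample on the lineage-annotated tree, the lineage can be read from the nearest annotated node at or above the placement. The sketch below assumes that behavior and uses a made-up toy tree with the lineage labels from the diagram; node names and helper functions are hypothetical. When a sample has several equally good placements, the calls can be tallied, which is the form pangolin's notes report (for example "AY.4(1/2) B.1.617.2(1/2)").

```python
from collections import Counter

# Hypothetical toy tree (child -> parent); n_x and n_y are unannotated tips.
PARENT = {
    "n_B": "n_A",
    "n_B117": "n_B",
    "n_B16172": "n_B",
    "n_AY4": "n_B16172",
    "n_x": "n_AY4",
    "n_y": "n_B117",
}
# Lineage labels attached to nodes, as in the annotated-tree diagram.
ANNOTATION = {
    "n_A": "A",
    "n_B": "B",
    "n_B117": "B.1.1.7",
    "n_B16172": "B.1.617.2",
    "n_AY4": "AY.4",
}


def lineage_at(node):
    """Return the lineage of the nearest annotated node at or above `node`."""
    while node is not None:
        if node in ANNOTATION:
            return ANNOTATION[node]
        node = PARENT.get(node)
    return None


def tally_placements(placement_nodes):
    """Count lineage calls across several equally parsimonious placements."""
    return Counter(lineage_at(n) for n in placement_nodes)


print(lineage_at("n_x"))
print(tally_placements(["n_x", "n_B16172"]))
```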
What’s the difference?
Not all assignments come from pangoLEARN/UShER
1002005561,AY.44,,,,,,PANGO-v1.2.93,3.1.16,2021-11-09,v1.2.93,passed_qc,Assigned from designation hash.
2000051407,B.1.617.2,0.0,0.9288622754491018,Delta (B.1.617.2-like),0.384600,0.076900,PLEARN-v1.2.93,3.1.16,2021-11-09,v1.2.93,passed_qc,scorpio call: Alt alleles 5; Ref alleles 1; Amb alleles 6; Oth alleles 1; scorpio replaced lineage assignment B.1.1.7
3000136426,None,,,,,,PLEARN-v1.2.93,3.1.16,2021-11-09,v1.2.93,passed_qc,pangoLEARN lineage assignment AY.4.5 was not supported by scorpio
3000137678,B.1.617.2,0.5,,Delta (B.1.617.2-like),1.000000,0.000000,PUSHER-v1.2.93,3.1.16,,v1.2.93,passed_qc,scorpio call: Alt alleles 13; Ref alleles 0; Amb alleles 0; scorpio replaced lineage assignment AY.4; Usher placements: AY.4(1/2) B.1.617.2(1/2)
7000000606,None,,,,,,PUSHER-v1.2.93,3.1.16,,v1.2.93,passed_qc,usher lineage assignment AY.13 was not supported by scorpio; Usher placements: AY.13(5/6) B.1.617.2(1/6)
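The rows above can be sorted by where the assignment came from using the prefix of the version field. In this sketch the column names are an assumption based on pangolin 3.x output (the slide shows no header row), and the prefix-to-source mapping is inferred from the rows themselves (PANGO appears with "Assigned from designation hash", PLEARN with pangoLEARN, PUSHER with UShER). The helper names are hypothetical.

```python
import csv
import io
import re

# Assumed pangolin 3.x column names; the slide shows rows without a header.
COLUMNS = [
    "taxon", "lineage", "conflict", "ambiguity_score", "scorpio_call",
    "scorpio_support", "scorpio_conflict", "version", "pangolin_version",
    "pangoLEARN_version", "pango_version", "status", "note",
]
# Inferred from the example rows, not from pangolin documentation.
SOURCES = {"PANGO": "designation hash", "PLEARN": "pangoLEARN", "PUSHER": "UShER"}

ROWS = """\
1002005561,AY.44,,,,,,PANGO-v1.2.93,3.1.16,2021-11-09,v1.2.93,passed_qc,Assigned from designation hash.
3000136426,None,,,,,,PLEARN-v1.2.93,3.1.16,2021-11-09,v1.2.93,passed_qc,pangoLEARN lineage assignment AY.4.5 was not supported by scorpio
7000000606,None,,,,,,PUSHER-v1.2.93,3.1.16,,v1.2.93,passed_qc,usher lineage assignment AY.13 was not supported by scorpio; Usher placements: AY.13(5/6) B.1.617.2(1/6)
"""


def assignment_source(row):
    """Classify a row by the prefix of its version field."""
    prefix = row["version"].split("-", 1)[0]
    return SOURCES.get(prefix, "unknown")


def usher_placements(note):
    """Extract 'LINEAGE(n/total)' pairs from the note, if any are present."""
    pairs = re.findall(r"([A-Z][A-Z0-9.]*)\((\d+)/(\d+)\)", note)
    return {lineage: (int(n), int(total)) for lineage, n, total in pairs}


records = list(csv.DictReader(io.StringIO(ROWS), fieldnames=COLUMNS))
for r in records:
    print(r["taxon"], r["lineage"], assignment_source(r), usher_placements(r["note"]))
```

The sketch relies on the note field containing no commas, which holds for the rows shown here.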
Looking forward...
Acknowledgements
UCSC’s Big Trees
>5M: GISAID, GenBank, COG-UK
Not publicly shareable
>2.5M: GenBank, COG-UK
(colored by country)