1 of 7

UShER

2 of 7

Background

  • ~7M SARS-CoV-2 genomes sequenced since 2020
  • Storing the genomic data of these sequences is a considerable challenge
    • Real-time tracking of viral introductions and lineages is essential for public health
  • The Corbett-Detig Lab and its collaborators developed UShER to address this computational challenge

3 of 7

Concept Review

Multiple Sequence Alignment

Phylogeny

ACGTACGT_ _ _ ACGT

ACGTACGTGGGACGT

Indel

ACGTACGTACGT

ACGTACGTCCGT

Single Nucleotide Variant
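
The two variant types above can be told apart programmatically once sequences are aligned. The sketch below is illustrative only: the function name `compare_aligned` is hypothetical, and it assumes gaps are written as `-` (the slide draws them as `_ _ _`) and that both sequences come from the same alignment, so they have equal length.

```python
def compare_aligned(ref: str, alt: str, gap: str = "-"):
    """Return (snvs, indels) for two equal-length aligned sequences.

    snvs   : list of (0-based column, ref base, alt base)
    indels : list of (0-based start column, length in columns)
    """
    if len(ref) != len(alt):
        raise ValueError("aligned sequences must have equal length")
    snvs, indels = [], []
    run_start = None  # start column of the gap run currently being read
    for i, (r, a) in enumerate(zip(ref, alt)):
        gap_on_one_side = (r == gap) != (a == gap)
        if gap_on_one_side:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None:
                indels.append((run_start, i - run_start))
                run_start = None
            if r != a:
                snvs.append((i, r, a))
    if run_start is not None:  # gap run reaches the end of the alignment
        indels.append((run_start, len(ref) - run_start))
    return snvs, indels


# The two sequence pairs from the slide
print(compare_aligned("ACGTACGT---ACGT", "ACGTACGTGGGACGT"))  # ([], [(8, 3)])
print(compare_aligned("ACGTACGTACGT", "ACGTACGTCCGT"))        # ([(8, 'A', 'C')], [])
```

For simplicity, adjacent gap runs on opposite sides are merged into one indel; a real variant caller would treat them separately.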

4 of 7

UShER at a Glance

  • Adds new SARS-CoV-2 genomes to a global phylogeny
  • Improves runtime by adding new samples to an existing tree rather than rebuilding the tree from scratch
  • Reduces storage by saving the updated tree with evolutionary compression
    • All descendants of a mutated sample are inferred to carry the same mutation

5 of 7

Evolutionary Compression

    • UShER's data structure, the Mutation Annotated Tree (MAT), uses evolutionary compression to store mutations
    • The MAT is encoded into a compact binary file
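
A toy version of the idea, as a minimal sketch: each node records only the mutations on its own branch, and a sample's genome is recovered by applying every mutation on the path from the root. The `Node` class, the helper functions, and the example tree are hypothetical illustrations, not UShER's actual data structures or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mutations: list = field(default_factory=list)  # (0-based position, new base)
    children: list = field(default_factory=list)


def genomes(node, reference, inherited=()):
    """Yield (leaf name, sequence) by applying mutations along each root-to-leaf path."""
    path = tuple(inherited) + tuple(node.mutations)
    if not node.children:
        seq = list(reference)
        for pos, base in path:
            seq[pos] = base
        yield node.name, "".join(seq)
    for child in node.children:
        yield from genomes(child, reference, path)


def stored_mutations(node):
    """Count the mutations actually stored in the tree."""
    return len(node.mutations) + sum(stored_mutations(c) for c in node.children)


REF = "ACGTACGTACGT"
# s1 and s2 share a mutation at position 8; it is stored once, on their parent n1.
root = Node("root", children=[
    Node("n1", [(8, "C")], [Node("s1"), Node("s2", [(2, "T")])]),
    Node("s3"),
])

for name, seq in genomes(root, REF):
    print(name, seq)
print(stored_mutations(root))  # 2 stored, versus 3 if listed per sample
```

The saving grows with the number of samples that descend from a mutated branch, since each shared mutation is written once regardless of how many descendants inherit it.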

6 of 7

Our project:

  • Analyzing a strain by its nearest neighbors
  • UShER in UCSC Genome Browser
    • https://genome.ucsc.edu/cgi-bin/hgPhyloPlace
  • UShER on the command line

7 of 7