Workflows, Planemo, BioBlend and tags to automate SARS-CoV-2 genome surveillance
Wolfgang Maier
Galaxy Europe Team
University of Freiburg, Germany
2022-09-29
The Problem
tiled amplicons
amplified viral cDNA
sequencing reads of amplicons
SARS-CoV-2 lineage stats
data processing steps
mapping
variant calling (mutations)
primer trimming
consensus building
lineage assignment (with pangolin or nextclade)
information content
Warning: risk of fragmentation / poor comparability
The Solution
Illumina WGS: mapping (bwa-mem), variant calling (lofreq), variant annotation (snpEff, covid-19 release)
Illumina ARTIC: mapping (bwa-mem), variant calling (lofreq), variant annotation (snpEff, covid-19 release), primer trimming (ivar)
ONT ARTIC: mapping (minimap2), variant calling (medaka), variant annotation (snpEff, covid-19 release)
mutations in standard VCF format with rich call statistics and annotations
Reporting
Consensus building
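To illustrate what "standard VCF format with rich call statistics" offers downstream consumers, here is a minimal sketch of reading one VCF data line with plain Python. The example record is made up for illustration, and the INFO keys shown (DP, AF, SB, DP4) are assumed to follow the style lofreq writes; the real workflow outputs carry additional annotations.

```python
def parse_vcf_record(line):
    """Split one tab-separated VCF data line and parse its INFO column."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        # INFO entries are either KEY=VALUE pairs or bare flags
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "filter": flt,
        "info": info_dict,
    }


# Made-up record, for illustration only
example = "NC_045512.2\t23403\t.\tA\tG\t49314\tPASS\tDP=3000;AF=0.98;SB=0;DP4=1,2,1480,1470"
record = parse_vcf_record(example)
allele_frequency = float(record["info"]["AF"])
print(record["pos"], record["ref"], record["alt"], allele_frequency)
```

Because every pipeline emits the same format, a reader like this can be applied to the results of all three workflows; for production use a dedicated VCF library is likely the better choice.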
covid19.galaxyproject.org variation analysis workflows
version-controlled workflows with defined releases
A new problem: lots of workflows to run
Solution:
Automation through the API
https://training.galaxyproject.org/training-material/topics/galaxy-interface/tutorials/workflow-automation/tutorial.html
Use BioBlend to fill the template and create the final job.yml file for planemo run
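The template-filling step could look roughly like the sketch below. It is not the project's actual script: the input label "Paired Collection", the helper names, and the way the collection is looked up are assumptions for illustration. `gi` is expected to be a BioBlend `GalaxyInstance`, and the job file uses the `galaxy_id` form for referencing data that already lives on the server.

```python
from string import Template

# Hypothetical job template; "Paired Collection" stands in for whatever
# input label the workflow actually defines.
JOB_TEMPLATE = Template(
    "Paired Collection:\n"
    "  class: Collection\n"
    '  galaxy_id: "$collection_id"\n'
)


def find_collection_id(gi, history_id, collection_name):
    """Look up a collection by name in a history (gi: a BioBlend GalaxyInstance)."""
    contents = gi.histories.show_history(history_id, contents=True)
    for item in contents:
        if (
            item.get("history_content_type") == "dataset_collection"
            and item.get("name") == collection_name
            and not item.get("deleted", False)
        ):
            return item["id"]
    raise LookupError(f"no collection named {collection_name!r} in {history_id}")


def render_job(collection_id):
    """Fill the template; the result is the text of the final job.yml."""
    return JOB_TEMPLATE.substitute(collection_id=collection_id)


def planemo_command(workflow_id, job_file, galaxy_url, api_key, history_name, tag):
    """Build (but do not execute) a planemo run call against an existing server."""
    return [
        "planemo", "run", workflow_id, job_file,
        "--engine", "external_galaxy",
        "--galaxy_url", galaxy_url,
        "--galaxy_user_key", api_key,
        "--history_name", history_name,
        "--tags", tag,
    ]


# Placeholder ID, for illustration only
print(render_job("0123456789abcdef"))
```

In a real run, the rendered text would be written to job.yml and the command list handed to `subprocess.run`; server URL, API key, history name and tag are all supplied by the caller.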
Use tags to record states and identify histories
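One way tags can record state is sketched below: histories are selected by a tag, and the tag is swapped once a processing step has been triggered. The tag names and helper functions are hypothetical, and the sketch assumes that the history summaries returned by BioBlend include a `tags` list; `gi` is again a BioBlend `GalaxyInstance`.

```python
def histories_with_tag(gi, tag):
    """Return the IDs of histories that currently carry the given tag."""
    return [
        history["id"]
        for history in gi.histories.get_histories()
        if tag in history.get("tags", [])
    ]


def swap_tag(gi, history_id, old_tag, new_tag):
    """Replace old_tag with new_tag on a history to mark a state change."""
    tags = gi.histories.show_history(history_id)["tags"]
    new_tags = [t for t in tags if t != old_tag]
    if new_tag not in new_tags:
        new_tags.append(new_tag)
    gi.histories.update_history(history_id, tags=new_tags)
    return new_tags


def process_pending(gi, pending_tag, done_tag, handler):
    """Run handler on every pending history, then mark it as handled."""
    handled = []
    for history_id in histories_with_tag(gi, pending_tag):
        handler(history_id)
        swap_tag(gi, history_id, pending_tag, done_tag)
        handled.append(history_id)
    return handled
```

Keeping the state on the server in this way means a scheduled script needs no local bookkeeping: each run simply asks Galaxy which histories are still pending.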
Workflow run automation via the Galaxy API
VCF
Reports
Consensus FASTA
Downstream variant analysis/providers
Direct data exploration through tabular datasets and plots
nextstrain, pangolin
GISAID
Genome surveillance initiatives
A public archive of usegalaxy.* Covid-19 analysis efforts
ftp://xfer13.crg.eu/
full access to ~2,000 analysis batches on ~400,000 SARS-CoV-2 samples
What’s next?
SARS-CoV-X
Influenza
West Nile
MERS
Nipah
Monkeypox
Ebola
Lassa
?
(invest in these things if you don’t have them yet)
Anton Nekrutenko
Björn Grüning
Simon Bray
Nathan Roach
Marius van den Beek
Dannon Baker
Sergei Pond
Ulvi Talas
Peter van Heusden
Babita Singh
Mauricio Moldes
Project overview:
https://galaxyproject.org/projects/covid19
Dashboard:
https://covid19.galaxyproject.org/dashboard
UCSC genome browser track:
https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=wuhCor1&c=NC_045512v2&g=galaxyEna
Viral Beacon:
Workflows:
https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
https://dockstore.org/organizations/iwc/collections/Covid
https://workflowhub.eu/workflows?filter%5Btag%5D=covid19.galaxyproject.org
Automation Scripts:
https://github.com/usegalaxy-eu/ena-cog-uk-wfs
https://github.com/usegalaxy-eu/sars-cov-2-processing-requests
Data:
ftp://xfer13.crg.eu