COVID-19 analysis in Galaxy:
Importance of (open) infrastructures in responding to a pandemic
27 January 2021
17.00 CET
Nadim Rahman
Guy Cochrane
EMBL-EBI
Andrew Lonie
Björn Grüning
Frederik Coppens
usegalaxy.*
Housekeeping
This session will be recorded.
Please remain muted unless you’re invited to speak by the Chair.
This meeting will be run in line with the ELIXIR Code of Conduct. If you have any concerns, please refer to the Code of Conduct, available on the ELIXIR website.
Please use “Q&A” to raise questions during the presentation.
Please use the “hand-raising function” to indicate that you would like to contribute directly.
Please use “Chat” for further comments or discussions.
Running sheet
INTRO: David Lloyd
CONTEXT [1 min]: Frederik (slide 5)
DATA [15 mins]: Guy / Nadim
ANALYSIS (Tools and Infrastructure) [15-17 mins]:
INTEGRATED ECOSYSTEM [10 mins]: Frederik
COVID-19 analysis in Galaxy:
Importance of (open) infrastructures in responding to a pandemic
Andrew Lonie, Nadim Rahman, Guy Cochrane
Björn Grüning, Frederik Coppens
@galaxyproject
Tools Ecosystem
Nadim Rahman, Guy Cochrane
The European COVID-19 Data Platform
European COVID-19 Data Platform
EMBL-EBI
European Research Infrastructures
International initiatives
National Infrastructures
COVID-19 Research
COVID-19 Data Portal
European Nucleotide Archive
Archive
Platform
ENA data reach
Drysdale et al. (2020) The ELIXIR Core Data Resources: fundamental infrastructure for the life sciences. Bioinformatics, 2020, 1–7; http://doi.org/10.1093/bioinformatics/btz959
Cook et al. (2020) The European Bioinformatics Institute in 2020: building a global infrastructure of interconnected data resources for the life sciences. Nucleic Acids Research 48:D17-D23; http://doi.org/10.1093/nar/gkz1033
Further reach
Rohden F, Huang S, Dröge G, Hartman Scholz A, and contributing authors (2019). Combined study on DSI in public and private databases and DSI traceability. https://www.cbd.int/abs/DSI-peer/Study-Traceability-databases.pdf
SARS-CoV-2 Data Hubs
COVID-19 Data Flow: Data Mobilisation
COVID-19 Data Flow: Tools
COVID-19 Data Flow: Data Discovery and Access
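As an illustration of programmatic data discovery, the sketch below builds a search URL for SARS-CoV-2 read data in the European Nucleotide Archive. It is not taken from the slides. It assumes the ENA Portal API search endpoint and its `result`, `query`, `fields`, `format` and `limit` parameters, and the function name `build_ena_query` and the chosen field names are illustrative; check them against the current ENA documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API search service.
ENA_SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"

# NCBI Taxonomy identifier for SARS-CoV-2.
SARS_COV_2_TAXON_ID = 2697049


def build_ena_query(result="read_run",
                    fields=("run_accession", "collection_date", "country"),
                    limit=10):
    """Return a search URL for SARS-CoV-2 records (hypothetical helper).

    The URL is only constructed here; no network request is made.
    """
    params = {
        "result": result,                               # type of record to search
        "query": f"tax_tree({SARS_COV_2_TAXON_ID})",    # taxon and its descendants
        "fields": ",".join(fields),                     # columns to return
        "format": "tsv",                                # tab-separated output
        "limit": limit,                                 # maximum number of rows
    }
    return f"{ENA_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_ena_query(limit=5))
```

The resulting URL can be opened in a browser or fetched with any HTTP client; keeping URL construction separate from the request makes the query easy to inspect and reuse.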
European COVID-19 Data Platform: example of use
European COVID-19 Data Platform
data mobilisation
SARS-CoV-2 Data Hubs
Galaxy Project
CRG COVID Viral Beacon
COVID-19 Data Portal
analytical workflows
visualization & navigation
data access
data
users
COVID-19 Analysis in Galaxy
Björn Grüning
Tools Ecosystem
COVID-19 analysis on usegalaxy.★
https://covid19.galaxyproject.org
Proteomics
New tools and workflows
New/updated workflows: ARTIC/ONT
New/updated workflows: Consensus construction
~100,000 samples
WGS, Amplicon, DRS
Mirrored data for easy access
Analysing → Monitoring
Galaxy Australia: an exemplar of research infrastructure cooperation
Andrew Lonie
usegalaxy.org.au
Galaxy Australia is a hosted web-based platform that lets anyone conduct accessible, reproducible, and transparent computational life sciences research. It is part of the global usegalaxy.★ collaboration between large public Galaxy servers.
Early Pandemic - the race to publication
How can we make it easier/more reproducible?
For everyone!
Workflows: Efficiency through Galaxy controlled scheduling
Pre-processing
Assembly
MRCA timing
Variation analysis
S- analysis
Evolutionary analysis
Genomics
Now in PLoS Pathogens: https://doi.org/10.1371/journal.ppat.1008643
Pulsar
To create this network of shared computational resources, we leverage Pulsar, a Task Execution Service for Galaxy. Pulsar allows a Galaxy server to automatically interact with those remote systems, ensuring job and provenance information are correctly exchanged.
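The idea of sending selected jobs to remote resources can be sketched as below. This is a conceptual illustration only: it does not use Pulsar's or Galaxy's actual API or configuration format, and the `Job`, `Destination`, `choose_destination` and `dispatch` names, as well as the `covid19` tag, are hypothetical. The destination names echo the deployment described in these slides (a main Slurm queue and a Pulsar cluster at Pawsey).

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A unit of work submitted to the Galaxy server (hypothetical model)."""
    job_id: int
    tool_id: str
    tags: frozenset = frozenset()


@dataclass
class Destination:
    """A compute resource that can run jobs (hypothetical model)."""
    name: str
    remote: bool              # True if jobs must be staged to a remote site
    accepts_tags: frozenset   # job tags this destination is reserved for


def choose_destination(job, destinations, default):
    """Pick the first destination whose tags overlap the job's tags."""
    for dest in destinations:
        if job.tags & dest.accepts_tags:
            return dest
    return default


def dispatch(job, destinations, default):
    """Route a job and return a small provenance record of the decision."""
    dest = choose_destination(job, destinations, default)
    return {
        "job_id": job.job_id,
        "tool_id": job.tool_id,
        "destination": dest.name,
        "staged_remotely": dest.remote,
    }


if __name__ == "__main__":
    main_queue = Destination("main-slurm", remote=False, accepts_tags=frozenset())
    pulsar_paw = Destination("pulsar-paw", remote=True,
                             accepts_tags=frozenset({"covid19"}))

    covid_job = Job(1, "variant_calling", frozenset({"covid19"}))
    other_job = Job(2, "text_processing")

    print(dispatch(covid_job, [pulsar_paw], main_queue))
    print(dispatch(other_job, [pulsar_paw], main_queue))
```

In this sketch the returned record stands in for the job and provenance information that must travel with the job when it runs away from the main server.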
https://github.com/galaxyproject/pulsar
usegalaxy.org.au
Galaxy Australia
Brisbane
Main Slurm Queue, Main storage
Pulsar
Pulsar
Pulsar
Getting resources to help - quickly
COVID merit allocation at Pawsey
A Pulsar Cluster in the Cloud
Setup in an afternoon
Perth Pulsar-paw
(COVID-19 Jobs)
Pulsar Server
Pulsar/Slurm/NFS
Worker Node 1
Worker Node 2
Worker Node 3
Worker Node 4
Worker Node 5
Volume
Connection with Galaxy Australia
Galaxy Australia was able to send COVID related jobs to Pawsey that day!
Some COVID analysis stats
An integrated ecosystem
Frederik Coppens
Virtual environment
Seamless integration of services
Based on standardisation
Across scientific disciplines and borders
AnVIL: Inverting the model of genomic data sharing
Traditional: Bring data to the researcher
Goal: Bring researcher to the data
Virtual environment
Seamless integration of services
Based on standardisation
Across scientific disciplines and borders
OECD recommendation on Access to Research Data
On 20 January 2021, the OECD Council adopted a revised Council Recommendation on Access to Research Data from Public Funding.
... expands the scope to cover not only research data, but also related metadata, as well as bespoke algorithms, workflows, models, and software (including code), which are essential for their interpretation.
RECOGNISING that re-use and value of data can depend on the availability of relevant metadata, algorithms, code, and software, from public funding together with information on workflows and the computational environment used to generate published findings, and that providing access to these other research-relevant digital objects from public funding along with the data itself can be essential;
Tools collaboratory
Bio.Tools
BioContainers
Workflows
Tools
Registries
Packaging
Testing
Powered By
EDAM ontology
ELIXIR Tools Ecosystem
BioContainers
160943 containers
bio.tools
17007 tools
24713 tools
OpenEBench
7923 tools in Galaxy toolshed
Galaxy
Beta release 2020
72 workflows
WorkflowHub.eu
WorkflowHub.eu: workflow registry
51
Leading work on metadata standards
Workflows...
Contributing to WP6 metadata
standards and repositories
Integration
An EOSC-Life product
Virtual environment
Seamless integration of services
Based on standardisation
Across scientific disciplines and borders
Across disciplines: covid19.galaxyproject.org
Webinar February 24
Webinar February 10
Training.galaxyproject.org
Across Galaxy instances
Global collaboration of managed public Galaxy instances
On demand Galaxy instances (ELIXIR Italy)
Deploy your own container
https://github.com/ELIXIR-Belgium/covid-19-galaxy-container
Open Source code: build your own
usegalaxy.* community expanding
usegalaxy.org
usegalaxy.org.au
usegalaxy.eu
usegalaxy.fr
usegalaxy.be
usegalaxy.ee
usegalaxy.es
Pulsar-Network
Some of the most innovative computing centres across Europe are interested in sharing their remote computing power to support the UseGalaxy.eu load:
https://pulsar-network.readthedocs.io/en/latest/project/partners.html
Virtual environment
Seamless integration of services
Based on standardisation
Across scientific disciplines and borders
Integration with CRG COVID Viral Beacon
European COVID-19 Data Platform
data mobilisation
SARS-CoV-2 Data Hubs
Galaxy Project
CRG COVID Viral Beacon
COVID-19 Data Portal
analytical workflows
visualization & navigation
data access
data
users
Webinar February 17
Data retrieval & submission
Alignment of queries
to (re)analyse data
Submission of
(cleaned) viral data
Webinar February 3
Exemplar implementation of
In a global context
Building on existing, open infrastructure
Tools Ecosystem
Acknowledgments
usegalaxy.org efforts are funded by NIH Grants U41 HG006620 and NSF ABI Grant 1661497. usegalaxy.eu is supported by the German Federal Ministry of Education and Research grants 031L0101C and de.NBI-epi. Galaxy and HyPhy integration is supported by NIH grant R01 AI134384. usegalaxy.org.au is supported by Bioplatforms Australia and the Australian Research Data Commons through funding from the Australian Government National Collaborative Research Infrastructure Strategy. Hyphy.org development team is supported by NIH grant R01GM093939. usegalaxy.be is supported by the Research Foundation-Flanders (FWO) grant I002919N and the Flemish Supercomputer Center (VSC). EOSC-Life has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 824087
To get involved: https://galaxyproject.org/community
Training materials: https://training.galaxyproject.org
Further information: https://usegalaxy.eu