Mini-Symposium: Reproducible Analytics & Infrastructure
Chair: Jonathan Tedds (Hub)
Speakers: Carole Goble (UK), Bjoern Gruening (DE),
Eva Alloza (ES), Alex Kanitz (CH), Juha Törnroos (FI)
Wednesday, June 6th, 2023; 16:00-17:30 IST (local time) // 17:00 - 18:30 CEST
16:00 – 17:30 MS6: Reproducible Analytics & Infrastructure
Chair: Jonathan Tedds
Speakers: Carole Goble, Bjoern Gruening, Eva Alloza, Alex Kanitz, Juha Törnroos
19:00 - 23:00 Social event and dinner in EPIC museum
Social event at the EPIC museum
Tonight at 19:00 - 23:00, please arrive promptly
Please bring your badge
The chq Building,
Custom House Quay,
Dublin 1
Agenda
Time (IST) | Title | Speaker |
16.00-16.15 | Welcome and Introduction: ELIXIR Tools and Compute Platform Plans for Enabling Reproducible Analytics in 2024-28 | Jonathan Tedds (Hub) |
16.15-16.30 | EOSC Life WorkflowHub | Carole Goble (UK) |
16.30-16.45 | Galaxy Analytics Ecosystem | Bjoern Gruening (DE) |
16.45-17.00 | ELIXIR Strategic Implementation Study Use Cases on Container Deployment | Eva Alloza (ES) |
17.00-17.15 | ELIXIR::GA4GH Cloud and AAI Driver Project | Alex Kanitz (CH) |
17.15-17.30 | ELIXIR Compute Applications in Human Data Projects including GDI | Juha Törnroos (FI) |
Close | | |
#ELIXIR23
ELIXIR Key Platform Plans for Enabling Reproducible Analytics in 2024-28
Jonathan Tedds (Hub)
Tools Platform Vision for 2024-2028
The ELIXIR Tools Platform aims to …
The ELIXIR Tools Platform: In practical terms
The platform
- Software Development
- Software Management plan
Actionable Best practices
- bio.tools (+ EDAM)
- BioConda/BioContainers
- OpenEBench
- WorkflowHub
- Galaxy
In-production Services
- Tools Contributors
- Communities support & outreach
Community-driven
WP1: Consolidating a comprehensive Tools Ecosystem
Individual tool repositories
Tools Ecosystem repository
Correction/modification PR
Validation
biotoolsSchema
[WP3]
[WP4]
[WP2]
WP2: Software management, stewardship, and standards
Link/collaborate with other WPs, platforms, communities, and global initiatives: EOSC, RDA, NIH, Australian BioCommons, GA4GH.
WP3: Software for reproducible and distributed analysis
- Quality software metrics
- FAIR for Research Software
- General Trends for SW development
High-quality software
- Common challenges
- Agreed scientific evaluation metrics
Community-driven
- Best practices for SW containerization.
- Platform/s to facilitate SW containers generation & distribution.
Deployable everywhere
WP4: Democratised access to reusable complex workflows
Compute Platform 2024+ Goal and structure
To contribute to/build an ecosystem that supports reliable processing/analysis of sensitive data
○ Also covering environments for non-sensitive data with lower security/trust requirements
=> Deliver services to support federated data management and analytics in life science
through 4 complementary WPs:
WP1 – Advanced Service Access Control
WP2 - Protected access to sensitive data for analysis
European regulation on sensitive human data management is tightening. We adapt to this by
=> From secure access through secure processing/analysis to publication
=> Evolution of tools and their secure interaction in a federated distributed world
WP3 - Multi Cloud Infrastructure Deployment
… techniques and developing services based on widely adopted community standards (GA4GH Cloud APIs)
… workflow types (e.g., CWL, Nextflow, Galaxy) on HPC clusters, native/on-premises cloud clusters and commercial clouds, accessing data on commonly used storage solutions (e.g., S3-compatible object storage)
WP4 - Sustainability, Accounting and Provenance for Federated Analytics
=> Service requires accounting and governance
EOSC Life WorkflowHub
Frederik Coppens (BE)
Carole Goble (UK)
With thanks to Johan Gustafsson (Australian BioCommons), Beatriz Serrano-Solano (EuroBioimaging), Finn Bacall (ELIXIR-UK)
Where can I find these reproducible workflows?
Platforms / community repositories
Git Repos
Publications
Murigneux, V., Roberts, L.W., Forde, B.M. et al. MicroPIPE: validating an end-to-end workflow for high-quality complete bacterial genome construction. BMC Genomics 22, 474 (2021). https://doi.org/10.1186/s12864-021-07767-z
Lott, M. J., Wright, B. R., Neaves, L. E., Frankham, G. J., Dennison, S., Eldridge, M. D. B., Potter, S., Alquezar-Planas, D. E., Hogg, C. J., Belov, K., & Johnson, R. N. (2022). Future-proofing the koala: Synergising genomic and environmental data for effective species management. Molecular Ecology, 31, 3035–3055. https://doi.org/10.1111/mec.16446
Data Repositories
IWC - Intergalactic Workflow Commission
Overcome a distributed, fragmented and variable world…
Use a registry.
integrated
central
searchable
standardised
citable
interoperable
FAIR workflow services
The WorkflowHub
354 workflows
16 system types
481 contributors
165 teams
9 collections
426K+ accesses
5.8K+ downloads
Started April 2020, Beta Sept 2020, Launch Sept 2022
Popular platforms have dedicated support on both sides.
Galaxy: RO-Crate support, metadata extraction, search, execution, monitoring IWC.
Diverse platforms, spectrum of support.
Work with anything
WorkflowHub – Features
Doesn’t replace community and dev repositories, works with them
System agnostic
Native repository support
Multiple artefact types
Git integration
Links to docs, test data, reference data
Versions, any stage of development
Metadata extraction & standards
Author credit, citation and workflow attribution
bio.tools integration
DOIs & ORCID, integration with DataCite
Standardised Machine Processable Metadata
for reproducibility, metadata matters!
Common workflow description independent of platform
Abstract CWL
Common metadata about the workflow, tools & parameters
Digital Object format packaging workflows, metadata, companion data objects, logs
https://www.commonwl.org
https://bioschemas.org/
https://edamontology.org/
https://www.researchobject.org/ro-crate/
https://citation-file-format.github.io/
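To make the packaging idea concrete, here is a minimal sketch of the kind of metadata file an RO-Crate carries for a single workflow. It is an illustration only, not WorkflowHub's own code: the function name, the file name `workflow.cwl` and the workflow name are hypothetical, and a complete crate would need further properties (for example a description, publication date and licence) plus any tests or companion data.

```python
import json


def minimal_workflow_crate(workflow_file: str, name: str, language: str) -> dict:
    """Build a minimal RO-Crate 1.1 style metadata document for one workflow file.

    Hypothetical helper for illustration; not a complete or validated crate.
    """
    language_id = "#" + language.lower()
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: identifies this file and points at the root dataset
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root dataset: the crate itself, with the workflow as its main entity
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {   # the workflow file, typed so that registries can recognise it
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "programmingLanguage": {"@id": language_id},
            },
            {   # the workflow language as a contextual entity
                "@id": language_id,
                "@type": "ComputerLanguage",
                "name": language,
            },
        ],
    }


crate = minimal_workflow_crate("workflow.cwl", "Example workflow", "CWL")
print(json.dumps(crate, indent=2))
```

The printed JSON would be saved as `ro-crate-metadata.json` next to the workflow file; everything else in the crate is described by adding entities to `@graph`.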
Run
Integration with service ecosystem
APIs
Digital Objects
Search and execution by GA4GH TRS compliant platforms
LifeMonitor monitoring automated workflow tests and best practice adherence in Git repo using existing frameworks
Not all workflows are portable – some may be facility-bound
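As a rough sketch of what "search and execution by TRS compliant platforms" relies on, the snippet below builds the URLs a client would request from a GA4GH Tool Registry Service (v2) endpoint. The base URL is an assumption about where the registry exposes TRS, the tool id `123` and version `1` are hypothetical, and no network request is made.

```python
from urllib.parse import quote

# Assumed TRS base URL; check the registry's own documentation before relying on it.
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_tool_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL of a single tool (workflow) record in a TRS v2 registry."""
    return f"{base}/tools/{quote(str(tool_id), safe='')}"


def trs_descriptor_url(tool_id: str, version_id: str, descriptor_type: str,
                       base: str = TRS_BASE) -> str:
    """URL of the primary descriptor for one version of a tool.

    descriptor_type is a TRS type string such as "CWL", "NFL" or "GALAXY".
    """
    version = quote(str(version_id), safe="")
    return f"{trs_tool_url(tool_id, base)}/versions/{version}/{descriptor_type}/descriptor"


# Hypothetical workflow id and version, for illustration only.
print(trs_descriptor_url("123", "1", "CWL"))
```

A workflow platform that speaks TRS can resolve such a URL to the workflow descriptor and hand it to its own execution engine, which is what keeps the registry system agnostic.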
Integration with scholarly ecosystem
APIs
Digital Objects
DOIs for workflows and other content. Integration with DataCite, ORCID
RO-Crate deposition in Zenodo
Contribution to PIDGraph, and hence OpenAIRE Research Graph
Early FAIR Signposting adopter
FAIR-IMPACT software metadata recommendations.
Work with anyone
WorkflowHub Club
As a registry and for best practices and guidelines for FAIR workflows.
Registering a workflow goes a long way to being FAIR.
Steps towards collaborating with Publishers.
Spaces & Teams: Communities of Practice
Supporting Reproducibility in the Future
Community
Knowledge
Workflow publishing – policy, recommendation and support by selected publishers
https://www.eventbrite.co.uk/e/workflowhub-ask-me-anything-session-tickets-629927479047
https://workflowhub.eu/
WorkflowHub Club
WorkflowHub
RO-Crate
https://www.researchobject.org/ro-crate/
Galaxy Europe
Bioschemas
CWL
LifeMonitor
Workflow Community Initiative
https://workflows.community/about
Goble, Carole, Soiland-Reyes, Stian, Bacall, Finn, Owen, Stuart, Pireddu, Luca, & Leo, Simone. (2023). EOSC-Life Implementation of a mechanism for publishing and sharing workflows across instances of the environment. Zenodo. https://doi.org/10.5281/zenodo.7886545
Galaxy Analytics Ecosystem
Bjoern Gruening (DE)
Galaxy as large scale compute infrastructure
Open Infrastructure
On-boarding: APIs, standards, logins
Micro-services to handle requests
web
jobs
workflows
Common theme: It’s actually a bit more complicated ;)
Things that I left out … time and so …
Open Infrastructure
https://github.com/usegalaxy-eu
https://eurosciencegateway.eu
ELIXIR Strategic Implementation Study Use Cases on Container Deployment
Eva Alloza (ES)
Compute Platform
Tools Platform
SIS Containers 2021-23 - Overview
Strategic Implementation Study (SIS)
Areas of Focus & Structure
SIS Containers 2021-23 - Bioinformatics Container Usage
Usage
→ these workflows are more important than ever
→ consistent and stable environments are required
User Community Engagement and Adoption
Use-cases entry point
Use-cases identified
Use-cases engagement
Use-cases at Q&A workshop
Some questions from use-cases…
Licensing files | Encourage user/software providers to use containers | Container access to internet | Not fully automated workflows | Increase reproducibility of validation steps of genome-scale metabolic models as GitHub Actions |
Dependency management | Make tools available for users not familiar with coding | Deployment of notebooks in the cloud | Security aspects | User-friendly environment to run clustering processing of medical data |
Development and deployment of Nextflow pipelines | Optimisation of container size | Usage of labels in Docker | Training | Transferring labels from Singularity to Docker |
Standardise inputs and outputs of analysis using Nextflow and Docker | Community efforts to create packages, shortage of staff with skills | Orchestration problems of Singularity containers | Requirements of secure Docker containers | Make the computing steps of container construction more efficient |
WP2 - Technical Support for Users
Overview of services
WP3 - Technical Support for Providers
Documentation and materials
Publication
User Community Engagement and Adoption
containers and workflows, providing a wide range of topics
from design to implementation and deployment.
Special acknowledgements
Álvaro González (FI)
Fotis Psomopoulos (GR), Beatriz Serrano-Solano (DE)
WP2 - users’ tech support: Frederik Coppens (BE), Sven Twardziok (DE)
WP3 - providers’ tech support: Alex Kanitz (CH), Björn Grüning (DE)
WP4 - sustainability: Justin Clark-Casey (EMBL-EBI), Martin Čech (CZ)
Hub support - Jonathan Tedds, Gavin Farrell
ELIXIR::GA4GH Cloud and AAI Driver Project
Alex Kanitz (CH)
in a
“Collaborate on standards, compete on implementations!”
The relevant GA4GH standards
Passport | Authenticate and authorize users |
Service Registry API | Discover and access service instances |
Tool Registry Service API | Discover and access workflows and container images |
Workflow Execution Service API | Interpret workflows and schedule task execution |
Task Execution Service API | Execute containerized tasks |
Data Repository Service API | Access data sets |
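The way these standards chain together can be sketched as the request bodies a client would prepare: a workflow located via TRS is submitted to a WES endpoint, which in turn schedules containerized tasks on TES, with inputs referenced through DRS. This is a minimal sketch under stated assumptions, not the ELIXIR Cloud implementation: the helper names, URLs, image name and the `drs://` identifier are hypothetical, and nothing is sent over the network.

```python
import json


def wes_run_request(workflow_url: str, workflow_type: str,
                    workflow_type_version: str, params: dict) -> dict:
    """Form fields for submitting a run to a WES `runs` endpoint.

    WES expects workflow_params as a JSON-encoded string inside the form.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),
    }


def tes_task(name: str, image: str, command: list) -> dict:
    """JSON body for creating a task on a TES `tasks` endpoint.

    Each executor names a container image and the command to run in it.
    """
    return {
        "name": name,
        "executors": [{"image": image, "command": list(command)}],
    }


# Hypothetical values: a descriptor URL as a TRS registry might serve it,
# and an input file referenced by a DRS URI.
run = wes_run_request(
    workflow_url="https://example.org/ga4gh/trs/v2/tools/123/versions/1/CWL/descriptor",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"input_file": "drs://example.org/some-object-id"},
)
task = tes_task("say-hello", "alpine:3", ["echo", "hello"])

print(json.dumps(run, indent=2))
print(json.dumps(task, indent=2))
```

In practice the user only talks to WES; the TES task shown here is the sort of unit a WES engine would generate for each workflow step, and Passport handles who is allowed to do any of it.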
The ELIXIR::GA4GH Cloud
Home organization
Here be dragons!
Compute federation
Compute federation: Gateway
Compute federation: CWL workflow
Compute federation: Demo
A recording of a live demo can be found here
The demo itself can be found here
To run the demo yourself, you need to have:
Alternatively, reach out to us privately to get the info/secrets to run the demo on the ELIXIR Cloud infrastructure
Compute federation: Future directions
Compute federation: 2023 integration goals
Other activities
Acknowledgments
Jonathan Tedds, ELIXIR Hub
Álvaro González, ELIXIR-FI
Michael Crusoe, ELIXIR-NL
Justin Clark-Casey, EMBL/EBI
ELIXIR-GR
ELIXIR Community
Collaborators
ELIXIR Cloud & AAI co-leads
Thank you!
Affiliations
ELIXIR Compute Applications in Human Data Projects including GDI
Juha Törnroos (FI)
Federated data and federated analysis
Why?
ELIXIR Europe
Federated data
Why?
Federated data, some initiatives
Federated analysis
Why?
Federated analysis, some initiatives
How about the future?
Thank you for joining
Please consider contributing to the development of the new Programme in Platforms and to future cross-Platform and Community-based activities