
Enabling an operational, open and FAIR EOSC ecosystem

Sebastian Schaaf1,2, Anika Erxleben-Eggenhofer1, Björn Grüning1, and The EuroScienceGateway Team

1Bioinformatics Group, Department of Computer Science, Albert Ludwigs University Freiburg, Freiburg, Germany

2ELIXIR DE Officers Team, Institute of Bio- and Geosciences (IBG-5), Forschungszentrum Jülich, Jülich, Germany

Project Structure

PULSAR Network: distributed computing across Europe

Expected Results

EuroScienceGateway (ESG)1 will deliver a robust, scalable, seamlessly integrated open infrastructure for data-driven research. In ESG, the European Open Science Cloud (EOSC)2 initiative and its components meet Galaxy, a user-oriented scientific workflow and data integration platform. Galaxy serves a broad range of scientific domains: it links computing and data resources and provides easy access for non-IT users, reproducibility features and various sharing options as a matter of principle. Thus, ESG fosters federated computational infrastructures and the democratization of research data analysis.

How does this relate to ELIXIR?

In ELIXIR, Galaxy is an integral component and has matured into a European community3 that also provides centralized access to national nodes’ resources and Galaxy instances. Several ELIXIR Implementation Studies (IS) have been carried out, or are ongoing, for and on Galaxy4. In fact, ESG’s main project parts have been substantially supported by, and have grown within, ELIXIR:

European Galaxy Server: Hosting (sub-)communities

The heart and soul of Galaxy are its users and contributors, largely self-organized in interest groups dedicated to topics such as those collected below. Actively onboarding scientific communities into the Galaxy space is a central task in ESG (WP5).

Communities

Science Gateway with a custom set of tools and workflows for the Biodiversity/Climate, Materials Science and Astrophysics communities.

FAIR data

EOSC-catalogued FAIR data and workflows that can be found, consumed, created and published by users of the ESG, demonstrated for each of the three domain-specific use cases.

Pulsar Network

A mature and tested (TRL-9) distributed compute network with demonstrated usage across at least 12 European partners.

ScienceGateways

6 national Galaxy instances operational and proven by the scientific community with more than 100,000 users.

ESG Partners

The European Galaxy server useGalaxy.eu has 70,800 registered users, who run 2 million jobs a month. On the server, 331,000 workflows have been executed, resulting in more than 60 million jobs5. Support for EDAM, bio.tools, OpenEBench, WorkflowHub, RO-Crate, AAI and GA4GH standards is available, ensuring a sustainable service.

The Pulsar Network is a distributed job execution system linking several European data centers, scaling Galaxy across heterogeneous resources.
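The idea of distributing jobs across heterogeneous endpoints can be illustrated with a toy model. The sketch below is not Galaxy or Pulsar code: the `Endpoint` and `Job` classes, the site names and the "least queued jobs" rule are all assumptions made for illustration only. It simply picks, for each job, the least-loaded endpoint that has enough free cores and memory, and falls back to local execution otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    """A hypothetical remote compute endpoint (fields are illustrative)."""
    name: str
    free_cores: int
    free_mem_gb: int
    queued_jobs: int = 0


@dataclass
class Job:
    """A hypothetical job with its resource request."""
    tool: str
    cores: int
    mem_gb: int


def pick_endpoint(job: Job, endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the least-loaded endpoint that can fit the job, or None."""
    candidates = [
        e for e in endpoints
        if e.free_cores >= job.cores and e.free_mem_gb >= job.mem_gb
    ]
    if not candidates:
        return None
    # Fewest queued jobs first; the name breaks ties deterministically.
    return min(candidates, key=lambda e: (e.queued_jobs, e.name))


def dispatch(job: Job, endpoints: List[Endpoint]) -> str:
    """Reserve resources on the chosen endpoint and return its name.

    Returns "local" when no remote endpoint can take the job.
    """
    endpoint = pick_endpoint(job, endpoints)
    if endpoint is None:
        return "local"
    endpoint.free_cores -= job.cores
    endpoint.free_mem_gb -= job.mem_gb
    endpoint.queued_jobs += 1
    return endpoint.name
```

In a real deployment the routing decision would also depend on factors such as data locality, tool availability and site policy, which this sketch deliberately ignores.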

The Galaxy Training Network (GTN) offers learning paths, TeSS integration, rich metadata annotations and training for developers, admins and scientists. More than 300 contributors have created over 400 tutorials on 37 topics, along with close to 8 years of video recordings6.

WorkflowHub is an EOSC-wide registry for sharing and publishing computational workflows, promoting workflows as FAIR scholarly objects by using open standards (CWL, RO-Crate, Bioschemas, GA4GH TRS).
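To make the notion of a workflow as a FAIR scholarly object more concrete, the following minimal sketch assembles the skeleton of an RO-Crate 1.1 metadata document (the content of `ro-crate-metadata.json`) for a single workflow file. The function name `build_workflow_crate`, the file name `example.ga`, the workflow title and the licence are assumptions chosen for illustration. A crate accepted by a registry would normally carry more properties, for example the workflow language, authors and a description.

```python
import json


def build_workflow_crate(workflow_file: str, name: str, license_url: str) -> dict:
    """Build a minimal RO-Crate 1.1 metadata graph for one workflow file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_file}],
                "mainEntity": {"@id": workflow_file},
            },
            {
                # The workflow file, typed as a computational workflow.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Hypothetical example values; nothing is written to disk here.
crate = build_workflow_crate(
    "example.ga", "Example workflow", "https://spdx.org/licenses/MIT"
)
metadata_json = json.dumps(crate, indent=2)
```

Keeping the metadata as plain JSON-LD means it can be produced with the standard library alone and validated or enriched later by dedicated tooling.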

Fig. 1 - Five work packages take care of tasks in infrastructure (Galaxy & PULSAR), FAIRification (WorkflowHub, RO-Crate) and use case-driven onboarding of new communities (biodiversity/climate, materials, astrophysics). Importantly, ESG builds on existing infrastructures, focussing on interoperability.

Fig. 2 - In total, 18 partners from 13 countries co-operate in ESG to enable job distribution across six national Galaxy nodes (.* servers) and several PULSAR endpoints. The PULSAR project is growing, and ESG establishes new endpoints at partner sites in, e.g., the Czech Republic, Poland, Slovakia and Türkiye.

ELIXIR Projects

carried out for/on Galaxy:

    • 2018-galaxy, 2019-galaxy, 2021-galaxy
    • 2018-biocontainers, 2019-containers, 2021-containerservices
    • 2019-ETP1, 2022-ETP1, 2022-ETP3
    • 2021-hCNVexchange, 2021-hCNVbundles
    • 2021-ToolsEcosystem, 2021-IDPTools


EuroScienceGateway was funded by the European Union programme Horizon Europe (HORIZON-INFRA-2021-EOSC-01-04) under grant agreement number 101057388 and by UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10038963.

Grant agreement 101057388


Abstract

In many scientific domains, data are internationally distributed and (initially) incompatible. Thus, exploiting these data is still laborious, error-prone, and increasingly dependent on specialized technical skills and domain knowledge. FAIR practices are encouraged; however, their adoption curve is steep.

For several of the issues described, particular solutions exist, and some have developed into quite broad, generalized and powerful resources. The Galaxy platform for data integration and analysis has matured from computational biology to various scientific domains, providing users with an easy and powerful web interface to access data and databases and to run tools and workflows, utilizing huge computational resources in the background. The primary goal of the European Open Science Cloud (EOSC) initiative is to facilitate FAIR use of research data in a trusted environment for researchers, businesses, and citizens. Like ELIXIR, Galaxy and EOSC are committed to open science principles.

We present the Horizon Europe (HE) project EOSC EuroScienceGateway (ESG), which started in September 2022. 17 European partners target a tight interconnection of European Galaxy instances and improved FAIRification of data analysis by integrating metadata- and knowledge-related EOSC services (Zenodo, WorkflowHub.eu). Notably, the three major components in ESG are the European Galaxy server (usegalaxy.eu), the distributed compute network PULSAR and the Galaxy Training Network (GTN); all three infrastructures have undergone major advancements in recent years, to a substantial extent thanks to past and ongoing ELIXIR support. How these components are connected, resulting in a robust, scalable and seamlessly integrated open infrastructure for data-driven research, is the story of our poster.
