WP2 introduction
Stimulate FAIR and reusable research
WP2 leader: UNIMAN
This work is licensed under a Creative Commons Attribution 4.0 International License. Cite as: https://doi.org/10.5281/zenodo.7152762
2022-10-06 by Stian Soiland-Reyes, Carole Goble
EuroScienceGateway kickoff, Freiburg
Grant agreement 101057388
WP2 Objectives
O2.1 Bringing FAIR workflows into EOSC through the EuroScienceGateway
O2.2 Support reusable and reproducible workflows
O2.3 Establish FAIR Digital Objects as citable exchange format for workflows for all EOSC services
O2.4 Establish FAIR Workflow Digital Objects as publishable scholarly objects
2022-10-06 EuroScienceGateway WP2 | Stian Soiland-Reyes, Carole Goble
Motivation & background
Annual reminder: FAIR principles for sharing data
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
tl;dr:
- make data & metadata available
- use persistent identifiers!
- machine-readable metadata
- use standards
FAIR Tulip by Meznah Aloqalaa
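As a small illustration of the tl;dr, the sketch below checks a few machine-checkable points on a schema.org-style metadata record: a persistent identifier (F1), the identifier included in the metadata (F3) and a usage license (R1.1). The record and its DOI are hypothetical, and these checks are a deliberate simplification, not a FAIR assessment tool.

```python
# Hypothetical schema.org-style metadata record; the DOI is illustrative only.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/ab",
    "identifier": "https://doi.org/10.1234/ab",
    "name": "Example dataset",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}


def fair_hints(meta: dict) -> dict:
    """Pass/fail hints for a few machine-checkable FAIR points (not a full assessment)."""
    pid = meta.get("identifier", "")
    return {
        # F1: a globally unique, persistent identifier (here: a DOI URL)
        "F1_persistent_id": pid.startswith("https://doi.org/"),
        # F3: the metadata explicitly includes the identifier of what it describes
        "F3_id_in_metadata": bool(pid) and meta.get("@id") == pid,
        # R1.1: a clear, accessible usage license is stated
        "R1.1_license": bool(meta.get("license")),
    }


print(fair_hints(record))
```

Checks like these only cover the mechanical part of FAIR; "rich" metadata and community standards (F2, R1.3) still need human and domain judgement.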
FAIR for Research Software (FAIR4RS)
Findable: Software, and its associated metadata, is easy to find for both humans and machines.
F1. Software is assigned a globally unique and persistent identifier
F1.1. Different components of the software are assigned distinct identifiers representing different levels of granularity
F1.2. Different versions of the same software are assigned distinct identifiers
F2. Software is described with rich metadata
F3. Metadata clearly and explicitly include the identifier of the software they describe
F4. Metadata are FAIR and are searchable and indexable
Accessible: Software, and its metadata, is retrievable via standardized protocols.
A1. Software is retrievable by its identifier using a standardized communications protocol
A2. Metadata are accessible, even when the software is no longer available
Interoperable: Software interoperates with other software through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.
I1. Software reads, writes and exchanges data in a way that meets domain-relevant community standards
I2. Software includes qualified references to other objects
Reusable: Software is both usable (it can be executed) and reusable (it can be understood, modified, built upon, or incorporated into other software).
R1. Software is described with a plurality of accurate and relevant attributes
R1.1. Software is given a clear and accessible license
R1.2. Software is associated with detailed provenance
R2. Software includes qualified references to other software
R3. Software meets domain-relevant community standards
Software is different from data!
Evolves over time, complexity, dependencies, source vs binary, open source communities
More challenging to attribute, cite, archive, version
FAIR Computational Workflows
Can workflows help with FAIR?
Reproducible and Reusable
Explain a computational method
Record detailed provenance of execution
FAIR+ASAP: Automation, Scalability, Abstraction, Provenance
Challenges:
Workflows simplify (but don’t hide) Research Software
Fully capturing workflow as digital objects
Too many workflow systems! (>324) -> interoperability suffers
Workflow engines impose their own requirements (e.g. cloud infrastructure)
Usually not cited – how?
FAIR Workflow Ecosystem: Hybrid Processual Objects
FAIR Method Object
FAIR Software Objects
FAIR Data
In and Out
FAIR Enabling Services
Carole Goble, Stuart Owen
Defragmentation training school for bioimaging workflows 2022-09-30
FAIR Workflow registry
Workflow-system agnostic
Search for and discover workflows
Metadata standardization (CWL, schema.org, custom tags, RO-Crate)
DOI publication, citation & credit
Collections
Teams, Organizations and Communities
Programmatic access: GA4GH TRS API, RO-Crate
Registry, not repository
Workflows can live elsewhere, e.g. GitHub
Integration with execution platforms (incl. usegalaxy.eu)
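Programmatic access through the GA4GH Tool Registry Service (TRS) API boils down to predictable URL paths. The sketch below only builds those URLs; it assumes WorkflowHub serves TRS v2 under the base URL shown, and the workflow id "123" and version "1" are hypothetical.

```python
from urllib.parse import quote

# Assumption: WorkflowHub serves the GA4GH TRS v2 API under this base URL.
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_tool_url(tool_id: str) -> str:
    """TRS URL for one registered workflow ("tool" in TRS terms)."""
    return f"{TRS_BASE}/tools/{quote(tool_id, safe='')}"


def trs_versions_url(tool_id: str) -> str:
    """TRS URL listing the versions of a workflow."""
    return f"{trs_tool_url(tool_id)}/versions"


def trs_descriptor_url(tool_id: str, version_id: str, descriptor_type: str) -> str:
    """TRS URL for the primary descriptor of one version, e.g. type "GALAXY" or "CWL"."""
    return (f"{trs_versions_url(tool_id)}/{quote(version_id, safe='')}"
            f"/{descriptor_type}/descriptor")


# Hypothetical workflow id "123", version "1"
print(trs_descriptor_url("123", "1", "GALAXY"))
```

Identifiers are percent-encoded because TRS ids may contain characters such as "/" or "#". Actually fetching the URLs needs network access and is left out; check which descriptor types a given registry supports.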
EOSC-Life ecosystem
The services in the Workflow Collaboratory exchange digital objects as Workflow RO-Crates
Packaging workflow files & companion objects
Submission / download
Exchange between services & systems
Reproducibility & Testing
Citation
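Packaging for submission and download can be sketched with Python's standard zipfile module: the workflow files and ro-crate-metadata.json go into one zip with the metadata file at the archive root, and the receiving side reads that file back to learn what the package contains. The file name workflow.ga and the empty @graph are placeholders; a real Workflow RO-Crate needs the entities its profile requires.

```python
import io
import json
import zipfile


def package_crate(files: dict, metadata: dict) -> bytes:
    """Zip workflow files plus ro-crate-metadata.json (at the archive root)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path, data in files.items():
            zf.writestr(path, data)
    return buf.getvalue()


def read_crate_metadata(archive: bytes) -> dict:
    """On download, locate and parse the metadata file at the root of the zip."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return json.loads(zf.read("ro-crate-metadata.json"))


# Placeholder metadata and a hypothetical Galaxy workflow file
meta = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}
blob = package_crate({"workflow.ga": b"{}"}, meta)
print(read_crate_metadata(blob) == meta)
```

Keeping the metadata file at a fixed, well-known path is what lets any service open a crate without knowing which system produced it.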
Using standards for workflow execution provenance
Renske de Wit https://doi.org/10.5281/zenodo.7113250
Simone Leo, Laura Rodríguez-Navas, et al.
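One standards-based way to record execution provenance, used by the Workflow Run Crate profiles of RO-Crate, is a schema.org CreateAction whose instrument is the workflow, whose object lists the inputs and whose result lists the outputs. The sketch below is simplified (the profiles define further entities and requirements), and all file names, timestamps and the #alice agent are hypothetical.

```python
import json


def run_action(run_id, workflow_id, inputs, outputs, start, end, agent_id):
    """Describe one workflow execution as a schema.org CreateAction entity."""
    return {
        "@id": run_id,
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},             # the workflow that was executed
        "object": [{"@id": path} for path in inputs],   # inputs consumed by the run
        "result": [{"@id": path} for path in outputs],  # outputs the run produced
        "startTime": start,                             # ISO 8601 timestamps
        "endTime": end,
        "agent": {"@id": agent_id},                     # who or what ran it
    }


action = run_action("#run-1", "workflow.ga", ["reads.fastq"], ["variants.vcf"],
                    "2022-10-06T09:00:00Z", "2022-10-06T09:42:00Z", "#alice")
print(json.dumps(action, indent=2))
```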
Multiple platforms and repositories
Challenge: Digital Objects are deposited in different repositories depending on their type and domain
FAIR Digital Object (FDO) – conceptual view
Predictable implementation of FAIR for active objects – not just static data
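[Figure: an FDO is identified by a PID (e.g. 20.301/a) that resolves to a PID Record following a PID Profile. The record holds typed attributes (e.g. 20.123: “Alice”, 20.789: <http://...>, 20.456: 10.1234/ab). Each FDO has an FDO Type and is associated with metadata, bytes and operations; FDOs can be grouped in a collection.]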
FDO and RO highlighted in EOSC Interoperability Framework
https://doi.org/10.2777/620649
Note: FDO is a conceptual model + specifications. It can be realized with different implementations.
FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021
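The conceptual view can be mirrored in a few lines of Python. This is a toy model, not an FDO implementation: a dictionary stands in for the PID system, the attribute keys of the PID record are treated as type PIDs (as with 20.123 in the conceptual view), and the type PID 20.999/type-example and the "attribute" operation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FDO:
    """Conceptual FAIR Digital Object: a PID whose record holds typed attributes."""
    pid: str
    fdo_type: str                      # PID of the FDO Type (hypothetical here)
    record: dict                       # attribute-type PID -> value
    operations: dict = field(default_factory=dict)  # operation name -> Callable

    def invoke(self, name: str, *args):
        """Run one of the operations made available for this FDO."""
        operation: Callable = self.operations[name]
        return operation(self, *args)


# Toy in-memory "resolver" standing in for a PID system such as Handle.
REGISTRY: dict = {}


def resolve(pid: str) -> FDO:
    return REGISTRY[pid]


alice = FDO(
    pid="20.301/a",
    fdo_type="20.999/type-example",    # hypothetical type PID
    record={"20.123": "Alice", "20.456": "10.1234/ab"},
    operations={"attribute": lambda fdo, key: fdo.record[key]},
)
REGISTRY[alice.pid] = alice

print(resolve("20.301/a").invoke("attribute", "20.123"))  # Alice
```

The point of the model is that a client only needs the PID: everything else (type, attributes, available operations) is discovered by resolving it.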
FAIR RO-Crate as a realization of FDO Research Objects
FAIR Signposting – HTTP headers for PID, metadata, ++
Common types from schema.org
Multiple metadata files incl. JSON-LD, DataCite XML
Domain-specific profiles (e.g. Workflow Crate)
Conforming to existing Community APIs (e.g. GA4GH)
Implementing FDO with current web standards
Updating Linked Data practices for FAIR Digital Object principles
& Creating lightweight FAIR Digital Objects with RO-Crate. Research Ideas and Outcomes 8:e93937, 1st Intl Conf on FAIR Digital Objects
[Figure: RO-Crate as an FDO. PID Profile: RO-Crate. The PID (https://doi.org/...) is resolved with HTTP GET; the PID Record is expressed as FAIR Signposting (https://signposting.org/FAIR/) HTTP Link headers:
<…schema.org/dataset>; rel=type
<https://doi…>; rel=cite-as
<…crate.zip>; rel=item
<ro-crat…>; rel=describedby; profile=…ro-crate
...
Operations retrieve the ro-crate-metadata.json and the zip. FDO Type: Dataset; for workflows, FDO Type: ComputationalWorkflow.]
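A client can read FAIR Signposting from the Link headers returned by an HTTP GET on the PID's landing page. The sketch below parses such a header with the standard library and selects targets by rel; the example.org URLs and the profile URI are hypothetical stand-ins for the elided ones, and the parser covers common cases rather than the full RFC 8288 grammar.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]*))*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]*))')


def parse_link_header(value: str) -> list:
    """Split an HTTP Link header into dicts like {"href": ..., "rel": ..., ...}."""
    links = []
    for match in LINK_RE.finditer(value):
        link = {"href": match.group(1)}
        for param in PARAM_RE.finditer(match.group(2)):
            quoted, bare = param.group(2), param.group(3)
            link[param.group(1).lower()] = quoted if quoted is not None else bare
        links.append(link)
    return links


def targets(links: list, rel: str) -> list:
    """Targets whose rel (a space-separated list) contains the given relation."""
    return [link["href"] for link in links if rel in link.get("rel", "").split()]


# Hypothetical response header for a crate landing page
header = ('<https://doi.org/10.1234/ab>; rel="cite-as", '
          '<https://schema.org/Dataset>; rel="type", '
          '<https://example.org/crate.zip>; rel="item"; type="application/zip", '
          '<https://example.org/ro-crate-metadata.json>; rel="describedby"; '
          'type="application/ld+json"; profile="https://w3id.org/ro/crate"')

links = parse_link_header(header)
print(targets(links, "cite-as"))      # ['https://doi.org/10.1234/ab']
print(targets(links, "describedby"))  # ['https://example.org/ro-crate-metadata.json']
```

Because the headers travel with any HTTP response, a client learns the citable PID, the type and where the metadata lives without parsing the landing page's HTML.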
Structured self-describing, machine readable, metadata objects
[Figure: an RO-Crate combines the RO-Crate Content (files, directories and links to web resources such as https://github.com/o/script) with an RO-Crate Metadata file, using an archive file format / packaging system. The metadata file holds structured metadata about the RO-Crate (id, type, description, datePublished, license, author, organisation, …) and about its content (id, type, description, datePublished, creator, size, format, …).]
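A minimal ro-crate-metadata.json can be built as plain JSON-LD. The sketch follows the RO-Crate 1.1 layout as commonly documented (a metadata descriptor, a root Dataset "./", contextual entities and data entities in one flat @graph); the crate name, Alice, the organisation and script.py are hypothetical, and "size" and "format" map to schema.org contentSize and encodingFormat.

```python
import json


def minimal_crate(name, description, date_published, license_url, files):
    """Build a minimal RO-Crate 1.1 metadata document (JSON-LD) as a dict."""
    graph = [
        {   # metadata descriptor: this file describes the root dataset "./"
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # root data entity: the crate as a whole
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_url},
            "author": {"@id": "#alice"},
            "hasPart": [{"@id": f["@id"]} for f in files],
        },
        # hypothetical author and organisation (contextual entities)
        {"@id": "#alice", "@type": "Person", "name": "Alice",
         "affiliation": {"@id": "#org"}},
        {"@id": "#org", "@type": "Organization", "name": "Example Organisation"},
    ]
    graph.extend(files)  # data entities: one per file or web resource
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


files = [
    # local file with size and format
    {"@id": "script.py", "@type": "File", "name": "Analysis script",
     "contentSize": "1024", "encodingFormat": "text/x-python"},
    # web-based data entity: referenced by absolute URL, not copied into the crate
    {"@id": "https://github.com/o/script", "@type": "File", "name": "Upstream script"},
]
crate = minimal_crate("Example crate", "Hypothetical example", "2022-10-06",
                      "https://creativecommons.org/licenses/by/4.0/", files)
print(json.dumps(crate, indent=2))
```

The flat @graph with @id references is what keeps the file both plain JSON for developers and Linked Data for machines.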
A little bit of packaging goes a long way
Familiar, developer-friendly Lo-Tek – web native, off-the-shelf, machine and human readable, search engine accessible: PIDs + JSON-LD + Schema.org + BagIt/Zip/OCFL.
Infrastructure independent to overcome repository and service silos: Practical, lightweight, robust.
One size does not fit all – embrace diversity, legacy, unknowns – open-ended, multi-interpretation, self-describing. Extensible metadata + pre-existing ontologies: Duck type profiling.
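Duck type profiling can be read as: treat a crate as, say, a workflow crate if it has the properties one expects, rather than requiring a declared type hierarchy. A sketch, assuming (as in the Workflow RO-Crate profile) that the root dataset's mainEntity is the workflow, typed ComputationalWorkflow; the check is deliberately partial and is not the profile's official validator.

```python
def entity(graph, entity_id):
    """Look up an entity in a flat JSON-LD @graph by its @id."""
    return next((e for e in graph if e.get("@id") == entity_id), None)


def types_of(e):
    """@type may be a single string or a list; normalise to a set."""
    t = e.get("@type", []) if e else []
    return {t} if isinstance(t, str) else set(t)


def looks_like_workflow_crate(crate):
    """Duck-typing check: does the root's mainEntity quack like a workflow?"""
    graph = crate.get("@graph", [])
    descriptor = entity(graph, "ro-crate-metadata.json")
    if descriptor is None:
        return False
    root = entity(graph, descriptor.get("about", {}).get("@id"))
    if root is None or "mainEntity" not in root:
        return False
    main = entity(graph, root["mainEntity"].get("@id"))
    return "ComputationalWorkflow" in types_of(main)


# Hypothetical crate with a Galaxy workflow as main entity
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "workflow.ga"}},
        {"@id": "workflow.ga",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    ],
}
print(looks_like_workflow_crate(crate))  # True
```

Extra, unknown properties in the crate are simply ignored, which is what makes this style tolerant of legacy and unforeseen metadata.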
WP2 Aims
Support FAIR practices for and by workflows.
Realize FAIR Digital Objects (FDOs) as RO-Crates
Exchanging, publishing, archiving and citing workflows and their companion data, provenance logs and associated resources
Publish FDOs in EOSC catalogues (e.g. OpenAIRE)
Mature WorkflowHub into a TRL-9 EOSC service
Promote WorkflowHub as registry of choice for all workflow system types and for all disciplines in EOSC
Reach out to computational researcher communities and publishers
Align with EOSC’s PID and metadata schema frameworks
…in collaboration with FAIR-IMPACT
Task T2.1: Integration of EuroScienceGateway in EOSC
Integrate workflows in existing EOSC services (Findability, Accessibility)
Use WorkflowHub as workflow registry in ESG & EOSC → mature to TRL-9 EOSC service
← Working with WP5 for new user requirements
Catalogue of workflows adhering to best practices (MS3, MS4)
← from use cases (WP5) and for training (WP1)
WorkflowHub to integrate execution through EuroScienceGateway (T3.4)
Register in EOSC catalogues/aggregators (e.g. OpenAIRE)
M7-M36: UNIMAN, VIB, BSC, EGI
Goal: Establish EuroScienceGateway as new EOSC service
Task T2.2: Reproducible and reusable FAIR Digital Objects
Workflows as first-class objects in the EOSC Interoperability Framework
Harvest metadata from Galaxy and other WMS
Mature RO-Crate Profiles: Workflow Crate Profile, Workflow Run Profile.
ESG analytics over provenance �→ inform optimisation of meta-scheduling (WP4)
M1-M26: UNIMAN, VIB, EPFL, BSC
Formalize FAIR Digital Object profile using RO-Crate.
Collaborations: FAIR Digital Object Forum, RDA Data Fabric IG, EOSC Core
Collaborating with FAIR-IMPACT: PID practices, metadata requirements for citing workflows (T2.4), FAIR Research Software
Goal: Mature FAIR Digital Object RO-Crate approach
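Harvesting metadata from Galaxy can start by mapping fields of an exported .ga workflow (a JSON file) onto RO-Crate entities. The sketch assumes the .ga fields a_galaxy_workflow, name, annotation, license (an SPDX identifier) and creator; the example workflow is hypothetical, and other WMS would need their own mappings.

```python
import json


def harvest_galaxy(ga_text, path="workflow.ga"):
    """Map a few fields of a Galaxy .ga export to RO-Crate entities."""
    wf = json.loads(ga_text)
    # Galaxy exports mark themselves with this flag (stored as a string)
    if wf.get("a_galaxy_workflow") != "true":
        raise ValueError("not a Galaxy workflow export")
    workflow = {
        "@id": path,
        "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        "name": wf.get("name", path),
        "programmingLanguage": {"@id": "#galaxy"},
    }
    language = {"@id": "#galaxy", "@type": "ComputerLanguage", "name": "Galaxy",
                "url": {"@id": "https://galaxyproject.org/"}}
    entities = [workflow, language]
    if wf.get("annotation"):
        workflow["description"] = wf["annotation"]
    if wf.get("license"):
        # assumed to be an SPDX identifier such as "MIT"
        workflow["license"] = {"@id": "https://spdx.org/licenses/" + wf["license"]}
    people = []
    for i, person in enumerate(wf.get("creator", [])):
        if person.get("class") == "Person" and person.get("name"):
            person_id = person.get("identifier") or f"#creator-{i}"
            people.append({"@id": person_id, "@type": "Person", "name": person["name"]})
    if people:
        workflow["creator"] = [{"@id": p["@id"]} for p in people]
        entities.extend(people)
    return entities


example = json.dumps({
    "a_galaxy_workflow": "true",
    "format-version": "0.1",
    "name": "Example variant calling",
    "annotation": "Hypothetical workflow for illustration",
    "license": "MIT",
    "creator": [{"class": "Person", "name": "Alice"}],
    "steps": {},
})
for e in harvest_galaxy(example):
    print(e)
```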
Task T2.3: Using and enriching workflow FDOs
Ensure deposition in long term archives (e.g. Zenodo, Software Heritage)
Use & contribute to EOSC scholarly metadata services and scholarly aggregation catalogues (e.g. OpenAIRE)
Include in scholarship Knowledge Graphs (KG) (e.g. OpenAIRE Research Graph, DataCite PID graph).
Make EuroScienceGateway’s metadata suitable for other EOSC services
M15-M36: UNIMAN, EPFL, UP, UiO
Figure by Roderic Page https://doi.org/10.3897/rio.2.e8767 (CC BY 4.0)
Fenner, M., & Aryani, A. (2019). Introducing the PID Graph. https://doi.org/10.5438/JWVF-8A66 (CC BY 4.0)
Accumulate additional workflow metadata
Explore metadata extraction from publications, use Knowledge Graph services (MS5, MS6)
Provide guidance and workflow discovery for the user communities (WP5)
Inform EuroScienceGateway infrastructure decision making (WP4): appropriate workflow choice, Pulsar integration and meta-scheduling (T4.3).
Goal: Exchange workflow FDOs between EuroScienceGateway and EOSC, improve discovery by combining knowledge graphs
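Why combining knowledge graphs improves discovery can be shown on a toy example: two hypothetical sources each know some PID-to-PID links (relation names borrowed from DataCite relation types), and only the merged graph connects a workflow to the dataset used by the paper it supplements. All DOIs are made up.

```python
from collections import defaultdict, deque

# Hypothetical (subject, DataCite-style relation, object) edges from two sources.
registry_edges = [("10.1234/wf-1", "IsSupplementTo", "10.1234/paper-A")]
publisher_edges = [
    ("10.1234/paper-A", "References", "10.1234/data-X"),
    ("10.1234/paper-B", "References", "10.1234/wf-1"),
]


def merge(*edge_lists):
    """Union several edge lists into one adjacency map keyed by PID."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for subject, _relation, obj in edges:
            graph[subject].add(obj)
            graph[obj].add(subject)  # follow links in both directions for discovery
    return graph


def related(graph, start, max_hops=2):
    """Breadth-first search: PIDs reachable from start, with their hop distance."""
    seen, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {pid: hops for pid, hops in seen.items() if pid != start}


print(related(merge(registry_edges), "10.1234/wf-1"))
print(related(merge(registry_edges, publisher_edges), "10.1234/wf-1"))
```

With the registry alone, the workflow only reaches paper A; after merging, it also reaches paper B (which cites it) and dataset X (two hops away).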
Task T2.4: FAIR workflows as scholarly objects in scientific publishing
Linking of workflows and their metadata to research publications
Supplement workflow metadata with information on publications
🡪 Extract such crosslinks from existing publications
Link publishing services used in research communities (WP5):
rapid communication services (e.g. The Astronomer’s Telegram) 🡪 T5.2
preprint publishing services (e.g. arXiv.org)
public research output databases such as OpenAIRE
specialist journals for software publications (e.g. Journal of Open Source Software, GigaByte)
traditional publishers and their services.
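Extracting crosslinks from existing publications can start as simply as pattern matching on the text. The sketch assumes a Crossref-style regular expression for modern DOIs and a hypothetical pattern for WorkflowHub workflow URLs; real text mining needs more care (DOIs broken across lines, DOIs that legitimately end in punctuation).

```python
import re

# Crossref-style pattern for modern DOIs (case-insensitive); it can over-match
# trailing punctuation, which is trimmed below.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)
# Hypothetical pattern for WorkflowHub workflow landing pages
WFH_RE = re.compile(r"https://workflowhub\.eu/workflows/(\d+)")


def extract_links(text: str) -> dict:
    """Collect DOIs and WorkflowHub workflow ids mentioned in a text."""
    dois = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;:)")
        if doi not in dois:
            dois.append(doi)
    workflow_ids = sorted(set(WFH_RE.findall(text)), key=int)
    return {"dois": dois, "workflowhub_ids": workflow_ids}


# Hypothetical sentence from a methods section
text = ("Analysis used https://workflowhub.eu/workflows/123 "
        "(see doi:10.1234/ab) and 10.5281/zenodo.7152762.")
print(extract_links(text))
```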
M1-M36: EPFL, VIB, UP, UNIMAN
Establish WorkflowHub as a registry authority
Encourage workflow citations (e.g. DOIs from WorkflowHub, Dockstore, Zenodo) in journal articles
Extend research software citation practices and initiatives (e.g. RDA FAIR4RS, Workflows Community Initiative)
Establish FAIR workflow recommendations
Establish peer review assessment of workflows
Collaborate w/ FAIR-IMPACT (INFRA-2021-EOSC-01-05)
Goal: Incorporate workflow FDOs into the scholarly communication landscape as first-class publishable objects
WP2 partners and allocation
Partner | Effort | Personnel [TBD] | Responsibilities and contributions |
UNIMAN | 23 PM | Stian, Finn, ++ | WP2 leader, WorkflowHub, FDO, RO-Crate, FAIR workflows PIDs |
EPFL | 18 PM | Gabriele, Andrii, Volodymyr | Link to publications (T2.4), use-case tester, astronomy perspective, external view |
VIB | 11 PM | Ignacio, Paul | Galaxy provenance, connecting with WP4 |
BSC | 8 PM | JM Fernández, Maria, ++ | Main WP3 link, workflow engines, GA4GH. WP4 link to other projects (Spanish IMPaCT-Data, not related to ELIXIR IMPACT). Workflow scheduling improvements. |
EGI | 5 PM | Catalin, Gwen | Main EOSC link, EOSC-Portal, EOSC-Future, EOSC catalogue. EGI leading WP4. Perhaps Gwen on comms. |
UiO | 2 PM | Sveinung, ++ | FAIR data aspects more than workflows (but also data workflows). Also connected to Galaxy community, GA4GH, and RDA. Can also be a link to ELIXIR Norway efforts on workflows (non-sensitive mainly in Galaxy, sensitive probably not) |
Deliverables and milestones
D2.1 Reproducible FAIR Digital Objects for workflows (lead: UNIMAN) M24
Report: FAIR Digital Objects (FDOs) for exchanging, publishing, archiving and citing workflows and their companion data, provenance logs and associated resources will be realized as RO-Crates.
D2.2 Publishing workflow enriched FDOs to EOSC (lead: UNIMAN) M30
Report: Publishing workflow enriched FDOs to EOSC
MS3 Initial EuroScienceGateway workflows registered (lead: VIB/UNIMAN) M18
Collection in WorkflowHub
MS4 EuroScienceGateway workflows registered as FDOs (lead: VIB/UNIMAN) M36
Zenodo Data Deposit
MS5 Initial EuroScienceGateway knowledge graph (lead: VIB/UNIMAN) M24
Zenodo Data Deposit
MS6 Integrated EuroScienceGateway knowledge graph (lead: VIB/UNIMAN) M36
Zenodo Data Deposit; Report of integrations and queries
Existing and potential collaborations
Link to the other projects
…with other EOSC projects and initiatives
Previous links with Galaxy
Engagement with other work packages
WP1
WP3
WP4
WP5
Open questions
For discussions