1 of 26

WP2 introduction

Stimulate FAIR and reusable research

WP2 leader: UNIMAN


2022-10-06 by Stian Soiland-Reyes, Carole Goble
EuroScienceGateway kickoff, Freiburg

Grant agreement 101057388

2 of 26

WP2 Objectives

O2.1 Bringing FAIR workflows into EOSC through the EuroScienceGateway

O2.2 Support reusable and reproducible workflows

O2.3 Establish FAIR Digital Objects as citable exchange format for workflows for all EOSC services

O2.4 Establish FAIR Workflow Digital Objects as publishable scholarly objects


2022-10-06 EuroScienceGateway WP2 | Stian Soiland-Reyes, Carole Goble

3 of 26

Motivation & background

  1. How to implement FAIR principles for software and workflows?
  2. WorkflowHub and the EOSC-Life ecosystem
  3. Research Objects
  4. FAIR Digital Objects (FDO)
  5. RO-Crate


4 of 26

Annual reminder: FAIR principles for sharing data

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

 


To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

 

To be Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards

tl;dr:

  • make data & metadata available
  • use persistent identifiers!
  • machine-readable metadata
  • use standards

FAIR Tulip by Meznah Aloqalaa


5 of 26

FAIR for Research Software (FAIR4RS)

Findable: Software, and its associated metadata, is easy to find for both humans and machines.

F1. Software is assigned a globally unique and persistent identifier

F1.1. Different components of the software are assigned distinct identifiers representing different levels of granularity

F1.2. Different versions of the same software are assigned distinct identifiers

F2. Software is described with rich metadata

F3. Metadata clearly and explicitly include the identifier of the software they describe

F4. Metadata are FAIR and are searchable and indexable

 


Accessible: Software, and its metadata, is retrievable via standardized protocols.

A1. Software is retrievable by its identifier using a standardized communications protocol

  • A1.1. The protocol is open, free, and universally implementable
  • A1.2. The protocol allows for an authentication and authorization procedure, where necessary

A2. Metadata are accessible, even when the software is no longer available

Interoperable: Software interoperates with other software through exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.

I1. Software reads, writes and exchanges data in a way that meets domain-relevant community standards

I2. Software includes qualified references to other objects

Reusable: Software is both usable (it can be executed) and reusable (it can be understood, modified, built upon, or incorporated into other software).

R1. Software is described with a plurality of accurate and relevant attributes

R1.1. Software is given a clear and accessible license

R1.2. Software is associated with detailed provenance

R2. Software includes qualified references to other software

R3. Software meets domain-relevant community standards

Software is different from data!

Evolves over time, complexity, dependencies, source vs binary, open source communities

More challenging to attribute, cite, archive, version


6 of 26

FAIR Computational Workflows

Can workflows help with FAIR?

Reproducible and Reusable

Explain a computational method

Record detailed provenance of execution

FAIR+ASAP: Automation, Scalable, Abstraction, Provenance


Challenges:

Workflows simplify (but don’t hide) Research Software

Fully capturing workflow as digital objects

Too many workflow systems! (>324) -> interoperability suffers

Requirements by workflow engine (e.g. cloud infrastructure)

Usually not cited – how?


7 of 26

FAIR Workflow Ecosystem: Hybrid Processual Objects

FAIR Method Object

FAIR Software Objects

FAIR Data

In and Out

FAIR Enabling Services

Carole Goble, Stuart Owen

Defragmentation training school for bioimaging workflows 2022-09-30


8 of 26

FAIR Workflow registry

Workflow-system agnostic

Search for and discover workflows

Metadata standardization (CWL, schema.org, custom tags, RO-Crate)

DOI publication, citation & credit

Collections

Teams, Organizations and Communities

Programmatic access: GA4GH TRS API, RO-Crate

Registry, not repository

Workflows can live elsewhere, e.g. GitHub

Integration with execution platforms (incl. usegalaxy.eu)
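
The programmatic access listed above (GA4GH TRS API) can be illustrated with a small sketch that builds request URLs following the TRS v2 path layout. The base URL, the workflow identifier and the version used here are assumptions for illustration; check the registry's own API documentation before relying on them.

```python
from urllib.parse import quote

# Assumption: the registry exposes TRS v2 under this base path.
TRS_BASE = "https://workflowhub.eu/ga4gh/trs/v2"


def trs_tool_url(tool_id: str, base: str = TRS_BASE) -> str:
    """URL of one registered workflow (a "tool" in TRS terms)."""
    return f"{base}/tools/{quote(tool_id, safe='')}"


def trs_descriptor_url(tool_id: str, version_id: str, descriptor_type: str,
                       base: str = TRS_BASE) -> str:
    """URL of the primary descriptor for one version of a workflow."""
    return (
        f"{trs_tool_url(tool_id, base)}"
        f"/versions/{quote(version_id, safe='')}"
        f"/{quote(descriptor_type, safe='')}/descriptor"
    )


# Hypothetical workflow id "123", version "1", Galaxy descriptor type.
example_url = trs_descriptor_url("123", "1", "GALAXY")
print(example_url)
```

Because the URL scheme is the same for every TRS-compliant registry, an execution platform only needs the base URL to be configurable in order to fetch workflows from a different registry.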


9 of 26

EOSC-Life ecosystem

The services in the Workflow Collaboratory exchange digital objects as Workflow RO-Crates

Packaging workflow files & companion objects

Submission / download

Exchange between services & systems

Reproducibility & Testing

Citation


10 of 26

Using standards for workflow execution provenance

Simone Leo, Laura Rodríguez-Navas, et al.

https://www.researchobject.org/workflow-run-crate/
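
As a rough illustration of what standards-based execution provenance can look like, the sketch below records one workflow run as a schema.org `CreateAction` entity of the kind that could sit in an RO-Crate `@graph`. The helper name, file names and timestamps are hypothetical, and the exact requirements of the Workflow Run Crate profiles should be taken from the specification linked above rather than from this sketch.

```python
def run_action(run_id, workflow_id, inputs, outputs, start, end):
    """Describe one execution: which workflow ran, on what, producing what."""
    return {
        "@id": run_id,
        "@type": "CreateAction",
        "instrument": {"@id": workflow_id},        # the workflow that was executed
        "object": [{"@id": i} for i in inputs],    # data consumed by the run
        "result": [{"@id": o} for o in outputs],   # data produced by the run
        "startTime": start,
        "endTime": end,
    }


# Hypothetical run of a Galaxy workflow file named workflow.ga.
action = run_action(
    "#run-1",
    "workflow.ga",
    inputs=["data/input.csv"],
    outputs=["results/output.csv"],
    start="2022-10-06T09:00:00Z",
    end="2022-10-06T09:05:00Z",
)
print(action["@type"], "by", action["instrument"]["@id"])
```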


11 of 26

Multiple platforms and repositories

Challenge: Digital Objects are deposited in different repositories depending on their type and domain


12 of 26


13 of 26

FAIR Digital Object (FDO) – conceptual view

Predictable implementation of FAIR for active objects – not just static data

[Diagram: conceptual view of an FDO. Labels: FDO, FDO Type, PID Profile, Collection, PID 20.301/a, PID Record, Attributes (20.123: “Alice”; 20.789: <http://...>; 20.456: 10.1234/ab), Metadata, Operation (×3), Bytes (×2)]

  • Distributed object architecture
  • Self-describing digital objects
  • Several types of metadata
  • Encapsulation of operations
  • Machine-actionable

FDO and RO highlighted in EOSC Interoperability Framework

https://doi.org/10.2777/620649

Note: FDO is a conceptual model + specifications. It can be realized with different implementations:

  • Handle + DOIPv2
  • URL + Linked Data Platform
  • DOI + RO-Crate

FAIR Digital Objects for Science: From Data Pieces to Actionable Knowledge Units: https://doi.org/10.3390/publications8020021 


14 of 26

FAIR RO-Crate as a realization of FDO Research Objects

FAIR Signposting – HTTP headers for PID, metadata, ++

Common types from schema.org

Multiple metadata files incl. JSON-LD, DataCite XML

Domain-specific profiles (e.g. Workflow Crate)

Conforming to existing Community APIs (e.g. GA4GH)

Implementing FDO with current web standards

[Diagram: RO-Crate as an FDO realization. Labels: PID Profile: RO-Crate; PID https://doi.org/...; PID Record; ro-crate-metadata.json; zip; Operation (×2); HTTP GET; FDO Type: Dataset; ...; FDO Type: ComputationalWorkflow]

Link headers shown in the diagram (FAIR Signposting, https://signposting.org/FAIR/):

<…schema.org/dataset>; rel=type
<https://doi…>; rel=cite-as
<…crate.zip>; rel=item
<ro-crat…>; rel=describedby; profile=…ro-crate
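
To show how a client might consume FAIR Signposting, here is a minimal sketch that groups the targets of an HTTP `Link` header by relation type. The header value is made up for illustration (the URLs, DOI and profile URI are assumptions, not taken from a real landing page), and the regular expressions cover only the simple header shape used here, not every case allowed by the Web Linking specification.

```python
import re
from collections import defaultdict

# One link: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Group link targets by relation type, keeping other parameters."""
    links = defaultdict(list)
    for target, raw_params in LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        # A single link may carry several space-separated relation types.
        for rel in params.pop("rel", "").split():
            links[rel].append({"target": target, **params})
    return dict(links)


# Hypothetical Link header of a landing page for an RO-Crate.
header = (
    '<https://schema.org/Dataset>; rel="type", '
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://example.org/crate.zip>; rel="item"; type="application/zip", '
    '<https://example.org/ro-crate-metadata.json>; rel="describedby"; '
    'type="application/ld+json"; profile="https://w3id.org/ro/crate"'
)

signposts = parse_link_header(header)
print(signposts["cite-as"][0]["target"])
print(signposts["describedby"][0]["profile"])
```

In this reading, `cite-as` plays the role of the PID, `describedby` points to the metadata, and `item` points to the bytes, which mirrors the FDO parts named on the slide.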


15 of 26

Structured self-describing, machine readable, metadata objects

[Diagram: structure of an RO-Crate. Labels: Archive file format / packaging system; RO-Crate Metadata file (id, type, description, datePublished, license, author, organisation); RO-Crate Content (files, directories, links to web resources such as https://github.com/o/script), described with id, type, description, datePublished, creator, size, format]

Structured metadata about the RO-Crate and content
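
A minimal sketch of such a metadata file, built as a Python dictionary and serialized as JSON-LD, may make the structure more concrete. It follows the general shape of RO-Crate 1.1 (a metadata descriptor, a root dataset, and one entity per file); the helper name, crate name, file names and license choice are assumptions for illustration only.

```python
import json


def minimal_crate(name, description, date_published, license_id, files):
    """Build the JSON-LD content of a small ro-crate-metadata.json."""
    file_entities = [
        {"@id": path, "@type": "File", "name": title}
        for path, title in files.items()
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: says this file describes the crate root
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: metadata about the crate as a whole
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_id},
                "hasPart": [{"@id": e["@id"]} for e in file_entities],
            },
            *file_entities,  # one entity per file in the crate content
        ],
    }


# Hypothetical crate with one workflow file and one input file.
crate = minimal_crate(
    name="Example crate",
    description="Illustrative crate with a workflow and its input",
    date_published="2022-10-06",
    license_id="https://spdx.org/licenses/CC-BY-4.0",
    files={"workflow.ga": "Example workflow", "data/input.csv": "Example input"},
)
print(json.dumps(crate, indent=2))
```

The resulting JSON would be saved as `ro-crate-metadata.json` next to the files it describes, and the directory can then be zipped or otherwise packaged.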


16 of 26

A little bit of packaging goes a long way

Familiar, developer friendly Lo-Tek web native, off-the-shelf, machine and human readable, search engine accessible: PIDs + JSON-LD + Schema.org + BagIt/Zip/OCFL.

Infrastructure independent to overcome repository and service silos: Practical, lightweight, robust.

One size does not fit all – embrace diversity, legacy, unknowns – open-ended, multi-interpretation, self-describing. Extensible metadata + pre-existing ontologies: Duck type profiling.


17 of 26

WP2 Aims

Support FAIR practices for and by workflows.

Realize FAIR Digital Objects (FDOs) as RO-Crates

Exchanging, publishing, archiving and citing workflows and their companion data, provenance logs and associated resources

Publish FDOs in EOSC catalogues (e.g. OpenAIRE)

Mature WorkflowHub to TRL-9 status EOSC service

Promote WorkflowHub as registry of choice for all workflow system types and for all disciplines in EOSC

Reach out to computational researcher communities and publishers

Align with EOSC’s PID and metadata schema frameworks

…in collaboration with FAIR-IMPACT


18 of 26

Task T2.1: Integration of EuroScienceGateway in EOSC

Integrate workflows in existing EOSC services (Findability, Accessibility)

Use WorkflowHub as workflow registry in ESG & EOSC → mature to TRL-9 EOSC service

← Working with WP5 for new user requirements

Catalogue of workflows adhering to best practices (MS3, MS4)

← ..from use cases (WP5) and for training (WP1)

WorkflowHub to integrate execution through EuroScienceGateway (T3.4)

Register in EOSC catalogues/aggregators (e.g. OpenAIRE)

M7-M36: UNIMAN, VIB, BSC, EGI


Goal: Establish EuroScienceGateway as new EOSC service


19 of 26

Task T2.2: Reproducible and reusable FAIR Digital Objects

Workflows as first class objects in the EOSC Interoperability Framework

Harvest metadata from Galaxy and other WMS

Mature RO-Crate Profiles: Workflow Crate Profile, Workflow Run Profile.

ESG analytics over provenance → inform optimisation of meta-scheduling (WP4)

M1-M26: UNIMAN, VIB, EPFL, BSC


Formalize FAIR Digital Object profile using RO-Crate.

Collaborations: FAIR Digital Object Forum, RDA Data Fabric IG, EOSC Core

Collaborating with FAIR-IMPACT: PID practices, metadata requirements for citing workflows (T2.4), FAIR Research Software

Goal: Mature FAIR Digital Object RO-Crate approach


20 of 26

Task T2.3: Using and enriching workflow FDOs

Ensure deposition in long term archives (e.g. Zenodo, Software Heritage)

Use & contribute to EOSC scholarly metadata services and scholarly aggregation catalogues (e.g. OpenAIRE)

Include in scholarship Knowledge Graphs (KG) (e.g. OpenAIRE Research Graph, DataCite PID graph).

Make EuroScienceGateway's metadata suitable for other EOSC services

M15-M36: UNIMAN, EPFL, UP, UiO


Fenner, M., & Aryani, A. (2019). Introducing the PID Graph. https://doi.org/10.5438/JWVF-8A66 (CC BY 4.0)

Accumulate additional workflow metadata

Explore metadata extraction from publications, use Knowledge Graph services (MS5, MS6)

Provide guidance and workflow discovery for the user communities (WP5)

Inform EuroScienceGateway infrastructure decision making (WP4)

(appropriate workflow choice, Pulsar integration and meta-scheduling (T4.3))

Goal: Exchange workflow FDOs between EuroScienceGateway and EOSC, improve discovery by combining knowledge graphs


21 of 26

Task T2.4: FAIR workflows as scholarly objects in scientific publishing

Linking of workflows and their metadata to research publications

Supplement workflow metadata with information on publications

🡪 Extract such crosslinks from existing publications

Link publishing services used in research communities (WP5):

rapid communication services (e.g. Astronomer’s Telegrams) 🡪 T5.2

preprint publishing services (e.g. arXiv.org)

public research output databases such as OpenAIRE

specialist journals for software publications (e.g. Journal of Open Source Software, GigaByte)

traditional publishers and their services.

M1-M36: EPFL, VIB, UP, UNIMAN


Establish WorkflowHub as a registry authority

Encourage workflow citations (e.g. DOI to WorkflowHub, Dockstore, Zenodo) in journal articles

Extend research software citation practices and initiatives (e.g. RDA FAIR4RS, Workflows Community Initiative)

Establish FAIR workflow recommendations

Establish peer review assessment of workflows

Collaborate w/ FAIR-IMPACT (INFRA-2021-EOSC-01-05)

Goal: Incorporate workflow FDOs into the scholarly communication landscape as first class publishable objects


22 of 26

WP2 partners and allocation

| Partner | Effort | Personnel [TBD] | Responsibilities and contributions |
|---|---|---|---|
| UNIMAN | 23 PM | Stian, Finn, ++ | WP2 leader, WorkflowHub, FDO, RO-Crate, FAIR workflows PIDs |
| EPFL | 18 PM | Gabriele, Andrii, Volodymyr | Link to publications (T4.4), use-case-tester, astronomy perspective, external view |
| VIB | 11 PM | Ignacio, Paul | Galaxy provenance, connecting with WP4 |
| BSC | 8 PM | JM Fernández, Maria, ++ | Main WP3 link, workflow engines, GA4GH. WP4 link to other projects (Spanish IMPaCT-Data, not related to ELIXIR IMPACT). Workflow scheduling improvements. |
| EGI | 5 PM | Catalin, Gwen | Main EOSC link, EOSC-Portal, EOSC-Future, EOSC catalogue. EGI leading WP4. Perhaps Gwen on comms. |
| UiO | 2 PM | Sveinung, ++ | FAIR data aspects more than workflows (but also data workflows). Also connected to Galaxy community, GA4GH, and RDA. Can also be a link to ELIXIR Norway efforts on workflows (non-sensitive mainly in Galaxy, sensitive probably not) |


23 of 26

Deliverables and milestones

D2.1 Reproducible FAIR Digital Objects for workflows (lead: UNIMAN) M24

Report: FAIR Digital Objects (FDOs), for exchanging, publishing, archiving and citing workflows and their companion data, provenance logs and associated resources, will be realized as RO-Crates.

D2.2 Publishing workflow enriched FDOs to EOSC (lead: UNIMAN) M30

Report: Publishing workflow enriched FDOs to EOSC

MS3 Initial EuroScienceGateway workflows registered (lead: VIB/UNIMAN) M18

Collection in WorkflowHub

MS4 EuroScienceGateway workflows registered as FDOs (lead: VIB/UNIMAN) M36

Zenodo Data Deposit

MS5 Initial EuroScienceGateway knowledge graph (lead: VIB/UNIMAN) M24

Zenodo Data Deposit

MS6 Integrated EuroScienceGateway knowledge graph (lead: VIB/UNIMAN) M36

Zenodo Data Deposit; Report of integrations and queries


24 of 26

Existing and potential collaborations

Link to the other projects

…with other EOSC projects and initiatives

Previous links with Galaxy


25 of 26

Engagement with other work packages

WP1

  • Sustaining WorkflowHub as TRL-9 EOSC service (T1.4)
  • Training for FAIR Workflow FDOs (T1.6)

WP3

  • GA4GH APIs as Workflow FDO operations (T3.2)
  • Workflow FDO RO-Crate in TRS API (T3.2)
  • WfExS RO-Crate/provenance support (T3.4)

WP4

  • Workflow provenance to inform smart scheduling (T4.3)


WP5

  • Biodiversity 🡪 FDO work with BioDT, DiSSCo, SYNTHESYS+ (T5.1)
  • Climate Science 🡪 RO-Crate work in RELIANCE project (T5.1)
  • Materials Science 🡪 FDO work with NIST on Material Schema (T5.2)
  • Astrophysics 🡪 Workflows in Astronomer’s Telegrams, Galaxy in SKA, early Research Object adopters (Wf4Ever) (T5.3)
  • Outreach to workflow communities (T5.4)


26 of 26

Open questions

  1. How should we collect “just enough” metadata at different times? (workflow design, workflow run, workflow publishing)
  2. Where to publish Workflow Run FDOs with potentially large or sensitive data? (WP4)
  3. Can WorkflowHub initiate Pulsar execution of non-Galaxy workflows, e.g. CWL, Nextflow? (T3.4)
  4. How should we handle job submission with a mixture of data, some of which may not leave a particular node? (T4.2)
  5. How do we give users workflow discoverability aspects from enriched knowledge graphs? (WP5)
  6. … your questions!

For discussions
