1 of 24

Exchanging workflow provenance as FAIR Digital Objects using RO-Crate

European Galaxy Days 2023, Freiburg, Germany

2023-10-05

Stian Soiland-Reyes

The University of Manchester, RO-Crate co-lead

soiland-reyes@manchester.ac.uk
https://orcid.org/0000-0001-9842-9718

2 of 24

Aims of FAIR Research Objects

Describe and package data collections, datasets, software etc. with their metadata

Platform-independent object exchange between repositories and services

Support reproducibility and analysis: link data with code and workflows

Transfer of sensitive/large distributed datasets with persistent identifiers

Aggregate citations and persistent identifiers

Propagate provenance and existing metadata

Publish and archive mixed objects and references

Reuse existing standards, but hide their complexity

3 of 24

Realizing FAIR Digital Objects with RO-Crate

[Figure: RO-Crate Metadata file and RO-Crate Content (files, directories, and resources included by reference (PID, URL), such as https://github.com/o/script), annotated with metadata fields id, type, description, datePublished, license, author/creator, organisation, size, format.]

Structured metadata about the RO-Crate and content

Reference existing repositories

Re-use Web standards (JSON-LD, schema.org)

Persistent identifiers with FAIR Signposting

Add context: people, projects, etc.
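The structure in this figure can be sketched in Python using only the standard library. This is a minimal illustration assuming RO-Crate 1.1: `make_crate` and every value passed to it are hypothetical, and fields a real crate should also carry (such as `datePublished`, `description`, or a contextual entity describing the license) are left out for brevity. Files stored in the crate get relative `@id`s, while content included by reference keeps its absolute URL or PID as `@id`.

```python
import json

def make_crate(name, license_url, files, references):
    """Hypothetical helper: build a minimal RO-Crate 1.1 metadata dict.

    files      -- (path, description) pairs for content stored in the crate
    references -- (url, type, name) triples for content included by reference
    """
    data_entities = [
        {"@id": path, "@type": "File", "description": description}
        for path, description in files
    ] + [
        # Absolute URLs or PIDs are used directly as @id; nothing is copied
        {"@id": url, "@type": type_, "name": label}
        for url, type_, label in references
    ]
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "license": {"@id": license_url},
        "hasPart": [{"@id": entity["@id"]} for entity in data_entities],
    }
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *data_entities],
    }

# Illustrative values only
crate = make_crate(
    "Example crate",
    "https://spdx.org/licenses/CC-BY-4.0",
    files=[("data.csv", "Tabular results")],
    references=[("https://github.com/o/script", "SoftwareSourceCode", "Analysis script")],
)
print(json.dumps(crate, indent=2))
```

The metadata descriptor points at the root dataset through `about`, and the root lists both local and by-reference content in `hasPart`, so a consumer can treat them uniformly.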

4 of 24

Techie deep-dive!

Warning: JSON ahead


5 of 24

6 of 24

RO-Crate Metadata File

{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    { "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": { "@id": "./" }
    },
    { "@id": "./",
      "identifier": "https://doi.org/10.5281/zenodo.1009240",
      "@type": "Dataset",
      "hasPart": [
        { "@id": "cp7glop.ai" },
        { "@id": "lots_of_little_files/" },
        { "@id": "communities-2018.csv" },
        { "@id": "https://doi.org/10.4225/59/59672c09f4a4b" },
        { "@id": "SciDataCon Presentations/AAA_Pilot_Project_Abstract.html" }
      ],
      "author": { "@id": "https://orcid.org/0000-0002-8367-6908" },
      "publisher": { "@id": "https://ror.org/03f0f6041" },
      "citation": { "@id": "https://doi.org/10.1109/TCYB.2014.2386282"},
      "name": "Presentation of user survey 2018"
    },
    { "@id": "cp7glop.ai",
      "@type": "File",
      "name": "Diagram showing trend to increase"
    }
  ]
}

JSON-LD preamble: "@context" and "@graph"

Flat list of metadata per entity:
  • RO-Crate metadata file descriptor
  • RO-Crate root dataset
  • ..collection of Data entities
  • ..described with contextual entities

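Because every entity sits side by side in the flat `@graph`, a consumer can index the list by `@id` and follow references with plain dictionary lookups, without a full JSON-LD processor. Below is a minimal Python sketch over a trimmed copy of the example above, assuming the descriptor has the fixed `@id` `ro-crate-metadata.json`; the helper names are hypothetical.

```python
import json

# Trimmed copy of the example metadata file above
METADATA = """
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    { "@id": "ro-crate-metadata.json", "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"} },
    { "@id": "./", "@type": "Dataset",
      "name": "Presentation of user survey 2018",
      "hasPart": [{"@id": "cp7glop.ai"}, {"@id": "communities-2018.csv"}] },
    { "@id": "cp7glop.ai", "@type": "File",
      "name": "Diagram showing trend to increase" }
  ]
}
"""

def index_graph(crate):
    """Map each @id in the flat @graph to its entity."""
    return {entity["@id"]: entity for entity in crate["@graph"]}

def root_dataset(entities):
    """Follow the metadata descriptor's 'about' to the root dataset."""
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]

def resolve(entities, ref):
    """Return the entity described for {"@id": ...}, or None if the graph lacks it."""
    return entities.get(ref["@id"])

entities = index_graph(json.loads(METADATA))
root = root_dataset(entities)
for part in root["hasPart"]:
    entity = resolve(entities, part)
    print(part["@id"], "->", entity["name"] if entity else "(not described)")
```

Here `communities-2018.csv` prints as not described, because the trimmed graph omits its entity.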

7 of 24

Metadata

Data and Contextual entities described within RO-Crate Metadata File

Base vocabulary & types: schema.org

Cross-references to further contextual entities

RO-Crate principle:

Reuse existing PIDs and URLs

..but always describe entities which lack a human-readable resolution

{ "@id": "https://orcid.org/0000-0002-8367-6908",
  "@type": "Person",
  "affiliation": { "@id": "https://ror.org/03f0f6041" },
  "name": "J. Xuan"
}

{ "@id": "https://ror.org/03f0f6041",
  "@type": "Organization",
  "name": "University of Technology Sydney",
  "url": "https://www.uts.edu.au/"
}

{ "@id": "figure.png",
  "@type": ["File", "ImageObject"],
  "name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull",
  "identifier": "https://doi.org/10.5281/zenodo.3479743",
  "author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
  "encodingFormat": "image/png"
}

The metadata file describes each object (briefly)
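One way to check the principle above is to collect every `{"@id": ...}` reference in the graph and report those without an entity of their own. A minimal Python sketch using the entities shown above; `referenced_ids`, `undescribed` and the `ignore` parameter are hypothetical helpers. The check is stricter than the principle: it flags every undescribed reference, not only those lacking a human-readable resolution, so `ignore` lets a caller exempt well-known URIs.

```python
def referenced_ids(value):
    """Yield every @id used as a reference ({"@id": ...}) inside a JSON value."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for item in value.values():
                yield from referenced_ids(item)
    elif isinstance(value, list):
        for item in value:
            yield from referenced_ids(item)

def undescribed(graph, ignore=()):
    """Return sorted @ids that are referenced but have no entity in the graph."""
    described = {entity["@id"] for entity in graph}
    missing = set()
    for entity in graph:
        for ref in referenced_ids(entity):
            if ref not in described and ref not in ignore:
                missing.add(ref)
    return sorted(missing)

person = {"@id": "https://orcid.org/0000-0002-8367-6908", "@type": "Person",
          "affiliation": {"@id": "https://ror.org/03f0f6041"}, "name": "J. Xuan"}
organization = {"@id": "https://ror.org/03f0f6041", "@type": "Organization",
                "name": "University of Technology Sydney", "url": "https://www.uts.edu.au/"}
figure = {"@id": "figure.png", "@type": ["File", "ImageObject"],
          "author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
          "encodingFormat": "image/png"}

print(undescribed([person, organization, figure]))  # []
print(undescribed([person, figure]))                # ['https://ror.org/03f0f6041']
```

Dropping the Organization entity makes the ROR identifier show up as undescribed, even though the Person still points at it.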

8 of 24

Using common vocabularies

.. extending only when needed

9 of 24

10 of 24

11 of 24

RO-Crate in practice

RO-Crate is used by multiple international projects

Applied across research domains –

from life sciences to cultural heritage

https://www.researchobject.org/ro-crate/in-use/

12 of 24

Collecting corpora for a Language Data Commons

13 of 24

Adding rich metadata to existing data platforms

The CS3MESH4EOSC project combines major data services into the federated ScienceMesh

Users can collaborate across established data repositories and data science services.

FAIR Description Service (based on Describo Online) to annotate data using RO-Crate

Domain-specific profiles for additional metadata requirements

14 of 24

Building an EOSC ecosystem of FAIR Workflows


  • EOSC projects BY-COVID, EOSC-Life, EuroScienceGateway, BioDT exchange rich Workflow RO-Crates within an emerging EOSC ecosystem of workflow services
  • Workflow Crates transfer
    • identifiers, authors, license, workflow system
    • executable workflows in their native format (e.g. Galaxy)
    • interoperable CWL description of the workflow
    • software citations (e.g. tools used)
    • required data sources
    • test suites
    • workflow execution provenance
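A consumer receiving a Workflow Crate typically starts from the root dataset's `mainEntity`, which points at the workflow typed `File`, `SoftwareSourceCode` and `ComputationalWorkflow`. The Python sketch below assumes the root has `@id` `./`; the field checklist in `missing_fields`, the `workflow.ga` file and the `#galaxy` language entity are hypothetical illustrations, not the normative requirements of the Workflow RO-Crate profile.

```python
WORKFLOW_TYPES = {"File", "SoftwareSourceCode", "ComputationalWorkflow"}

def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def main_workflow(graph):
    """Return the entity referenced by the root's mainEntity, checking its types."""
    entities = {entity["@id"]: entity for entity in graph}
    root = entities["./"]  # assumption: the root dataset uses @id "./"
    main_id = root["mainEntity"]["@id"]
    workflow = entities[main_id]
    missing = WORKFLOW_TYPES - set(as_list(workflow.get("@type")))
    if missing:
        raise ValueError(f"{main_id} lacks types {sorted(missing)}")
    return workflow

def missing_fields(workflow, fields=("name", "creator", "license", "programmingLanguage")):
    """Hypothetical checklist of metadata a Workflow Crate is expected to carry."""
    return [field for field in fields if field not in workflow]

# Hypothetical, minimal Workflow Crate graph
graph = [
    {"@id": "./", "@type": "Dataset", "mainEntity": {"@id": "workflow.ga"}},
    {"@id": "workflow.ga",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
     "name": "Example workflow",
     "programmingLanguage": {"@id": "#galaxy"}},
    {"@id": "#galaxy", "@type": "ComputerLanguage", "name": "Galaxy"},
]
workflow = main_workflow(graph)
print(workflow["name"], missing_fields(workflow))  # Example workflow ['creator', 'license']
```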

15 of 24

Provenance traces of computational executions

“Just enough” provenance model using schema.org Actions:

input1.txt is the object Alice used to create result1.txt with instrument matlab

Provenance chain of connected actions → implicit workflow: can it be automated?

Layered profiles:

  1. Process Run Crate – some tool was executed
  2. Workflow Run Crate – the tool was a workflow
  3. Provenance Run Crate – we know which tools the workflow ran
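The example sentence maps onto a schema.org `CreateAction` with `agent`, `object`, `result` and `instrument`. The Python sketch below builds such actions and derives the implicit workflow by linking any action whose `object` is another action's `result`. `create_action`, `implicit_workflow`, the local `#...` identifiers and the second action (`#run2` producing `figure1.png`) are hypothetical, and the sketch does not implement the full Process Run Crate profile.

```python
def create_action(action_id, agent, inputs, outputs, instrument):
    """Hypothetical helper: a schema.org CreateAction as a JSON-LD dict."""
    return {
        "@id": action_id,
        "@type": "CreateAction",
        "agent": {"@id": agent},
        "object": [{"@id": i} for i in inputs],
        "result": [{"@id": o} for o in outputs],
        "instrument": {"@id": instrument},
    }

def implicit_workflow(actions):
    """Return (producer, consumer) edges: B depends on A if B's object is A's result."""
    produced_by = {}
    for action in actions:
        for result in action["result"]:
            produced_by[result["@id"]] = action["@id"]
    edges = set()
    for action in actions:
        for used in action["object"]:
            if used["@id"] in produced_by:
                edges.add((produced_by[used["@id"]], action["@id"]))
    return sorted(edges)

actions = [
    # "input1.txt is the object Alice used to create result1.txt with instrument matlab"
    create_action("#run1", "#alice", ["input1.txt"], ["result1.txt"], "#matlab"),
    # Hypothetical follow-up step that consumes result1.txt
    create_action("#run2", "#alice", ["result1.txt"], ["figure1.png"], "#plotting-script"),
]
print(implicit_workflow(actions))  # [('#run1', '#run2')]
```

The edge list is the chain of connected actions; recovering a reusable workflow from it would additionally need the tool parameters, which is where the layered profiles add detail.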

16 of 24

Five Safes RO-Crate Profile

Secure workflow execution in federated Trusted Research Environments

User making the request

Data access agreement

Pre-approved workflows and containers on TRE

Metadata of dataset and references to dataset (e.g. HDR-UK Gateway)

Outputs approved for release & provenance

17 of 24

FAIR Digital Object (FDO) – conceptual view

[Figure: an FDO identified by PID 20.301/a, whose PID Record holds typed attributes (20.123: “Alice”, 20.456: 10.1234/ab, 20.789: <http://...>) following a PID Profile; the FDO has an FDO Type, Metadata, Operations and Bytes, and FDOs can be grouped into a Collection.]

Rigid Persistent Identifiers

Self-describing digital objects

Distributed architecture

Machine actionable

Encapsulation of operations

    • CRUD
    • Extensible operations

Data/metadata abstraction

    • Several types of metadata
  • Predictable implementation of FAIR for active objects, not just static data
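To make the PID Record idea concrete, the Python sketch below reads a record assuming the JSON shape of the Handle.Net REST API (a `values` list whose entries carry `index`, `type` and `data.value`) and checks it against a PID Profile given as a list of required attribute types. The record is a hand-written offline stand-in using the figure's values, not a real lookup; `pid_attributes`, `missing_attributes` and the attribute type `20.999` are hypothetical.

```python
def pid_attributes(record):
    """Collapse a Handle-style PID record into {attribute type: [values]}."""
    attributes = {}
    for entry in sorted(record["values"], key=lambda entry: entry["index"]):
        attributes.setdefault(entry["type"], []).append(entry["data"]["value"])
    return attributes

def missing_attributes(attributes, profile):
    """Return attribute types the (hypothetical) PID Profile requires but the record lacks."""
    return [attribute_type for attribute_type in profile if attribute_type not in attributes]

# Offline stand-in for a resolved record; values copied from the figure
record = {
    "responseCode": 1,
    "handle": "20.301/a",
    "values": [
        {"index": 1, "type": "20.123", "data": {"format": "string", "value": "Alice"}},
        {"index": 2, "type": "20.456", "data": {"format": "string", "value": "10.1234/ab"}},
        {"index": 3, "type": "20.789", "data": {"format": "string", "value": "http://..."}},
    ],
}
attributes = pid_attributes(record)
print(attributes["20.456"])                                       # ['10.1234/ab']
print(missing_attributes(attributes, ["20.123", "20.456", "20.999"]))  # ['20.999']
```

Keying attributes by type PIDs rather than free-text names is what lets a machine look up the meaning of each attribute in a type registry.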

18 of 24

Resolving RO-Crate FDOs using FAIR Signposting

[Figure: a PID (https://doi.org/...) and its PID Record resolve to a landing page whose FAIR Signposting links (<…schema.org/dataset>; rel=type, <https://doi…>; rel=cite-as, <…crate.zip>; rel=item, <ro-crat…>; rel=describedby; profile=…ro-crate) lead to the crate zip (Type: Dataset) and to the metadata FDO ro-crate-metadata.json, which follows the RO-Crate profile (w3id); entities inside can carry further types such as ComputationalWorkflow.]

HEAD https://workflowhub.eu/workflows/255?version=1 HTTP/1.1

Link: <https://workflowhub.eu/workflows/255?version=1> ;
    rel="describedby" ; type="application/vnd.datacite.datacite+xml",
  <https://workflowhub.eu/workflows/255?version=1> ;
    rel="describedby" ; type="application/ld+json",
  <https://doi.org/10.48546/workflowhub.workflow.255.1> ;
    rel="cite-as",
  <https://workflowhub.eu/workflows/255/ro_crate?version=1> ;
    rel="item" ; type="application/zip" ;
    profile="https://w3id.org/ro/crate"

HTTP Link headers for machines

HTML landing page for humans
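A machine client only needs the `Link` header to find the citation PID, the metadata and the zipped crate. The Python sketch below parses the WorkflowHub header value shown above and selects links by `rel` and `type`. It is a simplified parser (it assumes no commas or semicolons inside quoted values) rather than a complete RFC 8288 implementation, and `parse_link_header`, `links_for` and `fetch_link_header` are hypothetical names. `fetch_link_header` shows how the header could be retrieved with a `HEAD` request, but it is not called, so the example runs offline.

```python
import re
import urllib.request

# Matches "<target>" followed by its parameters, up to the next "<"
LINK_RE = re.compile(r"<([^>]*)>([^<]*)")

def parse_link_header(value):
    """Parse a Link header value into dicts with 'href' plus rel/type/profile etc."""
    links = []
    for target, params in LINK_RE.findall(value):
        link = {"href": target}
        for param in params.split(";"):
            param = param.strip().rstrip(",").strip()
            if "=" not in param:
                continue  # skips empty pieces between separators
            key, _, val = param.partition("=")
            link[key.strip().lower()] = val.strip().strip('"')
        links.append(link)
    return links

def links_for(links, rel, media_type=None):
    """Return hrefs whose rel list contains `rel` (and whose type matches, if given)."""
    return [
        link["href"]
        for link in links
        if rel in link.get("rel", "").split()
        and (media_type is None or link.get("type") == media_type)
    ]

def fetch_link_header(url):
    """Retrieve all Link headers from a HEAD request (not called below)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return ", ".join(response.headers.get_all("Link") or [])

# Header value from the WorkflowHub example above
LINK = (
    '<https://workflowhub.eu/workflows/255?version=1> ; '
    'rel="describedby" ; type="application/vnd.datacite.datacite+xml", '
    '<https://workflowhub.eu/workflows/255?version=1> ; '
    'rel="describedby" ; type="application/ld+json", '
    '<https://doi.org/10.48546/workflowhub.workflow.255.1> ; rel="cite-as", '
    '<https://workflowhub.eu/workflows/255/ro_crate?version=1> ; '
    'rel="item" ; type="application/zip" ; profile="https://w3id.org/ro/crate"'
)

links = parse_link_header(LINK)
print(links_for(links, "cite-as"))
print(links_for(links, "item", "application/zip"))
```

Selecting by both `rel` and `type` matters here: two `describedby` links share the same URL and differ only in media type (DataCite XML vs JSON-LD).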

19 of 24

RO-Crate in Galaxy

20 of 24

21 of 24

(rocrate) stain@xena:~/Downloads/41$ runcrate report .
action: #d1131123-46a9-4b08-94f5-f57166758c62
instrument: workflows/a899d403c7447c52.gxwf.yml (['File', 'SoftwareSourceCode', 'ComputationalWorkflow'])
started: 2021-11-18T01:35:41.811075
ended: 2021-11-18T01:35:41.811085
inputs:
datasets/Pasted_Entry_4.txt <- #3ec11a41-a7b1-46fc-82c2-69d4d71d5298
outputs:
datasets/tac_on_data_4_7.txt <- #341540ad-2cc0-42a1-98f2-e845926ff184
datasets/Select_first_on_data_7_8.txt <- #5e2c955c-a96c-49ba-a77f-91f06e1e8452

(rocrate) stain@xena:~/Downloads/41$ rochtml ro-crate-metadata.json
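A summary similar to the report above can be approximated by scanning the graph for `CreateAction` entities and reading their `instrument`, `startTime`, `endTime`, `object` and `result`. This Python sketch is not `runcrate`'s implementation and does not reproduce its output format; `summarize_actions` and `load_crate` are hypothetical, and the inline crate (identifiers and timestamps included) is made up for illustration.

```python
import json

def as_list(value):
    """JSON-LD allows a single value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def load_crate(path="ro-crate-metadata.json"):
    """Read a metadata file from disk (not called below)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def summarize_actions(crate):
    """Return one summary dict per CreateAction in an RO-Crate graph."""
    entities = {entity["@id"]: entity for entity in crate["@graph"]}
    summaries = []
    for entity in crate["@graph"]:
        if "CreateAction" not in as_list(entity.get("@type")):
            continue
        instrument_id = entity.get("instrument", {}).get("@id")
        summaries.append({
            "action": entity["@id"],
            "instrument": instrument_id,
            "instrument_types": as_list(entities.get(instrument_id, {}).get("@type")),
            "started": entity.get("startTime"),
            "ended": entity.get("endTime"),
            "inputs": [ref["@id"] for ref in as_list(entity.get("object"))],
            "outputs": [ref["@id"] for ref in as_list(entity.get("result"))],
        })
    return summaries

# Hypothetical crate for illustration
crate = {"@graph": [
    {"@id": "#run-1", "@type": "CreateAction",
     "instrument": {"@id": "workflow.ga"},
     "startTime": "2023-10-05T10:00:00", "endTime": "2023-10-05T10:05:00",
     "object": [{"@id": "input.txt"}], "result": [{"@id": "output.txt"}]},
    {"@id": "workflow.ga", "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
]}

for summary in summarize_actions(crate):
    print("action:", summary["action"])
    print("  instrument:", summary["instrument"], summary["instrument_types"])
    print("  inputs:", summary["inputs"], "outputs:", summary["outputs"])
```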

22 of 24

RO-Crate training

23 of 24

More RO-Crate in Galaxy

Import history

Export to InvenioRDM

Import as data set

24 of 24

Thank you!
