Exchanging workflow provenance as FAIR Digital Objects using RO-Crate
European Galaxy Days 2023, Freiburg, Germany
2023-10-05
This work is licensed under a Creative Commons Attribution 4.0 International License.
Stian Soiland-Reyes
The University of Manchester, RO-Crate co-lead
soiland-reyes@manchester.ac.uk
https://orcid.org/0000-0001-9842-9718
Aims of FAIR Research Objects
Describe and package data collections, datasets, software, etc. with their metadata
Platform-independent object exchange between repositories and services
Support reproducibility and analysis: link data with code and workflows
Transfer sensitive or large distributed datasets with persistent identifiers
Aggregate citations and persistent identifiers
Propagate provenance and existing metadata
Publish and archive mixed objects and references
Reuse existing standards, but hide their complexity
Realizing FAIR Digital Objects with RO-Crate
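[Diagram: the RO-Crate Metadata file describes the crate (id, type, description, datePublished, license, author, …) and links it to people and organisations. The RO-Crate Content consists of files and directories, each described with id, type, description, datePublished, creator, size, format, …; further resources are included by reference (PID, URL), e.g. https://github.com/o/script.]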
Structured metadata about the RO-Crate and content
Reference existing repositories
Re-use Web standards (JSON-LD, schema.org)
Persistent identifiers with FAIR Signposting
Add context: people, projects, etc.
Techie deep-dive!
Warning: JSON ahead
RO-Crate Metadata File
JSON-LD preamble
{ "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
RO-Crate metadata file descriptor
    { "@type": "CreativeWork",
      "@id": "ro-crate-metadata.json",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": { "@id": "./" }
    },
RO-Crate root dataset
    { "@id": "./",
      "identifier": "https://doi.org/10.5281/zenodo.1009240",
      "@type": "Dataset",
      "hasPart": [
        { "@id": "cp7glop.ai" },
        { "@id": "lots_of_little_files/" },
        { "@id": "communities-2018.csv" },
        { "@id": "https://doi.org/10.4225/59/59672c09f4a4b" },
        { "@id": "SciDataCon Presentations/AAA_Pilot_Project_Abstract.html" }
      ],
      "author": { "@id": "https://orcid.org/0000-0002-8367-6908" },
      "publisher": { "@id": "https://ror.org/03f0f6041" },
      "citation": { "@id": "https://doi.org/10.1109/TCYB.2014.2386282" },
      "name": "Presentation of user survey 2018"
    },
..collection of Data entities
    { "@id": "cp7glop.ai",
      "@type": "File",
      "name": "Diagram showing trend to increase",
      …
    },
    …
..described with contextual entities
  ]
}
Flat list of metadata per entity
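As a hedged illustration of the layout above (JSON-LD preamble, metadata descriptor, root dataset, then a flat list of data entities), the following Python sketch assembles a minimal metadata document with only the standard `json` module. It is not the API of any RO-Crate library; the helper name `minimal_crate` is hypothetical and the values are reused from the example. RO-Crate 1.1 also expects properties such as `description`, `datePublished` and `license` on the root dataset; they are left out here to keep the structure visible, so the output is not a complete crate.

```python
import json

def minimal_crate(name, files):
    """Build a skeletal RO-Crate 1.1 metadata document as a dict.

    `files` maps a relative path to a human-readable name (hypothetical helper).
    """
    # The descriptor says which spec the file conforms to and what it is about
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # The root dataset lists its data entities by reference only
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "hasPart": [{"@id": path} for path in files],
    }
    # Each data entity gets its own flat entry in @graph
    entities = [{"@id": path, "@type": "File", "name": label}
                for path, label in files.items()]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *entities],
    }

crate = minimal_crate("Presentation of user survey 2018",
                      {"cp7glop.ai": "Diagram showing trend to increase"})
# In a real crate this text would be saved as ro-crate-metadata.json
print(json.dumps(crate, indent=2))
```

Keeping every entity at the top level of `@graph`, linked only by `{"@id": ...}` references, is what makes the "flat list of metadata per entity" easy to process without a full JSON-LD toolchain.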
Metadata
Data and contextual entities are described within the RO-Crate Metadata File
Base vocabulary and types: schema.org
Cross-references to further contextual entities
RO-Crate principle:
Reuse existing PIDs and URLs
... but always describe entities that lack a human-readable resolution
{
"@id": "https://orcid.org/0000-0002-8367-6908",
"@type": "Person",
"affiliation": { "@id": "https://ror.org/03f0f6041" },
"name": "J. Xuan"
}
{
"@id": "https://ror.org/03f0f6041",
"@type": "Organization",
"name": "University of Technology Sydney",
"url": "https://www.uts.edu.au/"
}
{
"@id": "figure.png",
"@type": ["File", "ImageObject"],
"name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull",
"identifier": "https://doi.org/10.5281/zenodo.3479743",
"author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
"encodingFormat": "image/png"
}
The metadata file describes each object (briefly)
Using common vocabularies
... extending them only when needed
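Because the metadata is a flat list, a consumer follows cross-references by looking entities up by `@id`. The Python sketch below uses the three entities shown above; the helper names `index_graph` and `follow` are hypothetical, and it assumes each property holds a single `{"@id": ...}` reference (real crates may also use lists).

```python
def index_graph(graph):
    """Index a flat RO-Crate @graph by @id."""
    return {entity["@id"]: entity for entity in graph}

def follow(index, entity, prop):
    """Return the entity referenced as {"@id": ...} under `prop`, or None.

    Simplification: assumes a single reference, not a list of references.
    """
    ref = entity.get(prop)
    if isinstance(ref, dict) and "@id" in ref:
        return index.get(ref["@id"])
    return None

graph = [
    {"@id": "https://orcid.org/0000-0002-8367-6908", "@type": "Person",
     "affiliation": {"@id": "https://ror.org/03f0f6041"}, "name": "J. Xuan"},
    {"@id": "https://ror.org/03f0f6041", "@type": "Organization",
     "name": "University of Technology Sydney", "url": "https://www.uts.edu.au/"},
    {"@id": "figure.png", "@type": ["File", "ImageObject"],
     "name": "XXL-CT-scan of an XXL Tyrannosaurus rex skull",
     "identifier": "https://doi.org/10.5281/zenodo.3479743",
     "author": {"@id": "https://orcid.org/0000-0002-8367-6908"},
     "encodingFormat": "image/png"},
]

index = index_graph(graph)
# figure.png -> author (ORCID) -> affiliation (ROR)
author = follow(index, index["figure.png"], "author")
org = follow(index, author, "affiliation")
print(f"{author['name']} ({org['name']})")  # J. Xuan (University of Technology Sydney)
```

The lookup returns `None` for an `@id` that is referenced but not described in the graph, which is why the crate carries a short description even for entities that already have a PID.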
RO-Crate in practice
RO-Crate is used by multiple international projects
Applied across research domains, from life sciences to cultural heritage
https://www.researchobject.org/ro-crate/in-use/
Collecting corpora for a Language Data Commons
Adding rich metadata to existing data platforms
https://doi.org/10.5281/zenodo.7310739
https://doi.org/10.3897/rio.8.e95972
https://arkisto-platform.github.io/tools/description/describo-online/
The CS3MESH4EOSC project combines major data services into the federated ScienceMesh
Users can collaborate across established data repositories and data science services.
FAIR Description Service (based on Describo Online) to annotate data using RO-Crate
Domain-specific profiles for additional metadata requirements
Building an EOSC ecosystem of FAIR Workflows
Provenance traces of computational executions
“Just enough” provenance model using schema.org Actions:
input1.txt is the object Alice used to create result1.txt with instrument matlab
Provenance chain of connected actions → implicit workflow: can it be automated?
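The example sentence maps onto a schema.org `CreateAction` with `agent`, `object`, `result` and `instrument`. The Python sketch below is illustrative only: the identifiers `#run-1`, `#run-2`, `#alice` and `#matlab`, and the second step producing `result2.txt`, are hypothetical. It also shows why a chain of actions implies a workflow: an action that consumes another action's result depends on it.

```python
import json

def create_action(action_id, agent, instrument, inputs, outputs):
    """Describe one execution as a schema.org CreateAction (JSON-LD dict)."""
    return {
        "@id": action_id,
        "@type": "CreateAction",
        "agent": {"@id": agent},
        "instrument": {"@id": instrument},
        "object": [{"@id": f} for f in inputs],
        "result": [{"@id": f} for f in outputs],
    }

def implicit_workflow(actions):
    """Return (upstream, downstream) pairs where one action's result is another's object."""
    produced_by = {}
    for action in actions:
        for res in action["result"]:
            produced_by[res["@id"]] = action["@id"]
    edges = []
    for action in actions:
        for obj in action["object"]:
            if obj["@id"] in produced_by:
                edges.append((produced_by[obj["@id"]], action["@id"]))
    return edges

# "input1.txt is the object Alice used to create result1.txt with instrument matlab"
run1 = create_action("#run-1", "#alice", "#matlab", ["input1.txt"], ["result1.txt"])
# Hypothetical follow-up step that consumes result1.txt
run2 = create_action("#run-2", "#alice", "#matlab", ["result1.txt"], ["result2.txt"])
# Contextual entities for the agent and instrument (hypothetical @id values)
contextual = [
    {"@id": "#alice", "@type": "Person", "name": "Alice"},
    {"@id": "#matlab", "@type": "SoftwareApplication", "name": "MATLAB"},
]

print(json.dumps(run1, indent=2))
print(implicit_workflow([run1, run2]))  # [('#run-1', '#run-2')]
```

Recovering the dependency edges is only the first step towards automation; re-running the chain would also need the instruments and their parameters to be described precisely enough to execute.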
Layered profiles:
Five Safes RO-Crate Profile
User making the request
Data access agreement
Pre-approved workflows and containers on the TRE
Metadata of dataset and references to dataset (e.g. HDR-UK Gateway)
Outputs approved for release & provenance
Secure workflow execution in federated Trusted Research Environments
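[Diagram: an FDO is identified by a PID (20.301/a) that resolves to a PID Record. The record's attributes follow a PID Profile and are keyed by attribute-type PIDs (20.123: “Alice”, 20.456: 10.1234/ab, 20.789: <http://...>). The record references the FDO Type, the bytes, the metadata and the operations the FDO supports; FDOs can be grouped into a Collection.]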
FAIR Digital Object (FDO) – conceptual view
Rigid Persistent Identifiers
Self-describing digital objects
Distributed architecture
Machine actionable
Encapsulation of operations
Data/metadata abstraction
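The PID record in the diagram can be read as a small key-value structure whose keys are themselves PIDs pointing to attribute-type definitions. The Python sketch below is hypothetical: the class `PIDRecord`, the function `conforms`, and treating a PID Profile as a plain list of required attribute types are simplifications, not the FDO specification. Attribute 20.789 is left out of the record because its value is elided in the diagram.

```python
from dataclasses import dataclass, field

@dataclass
class PIDRecord:
    """Hypothetical PID record: attribute-type PID -> attribute value."""
    pid: str
    attributes: dict = field(default_factory=dict)

def conforms(record, required_types):
    """True if the record carries every attribute type a (simplified) profile requires."""
    return all(t in record.attributes for t in required_types)

# Values taken from the diagram; 20.789 omitted because its value is elided
record = PIDRecord("20.301/a", {"20.123": "Alice", "20.456": "10.1234/ab"})

print(record.attributes["20.456"])             # 10.1234/ab
print(conforms(record, ["20.123", "20.456"]))  # True
print(conforms(record, ["20.789"]))            # False
```

Because the keys are PIDs rather than free-text names, a machine can resolve each key to learn what the attribute means, which is what makes the record self-describing.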
Resolving RO-Crate FDOs using FAIR Signposting
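[Diagram: a PID (https://doi.org/...) resolves to a landing page whose FAIR Signposting links play the role of the PID Record:
<…schema.org/dataset>; rel=type
<https://doi…>; rel=cite-as
<…crate.zip>; rel=item
<ro-crat…>; rel=describedby; profile=…ro-crate
The item is the zipped crate (Type: Dataset, or ComputationalWorkflow for a workflow); the describedby link leads to the Metadata FDO ro-crate-metadata.json, which conforms to the RO-Crate profile identified by a w3id PID.]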
HEAD https://workflowhub.eu/workflows/255?version=1 HTTP/1.1
…
Link: <https://workflowhub.eu/workflows/255?version=1> ;
    rel="describedby" ;
    type="application/vnd.datacite.datacite+xml",
  <https://workflowhub.eu/workflows/255?version=1> ;
    rel="describedby" ; type="application/ld+json",
  <https://doi.org/10.48546/workflowhub.workflow.255.1> ;
    rel="cite-as",
  <https://workflowhub.eu/workflows/255/ro_crate?version=1> ;
    rel="item" ; type="application/zip" ;
    profile="https://w3id.org/ro/crate"
HTTP Link headers for machines
HTML landing page for humans
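A machine client can locate the crate from the Link headers alone. The following Python sketch, using only the standard library, parses the header shown above and picks the `rel="item"` link that declares the RO-Crate profile. The regular-expression parser is a simplification of RFC 8288 Web Linking, the helper names are hypothetical, and `fetch_link_header` needs network access, so it is defined but not called here.

```python
import re
from urllib.request import Request, urlopen

# One link: <target> followed by ;-separated key=value parameters
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')

def parse_link_header(value):
    """Parse an HTTP Link header into a list of dicts (simplified RFC 8288)."""
    links = []
    for m in LINK_RE.finditer(value):
        link = {"href": m.group(1)}
        for p in PARAM_RE.finditer(m.group(2)):
            quoted, bare = p.group(2), p.group(3)
            link[p.group(1).lower()] = quoted if quoted is not None else bare
        links.append(link)
    return links

def find_ro_crate(links):
    """Return the href of the first rel=item link declaring the RO-Crate profile."""
    for link in links:
        if ("item" in link.get("rel", "").split()
                and link.get("profile") == "https://w3id.org/ro/crate"):
            return link["href"]
    return None

def fetch_link_header(url):
    """Send a HEAD request and join all Link header values (needs network)."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return ", ".join(resp.headers.get_all("Link") or [])

# The headers shown above, as one string (no network needed)
EXAMPLE = (
    '<https://workflowhub.eu/workflows/255?version=1> ; rel="describedby" ; '
    'type="application/vnd.datacite.datacite+xml", '
    '<https://workflowhub.eu/workflows/255?version=1> ; rel="describedby" ; '
    'type="application/ld+json", '
    '<https://doi.org/10.48546/workflowhub.workflow.255.1> ; rel="cite-as", '
    '<https://workflowhub.eu/workflows/255/ro_crate?version=1> ; rel="item" ; '
    'type="application/zip" ; profile="https://w3id.org/ro/crate"'
)

links = parse_link_header(EXAMPLE)
print(find_ro_crate(links))  # https://workflowhub.eu/workflows/255/ro_crate?version=1
```

Matching on the `profile` parameter, rather than only on `type="application/zip"`, lets a client tell an RO-Crate apart from any other zip archive offered as an item.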
RO-Crate in Galaxy
(rocrate) stain@xena:~/Downloads/41$ runcrate report .
action: #d1131123-46a9-4b08-94f5-f57166758c62
instrument: workflows/a899d403c7447c52.gxwf.yml (['File', 'SoftwareSourceCode', 'ComputationalWorkflow'])
started: 2021-11-18T01:35:41.811075
ended: 2021-11-18T01:35:41.811085
inputs:
datasets/Pasted_Entry_4.txt <- #3ec11a41-a7b1-46fc-82c2-69d4d71d5298
outputs:
datasets/tac_on_data_4_7.txt <- #341540ad-2cc0-42a1-98f2-e845926ff184
datasets/Select_first_on_data_7_8.txt <- #5e2c955c-a96c-49ba-a77f-91f06e1e8452
(rocrate) stain@xena:~/Downloads/41$ rochtml ro-crate-metadata.json
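To illustrate what a report like the one above reads from `ro-crate-metadata.json`, here is a hedged Python sketch that lists each `CreateAction` with its instrument, start and end times, inputs (`object`) and outputs (`result`). It is not how `runcrate` is implemented; `summarize_actions` and `load_metadata` are hypothetical helpers, a single `instrument` reference is assumed, and `example` is a hand-written excerpt reconstructed from the report output, not the actual exported crate.

```python
import json
from pathlib import Path

def as_list(value):
    """RO-Crate properties may hold one value or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def summarize_actions(metadata):
    """Yield a simple summary of each CreateAction in an RO-Crate metadata dict."""
    index = {e["@id"]: e for e in metadata["@graph"]}
    for entity in metadata["@graph"]:
        if "CreateAction" not in as_list(entity.get("@type")):
            continue
        instrument = (entity.get("instrument") or {}).get("@id")
        yield {
            "action": entity["@id"],
            "instrument": instrument,
            "instrument_type": as_list(index.get(instrument, {}).get("@type")),
            "started": entity.get("startTime"),
            "ended": entity.get("endTime"),
            "inputs": [o["@id"] for o in as_list(entity.get("object"))],
            "outputs": [r["@id"] for r in as_list(entity.get("result"))],
        }

def load_metadata(path="ro-crate-metadata.json"):
    """Read an RO-Crate metadata file from disk (not used in the example below)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))

# Hand-written excerpt based on the report output above
example = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "#d1131123-46a9-4b08-94f5-f57166758c62", "@type": "CreateAction",
         "instrument": {"@id": "workflows/a899d403c7447c52.gxwf.yml"},
         "startTime": "2021-11-18T01:35:41.811075",
         "endTime": "2021-11-18T01:35:41.811085",
         "object": [{"@id": "datasets/Pasted_Entry_4.txt"}],
         "result": [{"@id": "datasets/tac_on_data_4_7.txt"},
                    {"@id": "datasets/Select_first_on_data_7_8.txt"}]},
        {"@id": "workflows/a899d403c7447c52.gxwf.yml",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    ],
}

for summary in summarize_actions(example):
    print("action:", summary["action"])
    print("instrument:", summary["instrument"], summary["instrument_type"])
    print("inputs:", summary["inputs"])
    print("outputs:", summary["outputs"])
```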
RO-Crate training
More RO-Crate in Galaxy
Import history
Export to InvenioRDM
Import as data set
Thank you!