1 of 20

Homogenization of workflow metadata without restriction and user reeducation

Adrian Zimmer

Technische Universität Kaiserslautern

Email: adrian.zimmer@nfdi4plants.org

DataPLANT

2 of 20

DataPLANT

  • NFDI consortium of plant research (www.nfdi4plants.org)
  • Team of around 40 members
  • Goal: enable FAIR collaborative research in plant biology
  • FAIR: Findable, Accessible, Interoperable and Re-usable

knowledge

notes

digital information

3 of 20

Annotated Research Context (ARC)

experimental data

annotation

computation

4 of 20

Workflow FAIRness

  • Difficulty: Many different metadata standards and workflow systems
  • Restricting users to a single workflow system is not sustainable
  • A single point of entry is required to achieve homogenization

5 of 20

How can CWL homogenize workflows?

  • Pure CWL workflows are supported out of the box
  • But what about other systems like Galaxy or NextFlow?

Approach 1: Write translators or CWL exporters for every workflow language or system

  • A lot of work
  • Hard to maintain
  • Cannot expect everyone to be interested or have the capacity
  • Many benefits are lost

CWL workflow

Galaxy to CWL translator

Galaxy workflow

CWL workflow

NextFlow to CWL translator

NextFlow workflow

Approach 2: Use CWL to delegate the data to the workflow system in question

Input Data

CWL

Inputs

Pure CWL workflow

Galaxy delegation workflow

NextFlow delegation workflow

Parent CWL Workflow

Step 1

Step 2

Step 3

Outputs
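
The delegation idea in Approach 2 can be sketched as a parent CWL workflow whose steps each run a child CWL document. This is a minimal sketch, not the project's actual files: the file names (pure-workflow.cwl, galaxy-delegation.cwl, nextflow-delegation.cwl), the port names (data, result) and the chaining of step 1 into step 2 into step 3 are assumptions made for illustration. The document is emitted as JSON, which CWL accepts because JSON is a subset of YAML.

```python
import json


def delegation_step(run_file, source):
    """One parent-workflow step: pass `source` to a child CWL document."""
    # "data" and "result" are hypothetical port names of the child document.
    return {"run": run_file, "in": {"data": source}, "out": ["result"]}


parent = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    # CWL requires this whenever a step runs another Workflow.
    "requirements": {"SubworkflowFeatureRequirement": {}},
    "inputs": {"input_data": "File"},
    "steps": {
        # Hypothetical file names; each child hides its own workflow system.
        "step1": delegation_step("pure-workflow.cwl", "input_data"),
        "step2": delegation_step("galaxy-delegation.cwl", "step1/result"),
        "step3": delegation_step("nextflow-delegation.cwl", "step2/result"),
    },
    "outputs": {"final": {"type": "File", "outputSource": "step3/result"}},
}

print(json.dumps(parent, indent=2))
```

From the parent's point of view all three steps look the same, which is the point of the approach: the workflow system behind each step is an implementation detail of the child document.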

6 of 20

Executing Galaxy workflows with CWL 

7 of 20

Be able to…

  1. Create and download a workflow
  2. Place the workflow file inside the ARC
  3. Execute the workflow on a Galaxy server
  4. Retrieve the resulting history and place it inside the ARC

(ARC)

workflow.ga

Galaxy history

8 of 20

1. Create and download a Galaxy workflow

9 of 20

Be able to…

  1. Create and download a workflow
  2. Place the workflow file inside the ARC
  3. Execute the workflow on a Galaxy server
  4. Retrieve the resulting history and place it inside the ARC

(ARC)

workflow.ga

Galaxy history

10 of 20

3. + 4.: Planemo to the rescue!

workflow.ga

galaxyInput.yml

"run"

Galaxy history

11 of 20

The complete workflow

planemo-run

(CommandLineTool)

galaxy-workflow.cwl

galaxyInputParams

history

cwl-galaxy-parser

(CommandLineTool)

Input1:
  - class: File
    path: sampletxt1.txt
  - class: File
    path: sampletxt2.txt

Input1

run.yml (cwl job file)

workflow.ga
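
What the parser step has to do can be sketched as a small conversion from the CWL job file to a Galaxy job description. The slides do not show the internals of cwl-galaxy-parser, so the function below is an assumption: it turns every list of CWL File objects into a Galaxy list collection, using the Collection / collection_type / elements layout of Planemo job files as understood here.

```python
import os


def cwl_job_to_galaxy_job(cwl_job):
    """Map CWL job inputs to a Planemo-style job dictionary (assumed layout)."""
    galaxy_job = {}
    for name, value in cwl_job.items():
        if isinstance(value, list):
            # A CWL File array becomes a Galaxy list collection.
            elements = []
            for f in value:
                path = f.get("path") or f.get("location")
                elements.append({
                    "class": "File",
                    "identifier": os.path.basename(path),
                    "path": path,
                })
            galaxy_job[name] = {
                "class": "Collection",
                "collection_type": "list",
                "elements": elements,
            }
        else:
            # Single files and plain parameters are passed through unchanged.
            galaxy_job[name] = value
    return galaxy_job


run_yml = {
    "Input1": [
        {"class": "File", "path": "sampletxt1.txt"},
        {"class": "File", "path": "sampletxt2.txt"},
    ]
}
galaxy_input = cwl_job_to_galaxy_job(run_yml)
print(galaxy_input["Input1"]["collection_type"])
```

The result plays the role of galaxyInputParams in the diagram: it is what the planemo-run step would receive next to workflow.ga.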

12 of 20

Be able to…

  1. Create and download a workflow
  2. Place the workflow file inside the ARC
  3. Execute the workflow on a Galaxy server
  4. Retrieve the resulting history and place it inside the ARC

(ARC)

workflow.ga

Galaxy history

13 of 20

Using Galaxy workflow metadata to autogenerate the process

14 of 20

Using Galaxy workflow metadata to autogenerate the process

  • Galaxy workflow files (.ga) are plain JSON

→ Read the inputs from the .ga file and generate the necessary CWL files

{
    "a_galaxy_workflow": "true",
    "annotation": "",
    "format-version": "0.1",
    "name": "CWL-Galaxy Example",
    "steps": {
        "0": {
            "annotation": "",
            "content_id": null,
            "errors": null,
            "id": 0,
            "input_connections": {},
            "inputs": [
                {
                    "description": "",
                    "name": "Input1"
                }
            ],
            "label": "Input1",
            "name": "Input dataset collection",
            "outputs": [],
            "tool_id": null,
            "tool_version": null,
            "type": "data_collection_input",
            "workflow_outputs": [
                {
                    "label": null,
                    "output_name": "output",
                    "uuid": "3d82ba3c-a83f-45fb-b652-dcd3f0817c7b"
                }

workflow.ga
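
Reading the inputs out of a .ga file needs nothing beyond a JSON parser. The sketch below is not the galaxy-workflow-to-arc implementation. It assumes that input steps can be recognised by their "type" field (data_input, data_collection_input, parameter_input) and that the step label is the name to expose on the CWL side. The sample document is a shortened, hypothetical workflow in the shape shown above.

```python
import json

# Step types that Galaxy uses for workflow inputs (assumed to be exhaustive).
INPUT_STEP_TYPES = {"data_input", "data_collection_input", "parameter_input"}


def galaxy_workflow_inputs(ga_text):
    """Return (label, step type) for every input step of a .ga document."""
    workflow = json.loads(ga_text)
    steps = workflow.get("steps", {})
    found = []
    for key in sorted(steps, key=int):  # step keys are "0", "1", ...
        step = steps[key]
        if step.get("type") not in INPUT_STEP_TYPES:
            continue
        inputs = step.get("inputs") or [{}]
        label = step.get("label") or inputs[0].get("name") or f"input_{key}"
        found.append((label, step["type"]))
    return found


sample = json.dumps({
    "a_galaxy_workflow": "true",
    "name": "CWL-Galaxy Example",
    "steps": {
        "0": {"id": 0, "label": "Input1", "type": "data_collection_input",
              "inputs": [{"description": "", "name": "Input1"}]},
        "1": {"id": 1, "label": None, "type": "tool", "inputs": []},
    },
})

print(galaxy_workflow_inputs(sample))  # [('Input1', 'data_collection_input')]
```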

15 of 20

cwl-ts-auto

  • New TypeScript + JavaScript library for serializing and deserializing CWL documents
  • Autogenerated using schema-salad TypeScript codegen and the CWL schema
  • Node and (basic) browser support
  • Check it out: https://github.com/common-workflow-lab/cwl-ts-auto
  • Available on NPM

16 of 20

galaxy-workflow-to-arc example

Input1:
  - class: File
    location: enter location
  - class: File
    location: enter location

run.yml
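
A job file template like this one can be generated from the input names alone. The following is a minimal sketch under stated assumptions, not the tool's actual code: the input labels and Galaxy step types are taken as already known, collection inputs are given two placeholder entries to match the example above, and the template is printed as JSON, which any YAML reader accepts.

```python
import json

PLACEHOLDER = "enter location"


def job_template(inputs, files_per_collection=2):
    """Build a CWL job file template with placeholder file locations.

    `inputs` is a list of (label, Galaxy step type) pairs.
    """
    job = {}
    for label, step_type in inputs:
        entry = {"class": "File", "location": PLACEHOLDER}
        if step_type == "data_collection_input":
            # Collections become a list of files for the user to fill in.
            job[label] = [dict(entry) for _ in range(files_per_collection)]
        else:
            job[label] = entry
    return job


template = job_template([("Input1", "data_collection_input")])
print(json.dumps(template, indent=2))
```

The user then replaces each placeholder with the path of a real file inside the ARC.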

17 of 20

Executing the workflow using cwltool

  • Only requirements: cwltool and Docker

ARC_GALAXY_URL="https://usegalaxy.eu" ARC_GALAXY_API_KEY="<YOUR_API_KEY>" cwltool --preserve-environment ARC_GALAXY_URL --preserve-environment ARC_GALAXY_API_KEY --outdir runs/run1 runs/run1/run.cwl runs/run1/run.yml
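
The same call can be scripted, for example to start several runs of an ARC in a row. This is a sketch that assumes the runs/<name>/run.cwl and runs/<name>/run.yml layout of the command above. The helper names are hypothetical; only the cwltool flags are taken from the slide.

```python
import os
import subprocess


def cwltool_command(run_dir):
    """Build the cwltool argument list for one run directory."""
    argv = ["cwltool"]
    # Forward the Galaxy credentials into the tool containers.
    for name in ("ARC_GALAXY_URL", "ARC_GALAXY_API_KEY"):
        argv += ["--preserve-environment", name]
    argv += ["--outdir", run_dir, f"{run_dir}/run.cwl", f"{run_dir}/run.yml"]
    return argv


def run_workflow(run_dir, galaxy_url, api_key):
    """Execute one run; raises CalledProcessError if cwltool fails."""
    env = dict(os.environ, ARC_GALAXY_URL=galaxy_url, ARC_GALAXY_API_KEY=api_key)
    return subprocess.run(cwltool_command(run_dir), env=env, check=True)


print(" ".join(cwltool_command("runs/run1")))
```

Calling run_workflow("runs/run1", "https://usegalaxy.eu", "<YOUR_API_KEY>") would then correspond to the command line shown above.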

18 of 20

Summary

All the user needs to do:

  1. Create and download the Galaxy workflow
  2. Use the galaxy-workflow-to-arc tool to generate the CWL files
  3. Place the result inside the ARC and specify inputs in the job file
  4. Start the workflow

19 of 20

Code repositories:

20 of 20

Thank You!