1 of 25

Anil S Thanki

Senior Bioinformatician, Gene Expression Team

EMBL-EBI, UK

Galaxy Workflow Executor

2 of 25

Co-authors

  • Pablo Moreno, Associate Director, Early Computational Oncology, AstraZeneca
  • Jonathan Manning, Senior Bioinformatician, DeepLife
  • Suhaib Mohammed, Lead Cloud Solution Architect, Healthcare & Life Science, Microsoft

3 of 25

Introduction

      • Galaxy workflows are generally invoked from the user interface.
      • In some scenarios, it is useful to invoke them from the command line as well.
      • Being able to do both helps collaboration between bioinformaticians and non-bioinformaticians, and allows versatile usage, such as in training exercises and in production.

[Diagram: Workflow, CLI]

4 of 25


[Diagram: Workflow and CLI, used in Production and Training]

5 of 25

Introduction

      • To facilitate this kind of workflow deployment, we developed galaxy-workflow-executor, a Python package and command-line interface (CLI).
      • The package is open source and can be installed with pip or conda (Bioconda channel).
      • It interacts with the Galaxy REST API through the BioBlend library.

pip install galaxy-workflow-executor

conda install -c bioconda galaxy-workflow-executor

www.github.com/ebi-gene-expression-group/galaxy-workflow-executor

6 of 25

Running Galaxy-workflow-executor

[Diagram: Galaxy-workflow-executor (CLI) and BioBlend]

7 of 25

Running Galaxy-workflow-executor

[Diagram: Galaxy-workflow-executor (CLI) uses BioBlend and takes five inputs: Credentials, inputs.yaml, Workflow, Parameters and allow_errors.yaml]

8 of 25

Running Galaxy-workflow-executor

Workflow

  • It should be the workflow's JSON file, as exported from Galaxy.
  • The workflow should be annotated with step labels:
    • ideally for all steps,
    • but at least for the steps whose parameters you want to set through the parameters dictionary.
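Before running, it can help to check the labelling requirement directly on the exported JSON. A minimal sketch, assuming the usual Galaxy workflow export layout where `steps` maps step indices to objects with an optional `label`; the helper name `unlabelled_steps` and the example step labels are hypothetical, not part of galaxy-workflow-executor.

```python
import json
from typing import Any, Dict, List


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def unlabelled_steps(workflow: Dict[str, Any]) -> List[int]:
    """Return the indices of workflow steps that have no label.

    Assumes a top-level "steps" mapping of step index (as a string)
    to a step object with an optional "label" field.
    """
    missing = []
    for index, step in workflow.get("steps", {}).items():
        # An absent label may be stored as null or as an empty string.
        if not (step.get("label") or "").strip():
            missing.append(int(index))
    return sorted(missing)


# Hypothetical, trimmed-down workflow export.
wf_json = """
{
  "steps": {
    "0": {"id": 0, "type": "data_input", "label": "matrix"},
    "1": {"id": 1, "type": "tool", "label": "step_label_x"},
    "2": {"id": 2, "type": "tool", "label": null}
  }
}
"""
workflow = json.loads(wf_json)
print(unlabelled_steps(workflow))  # [2]
```

Steps reported here cannot be targeted from the parameters file, so they are the first candidates to label in the Galaxy workflow editor.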

9 of 25

Running Galaxy-workflow-executor

Parameters

step_label_x:
  param_name: "value"
  ....
  nested_param_name:
    n_param_name: "n_value"
    ....
  x_param_name: "x_value"
step_label_x2:
  ....
....
other_galaxy_setup_params: { ... }
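Because parameters are keyed by step label, something has to translate labels into the step identifiers Galaxy works with. A minimal sketch, assuming the Galaxy workflow JSON layout and a target shape of `{step_id: {param_name: value}}`, which is roughly what the `params` argument of BioBlend's `WorkflowClient.invoke_workflow` expects (check the BioBlend documentation for the keys your Galaxy version accepts). The helper name is hypothetical and this is not the executor's actual code.

```python
from typing import Any, Dict


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def params_by_step_id(
    workflow: Dict[str, Any], params_by_label: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Re-key a {step_label: params} mapping as {step_id: params}.

    Unknown labels raise KeyError so that a typo in the parameters file
    fails early instead of being silently ignored. Non-step keys (such as
    the other_galaxy_setup_params placeholder) are not handled here.
    """
    label_to_id = {
        step["label"]: str(step["id"])
        for step in workflow["steps"].values()
        if step.get("label")
    }
    translated = {}
    for label, params in params_by_label.items():
        if label not in label_to_id:
            raise KeyError(f"no workflow step labelled {label!r}")
        # Nested dicts (e.g. nested_param_name) are passed through unchanged.
        translated[label_to_id[label]] = params
    return translated


# Hypothetical workflow and parameters, already parsed from JSON/YAML.
workflow = {
    "steps": {
        "0": {"id": 0, "type": "data_input", "label": "matrix"},
        "1": {"id": 1, "type": "tool", "label": "step_label_x"},
        "2": {"id": 2, "type": "tool", "label": "step_label_x2"},
    }
}
params = {
    "step_label_x": {"param_name": "value", "nested_param_name": {"n_param_name": "n_value"}},
    "step_label_x2": {"other_param": "42"},
}
print(params_by_step_id(workflow, params))
```

Keying the file by label rather than by step index keeps it valid when steps are added or reordered in the workflow editor, as long as the labels stay the same.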

10 of 25

Running Galaxy-workflow-executor

A template parameters file in this format can be generated from an existing workflow:

generate_params_from_workflow.py -C galaxy_credentials.yaml -G test_instance -o test -W wf.json
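To give an idea of what such a template can be built from, here is an offline sketch that reads the current settings stored in each labelled tool step. It assumes each step's `tool_state` is a JSON-encoded string and that keys starting with `__` are internal; the helper name is hypothetical. The real script also takes credentials (`-C`, `-G`), which suggests it queries the Galaxy instance, and some exports nest further JSON-encoded strings inside `tool_state` that this sketch does not decode.

```python
import json
from typing import Any, Dict


# Hypothetical helper for illustration; not the actual generate_params_from_workflow.py.
def params_template(workflow: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Build a {step_label: {param: value}} template from a workflow export.

    Assumes each tool step stores its settings in "tool_state" as a
    JSON-encoded string; internal keys starting with "__" are dropped.
    """
    template = {}
    for step in workflow["steps"].values():
        label = step.get("label")
        # Only labelled tool steps can be targeted from the parameters file.
        if step.get("type") != "tool" or not label:
            continue
        state = json.loads(step.get("tool_state") or "{}")
        template[label] = {k: v for k, v in state.items() if not k.startswith("__")}
    return template


# Hypothetical, trimmed-down workflow export.
workflow = {
    "steps": {
        "0": {"id": 0, "type": "data_input", "label": "matrix", "tool_state": "{}"},
        "1": {"id": 1, "type": "tool", "label": "step_label_x",
              "tool_state": '{"param_name": "value", "__page__": null}'},
        "2": {"id": 2, "type": "tool", "label": None, "tool_state": '{"x": "1"}'},
    }
}
print(params_template(workflow))  # {'step_label_x': {'param_name': 'value'}}
```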

11 of 25

Running Galaxy-workflow-executor

inputs.yaml

matrix:
  path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx
  type: txt
barcodes:
  path: /path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_cols
  type: tsv
gtf:
  dataset_id: fe139k21xsak
genes:
  library_id: asd24sdfasd5
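Each entry above either points to a local file to upload or to data already in Galaxy. A minimal sketch of turning such entries into the `{"src": ..., "id": ...}` references that BioBlend's `invoke_workflow` accepts for its `inputs` argument. Assumptions: the keys of `inputs.yaml` are input step labels, `library_id` refers to a library dataset (`"ld"`), and `upload` is a caller-supplied function (in practice it might wrap BioBlend's `ToolClient.upload_file`). The helper names and the label-to-step mapping are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def build_invocation_inputs(
    inputs_spec: Dict[str, Dict[str, str]],
    label_to_step: Dict[str, str],
    upload: Callable[[str, str], str],
) -> Dict[str, Dict[str, str]]:
    """Map each labelled input to a Galaxy {"src": ..., "id": ...} reference.

    - "path": the file is uploaded via upload(path, file_type), which must
      return the id of the new history dataset ("hda").
    - "dataset_id": an existing history dataset is reused ("hda").
    - "library_id": assumed to be a library dataset id ("ld").
    """
    resolved = {}
    for label, entry in inputs_spec.items():
        step = label_to_step[label]
        if "path" in entry:
            ref = {"src": "hda", "id": upload(entry["path"], entry.get("type", "auto"))}
        elif "dataset_id" in entry:
            ref = {"src": "hda", "id": entry["dataset_id"]}
        elif "library_id" in entry:
            ref = {"src": "ld", "id": entry["library_id"]}
        else:
            raise ValueError(f"input {label!r} has no path, dataset_id or library_id")
        resolved[step] = ref
    return resolved


uploads: List[Tuple[str, str]] = []


def fake_upload(path: str, file_type: str) -> str:
    """Stand-in for a real upload; records the call and returns a fake id."""
    uploads.append((path, file_type))
    return f"hda-{len(uploads)}"


# Parsed form of the inputs.yaml above, plus a hypothetical label-to-step mapping.
inputs_spec = {
    "matrix": {"path": "/path/to/E-MTAB-4850.aggregated_filtered_counts.mtx", "type": "txt"},
    "barcodes": {"path": "/path/to/E-MTAB-4850.aggregated_filtered_counts.mtx_cols", "type": "tsv"},
    "gtf": {"dataset_id": "fe139k21xsak"},
    "genes": {"library_id": "asd24sdfasd5"},
}
label_to_step = {"matrix": "0", "barcodes": "1", "gtf": "2", "genes": "3"}
resolved = build_invocation_inputs(inputs_spec, label_to_step, fake_upload)
print(resolved)
```

Injecting `upload` keeps the mapping logic separate from the Galaxy connection, which makes it easy to test without a running instance.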

12 of 25

Running Galaxy-workflow-executor

allow_errors.yaml

step_label_x:
  - any
step_label_z:
  - 1
  - 43
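The slide does not spell out what the integers mean. A minimal sketch of one plausible reading, assuming `any` tolerates every failure of that step and integers list tolerated tool exit codes; the helper name `error_is_allowed` is hypothetical.

```python
from typing import Dict, List, Optional, Union

AllowSpec = Dict[str, List[Union[str, int]]]


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def error_is_allowed(allow_errors: AllowSpec, step_label: str, exit_code: Optional[int]) -> bool:
    """Decide whether a failed step may be tolerated.

    Assumes "any" tolerates every failure of that step, and integers are
    the specific tool exit codes that are acceptable.
    """
    allowed = allow_errors.get(step_label, [])
    if "any" in allowed:
        return True
    return exit_code is not None and exit_code in allowed


# Parsed form of the allow_errors.yaml above.
allow_errors = {"step_label_x": ["any"], "step_label_z": [1, 43]}

print(error_is_allowed(allow_errors, "step_label_x", 137))  # True
print(error_is_allowed(allow_errors, "step_label_z", 43))   # True
print(error_is_allowed(allow_errors, "step_label_z", 2))    # False
print(error_is_allowed(allow_errors, "step_label_y", 1))    # False
```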

13 of 25

Running Galaxy-workflow-executor

Credentials

__default: test
test:
  key: "<ADMIN_USER_API_KEY>"
  url: "http://localhost:8080/"
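A minimal sketch of how a credentials file like this can be resolved to a URL and API key, assuming `-G` names the entry to use and `__default` is the fallback when no name is given. The helper name is hypothetical; the file is shown already parsed (for example with PyYAML's `yaml.safe_load`) to keep the sketch dependency-free. The resulting pair is what BioBlend's `GalaxyInstance(url=..., key=...)` takes.

```python
from typing import Any, Dict, Optional, Tuple


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def resolve_instance(creds: Dict[str, Any], name: Optional[str] = None) -> Tuple[str, str]:
    """Return (url, api_key) for `name`, falling back to the __default entry."""
    instance = name or creds.get("__default")
    if instance is None:
        raise KeyError("no instance name given and no __default entry")
    entry = creds[instance]  # KeyError if the instance is not defined
    return entry["url"], entry["key"]


# Parsed form of the credentials file above.
creds = {
    "__default": "test",
    "test": {"key": "<ADMIN_USER_API_KEY>", "url": "http://localhost:8080/"},
}

url, key = resolve_instance(creds)  # no instance named: uses __default
print(url)  # http://localhost:8080/
# With BioBlend installed, the pair would then be used roughly as:
#     from bioblend.galaxy import GalaxyInstance
#     gi = GalaxyInstance(url=url, key=key)
```

Keeping several named instances in one file lets the same workflow and inputs be pointed at, say, a local test Galaxy or a production one just by changing `-G`.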

14 of 25

Running Galaxy-workflow-executor


15 of 25

Running Galaxy-workflow-executor

run_galaxy_workflow.py -C test/creds.yaml -G test -o test_out/ -H 'test history' -W test/wf.json \
    -i test/wf_inputs.yaml -P test/wf_parameters.yaml --parameters-yaml
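Because the executor is a CLI, scheduled or scripted runs can simply call it. A minimal sketch that assembles the argument list from the flags used in this command; the helper name is hypothetical, and pairing `-P` with `--parameters-yaml` is an assumption based on this one example.

```python
import shlex
from typing import List, Optional


# Hypothetical helper for illustration; only uses the flags shown in the command above.
def executor_command(
    creds: str, instance: str, output_dir: str, history: str,
    workflow: str, inputs: str, parameters: Optional[str] = None,
) -> List[str]:
    """Assemble the run_galaxy_workflow.py argument list."""
    cmd = [
        "run_galaxy_workflow.py",
        "-C", creds,
        "-G", instance,
        "-o", output_dir,
        "-H", history,
        "-W", workflow,
        "-i", inputs,
    ]
    if parameters is not None:
        # Assumed: the YAML parameters file is passed with --parameters-yaml.
        cmd += ["-P", parameters, "--parameters-yaml"]
    return cmd


cmd = executor_command(
    "test/creds.yaml", "test", "test_out/", "test history",
    "test/wf.json", "test/wf_inputs.yaml", "test/wf_parameters.yaml",
)
print(shlex.join(cmd))
# A scheduler or wrapper script would then run it with, e.g.:
#     subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with values such as the history name `test history`.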

16 of 25

Running Galaxy-workflow-executor

      • Upload datasets
      • Upload workflow
      • Run & wait for result
      • Allow errors on specific steps
      • Retrieve results
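"Run & wait for result" amounts to polling Galaxy until the work reaches a final state. A generic polling sketch: `get_state` is a caller-supplied function (in real use it might return the `state` field from BioBlend's `gi.jobs.show_job(job_id)`), and the default set of terminal job states is an assumption to adjust for your Galaxy version. The helper name is hypothetical.

```python
import time
from typing import Callable, Iterable


# Hypothetical helper for illustration; not part of galaxy-workflow-executor.
def wait_for_state(
    get_state: Callable[[], str],
    terminal_states: Iterable[str] = ("ok", "error", "deleted"),
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires.

    The timeout counts nominal sleep time, so `sleep` can be replaced by a
    no-op in tests.
    """
    terminal = set(terminal_states)
    waited = 0.0
    while True:
        state = get_state()
        if state in terminal:
            return state
        if waited >= timeout:
            raise TimeoutError(f"still {state!r} after {timeout} s")
        sleep(interval)
        waited += interval


# Simulated sequence of job states, without a real Galaxy server.
states = iter(["new", "queued", "running", "ok"])
print(wait_for_state(lambda: next(states), sleep=lambda s: None))  # ok
```

An `error` result would then be checked against allow_errors.yaml before deciding whether the whole run has failed.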


17 of 25

Running Galaxy-workflow-executor

[Diagram: as on slide 7, now also producing Results]

18 of 25

Running Galaxy-workflow-executor

      • Results can be:
        • downloaded,
        • uploaded to a data library, or
        • persisted in the Galaxy history.

19 of 25

Running Galaxy-workflow-executor

Galaxy deployments:

  • Dedicated instance on the EBI LSF cluster
  • Kubernetes in the cloud (AWS, GCP, OpenStack)

20 of 25

Use case at EMBL-EBI

[Diagram: a control workflow submits input data and the workflow file to galaxy-workflow-executor, which runs the Galaxy workflow and returns results for each dataset]

Robust: tested through continuous workflow runs at EMBL-EBI, totalling thousands of individual executions.

21 of 25

Use case at Persist-seq

[Diagram: input data is downloaded from AWS S3 and mounted as a Galaxy data library (datalib); galaxy-workflow-executor runs the workflow file on the library data, and the results are uploaded]

22 of 25

Persist-seq

      • Consortium of 14 European partners
      • Aim: build a reproducible single-cell sequencing workflow to capture tumour persistence

23 of 25

Main Benefits

      • Automation
        • Makes it easy to run automated pipelines periodically

      • Enables collaboration
        • Workflows can be shared with collaborators and invoked on different Galaxy instances

24 of 25

Acknowledgements

EBI

Irene Papatheodorou

Pedro Madrigal

Iris Yu

Ex-EBI

Pablo Moreno

Jonathan Manning

Suhaib Mohammed

Andrey Solovyev

Nicola Soranzo (Earlham Institute, UK)

Galaxy community

25 of 25

Thank you

Mail: anilthanki@ebi.ac.uk

Twitter: @anilthanki

Matrix: @anilthanki