1 of 43

Travelling with warp-speed

Marius van den Beek, John Chilton, Nicola Soranzo and the Galaxy Team

Slides @ bit.ly/gxworkflows2018

-- an update on Galaxy Workflows

2 of 43

Workflows

Linear progression of analysis steps

3 of 43

Workflows

4 of 43

Workflows

Linear progression of analysis steps

Store Tool Parameters

5 of 43

Workflows

6 of 43

Workflows

Communicate intent

7 of 43

Workflows

8 of 43

Workflows

Communicate intent

Enable cooperation

9 of 43

Workflows

10 of 43

Workflows

11 of 43

Workflows

Create from history

Create from Scratch in Workflow Editor

Input + Parameters + Workflow = Output

12 of 43

Workflows

13 of 43

Workflows

Create from history

Create from Scratch in Workflow Editor

Input + Parameters + Workflow = Output

14 of 43

A history of Workflows in Galaxy

Workflow Editor ~ 11 yo

Extract from history ~ 11 yo

Workflow in Tool panel ~ 11 yo

Input modules ~ 11 yo

Post Job Actions ~ 9 yo

Collections in Workflows ~ 5 yo

15 of 43

Modularize workflows w/ subworkflows

https://usegalaxy.org/u/marius/w/parent-workflow-chipseq

16 of 43

Modularize workflows w/ subworkflows

17 of 43

Subworkflows vs workflow spaghetti

18 of 43

17.09 - Re-run and Replace

19 of 43

18.01 - Workflow Post Job Actions

The standard set of dataset post job actions (hide, tag, delete, rename) all work very intuitively with collections... finally.

20 of 43

18.01 - Workflow Post Job Actions

The standard set of dataset post job actions (hide, tag, delete, rename) all work very intuitively with collections... finally.

21 of 43

18.01 - Scaling Job Cache

Find and re-use the output of jobs with identical combinations of input files and parameters, which, assuming deterministic output, should produce the same result.

22 of 43

18.01 - Scaling Job Cache

Enables running overlapping workflows without overhead

Quality Control Workflow (steps 1, 2, 3)

23 of 43

18.01 - Scaling Job Cache

Enables running overlapping workflows without overhead

Analysis Workflow (steps 1, 2, 3)

24 of 43

18.01/18.05 - Clone tools with settings

25 of 43

18.09 - Switch tool versions in editor

26 of 43

18.09 - Zoom in workflow editor

27 of 43

18.09 - Runtime parameters for Subworkflows

28 of 43

Future plans

Enable more complex analyses, such as branch points and computed parameters

29 of 43

Future plans

Visual feedback about Workflow Progress

30 of 43

Future plans

Export Workflows, Parameters, Inputs, Runtime details, etc. into Research Objects (RO)

31 of 43

Workflow Tooling - CWL

Most (79 / 133) tests pass in a fork.

Many workflow and collection enhancements came from that branch, with many more to come.

32 of 43

Workflow Tooling - Format 2

33 of 43

Workflow Tooling - Testing

http://bit.ly/gxwftests

34 of 43

Workflow Tooling - Testing

35 of 43

Planemo For Workflows

$ planemo serve <workflow.(ga|yml)>    (pre-18.01)

$ planemo test <workflow.(ga|yml)>    (18.05+)

$ planemo convert <workflow.(ga|yml)>    (19.01)

$ planemo workflow_edit <workflow.(ga|yml)>    (19.01)

36 of 43

Rules for Building!

Created arbitrarily nested collections in the API.

Great for organizing data from instrument sample sheets, spreadsheets with data source links, or FTP directories into collections.

37 of 43

Rules for Manipulating!

Embed rule editor right into the tool form.

Load collection elements as rows, metadata as columns.

Filter, re-group, sort existing collections interactively or as part of workflows.

Structure and restructure data as needed for different parts of an analysis.
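
The row-and-column view can be illustrated with a short Python sketch. This is not the rule editor's own syntax or API: the sample rows, the column names, and the `regroup` helper are all hypothetical, and only show what filtering, sorting, and re-grouping collection elements by metadata amounts to.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical collection elements: one row per element, metadata as columns.
rows = [
    {"identifier": "s3", "condition": "treated", "batch": 2},
    {"identifier": "s1", "condition": "control", "batch": 1},
    {"identifier": "s2", "condition": "treated", "batch": 1},
    {"identifier": "s4", "condition": "control", "batch": 2},
]


def regroup(rows, column, keep=lambda row: True):
    """Filter rows, sort them, and regroup identifiers by one metadata column."""
    kept = sorted((r for r in rows if keep(r)),
                  key=itemgetter(column, "identifier"))
    return {key: [r["identifier"] for r in group]
            for key, group in groupby(kept, key=itemgetter(column))}
```

For example, `regroup(rows, "condition")` groups all four samples by condition, while `regroup(rows, "condition", keep=lambda r: r["batch"] == 1)` first filters to batch 1.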

38 of 43

Rules - Accessible from the Start

Huge thanks to Helena Rasche for the detailed PR review and edits!

39 of 43

Rules for Workflows - The Problem

Saskia Hiltemann et al. at Erasmus MC are working with an outside diagnostics lab (Streeklab Haarlem), so pipelines must be easy and foolproof.

Latest project: paired-end sequencing of two loci for each patient, each locus encoding one part of the HLA-DQ protein complex (HLADQ-A & HLADQ-B).

Originally made a workflow with 4 separate inputs (gene A forward and reverse, gene B forward and reverse). Suboptimal because:

  • Big batches of patients, running a workflow per patient is cumbersome.
  • Four inputs were error-prone: it was easy to mix up forward/reverse files or gene A/B in the workflow run menu.

40 of 43

Rules for Workflows - The Solution

  • Set instruments to produce structured names HLADQ[A|B]-<patient id>_R[1|2].fastq
  • Users just upload flat collections (simple)
  • Use rules inside workflow to structure collections (once for gene, once for strand), summarize final results with iReport
  • Allow batch processing and start-to-finish preservation & utilization of metadata - eliminates errors and clicks
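
As a rough illustration of why structured names help, the sketch below parses names of the form HLADQ[A|B]-<patient id>_R[1|2].fastq and nests them by gene, patient, and strand. It is plain Python, not the Galaxy rule builder. The function name `structure_by_gene`, the sample patient ids, and the mapping of R1 to forward and R2 to reverse are assumptions.

```python
import re
from collections import defaultdict

# Matches names such as "HLADQA-P001_R1.fastq"; the patient id format is an assumption.
NAME_PATTERN = re.compile(
    r"^HLADQ(?P<gene>[AB])-(?P<patient>.+)_R(?P<read>[12])\.fastq$"
)


def structure_by_gene(filenames):
    """Group a flat list of file names into gene -> patient -> strand."""
    nested = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        # Assumption: R1 holds forward reads and R2 holds reverse reads.
        strand = "forward" if match["read"] == "1" else "reverse"
        nested[match["gene"]][match["patient"]][strand] = name
    # Convert to plain dicts so the result prints and compares cleanly.
    return {gene: {patient: dict(pair) for patient, pair in patients.items()}
            for gene, patients in nested.items()}
```

Because every grouping is derived from the file name, a user only has to supply one flat list of files. The gene and strand assignments can no longer be mixed up by hand.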

41 of 43

Group tags for complex analysis

Selecting datasets from a nested collection allows multi-factor analysis:

~ batch + condition

https://github.com/galaxyproject/tools-iuc/pull/2167

https://github.com/galaxyproject/galaxy/pull/5457

42 of 43

Re-usable workflow parameters

43 of 43

Thanks!

Thanks to the whole Galaxy community for building awesome stuff with workflows.