Dockstore Fundamentals:
Introduction to Docker and Descriptors for Reproducible Analysis
Louise Cabansay, Software Engineer, UC Santa Cruz Genomics Institute
Andrew Duncan, Software Engineer, Ontario Institute for Cancer Research
Denis Yuen, Senior Software Engineer, Ontario Institute for Cancer Research
1
Learning Objectives
2
Format and Setup
3
Format and Setup
4
Chrome browser required (there is currently a bug on Instruqt with Firefox)
Download: https://www.google.com/chrome/
What is Dockstore?
Dockstore is a free and open source platform for sharing scientific tools and workflows.
Portability
Interoperability
Reproducibility
5
“An app store for bioinformatics”
Portability:
6
Software is “packaged” using container technology and described using descriptor languages
+
Descriptor
Container
What’s on Dockstore? tools and workflows
7
Workflow
Tool
OR
Container + Descriptor
Workflow: Tools + Descriptor
A tool uses a single container and performs a single action or step that is outlined by a descriptor.
A workflow can use multiple containers and execute multiple actions or steps, still outlined by a descriptor.
Example:
8
Variant Calling Pipeline
BWA-MEM
OR
A tool uses a single container and performs a single action or step that is outlined by a descriptor.
(ex: alignment)
Tool
Workflow
Debian OS
binaries
C
GNU Make
GCC
zlibs
processing
alignment
call variants
A workflow can use multiple containers and execute multiple actions or steps, still outlined by a descriptor.
*note: you can register a single tool as a workflow on Dockstore, but a multi-step workflow cannot be registered as a tool.
ex: BWA-MEM can technically be registered as both
Interoperability:
9
Source Control
Analysis Environments
Integration with various sites allows Dockstore to function as a centralized catalog of bioinformatics tools and workflows
By following GA4GH API standards, Dockstore enables users to run tools and workflows in their preferred compute and analysis environments
700+ tools and workflows published to Dockstore
Docker Registries
Reproducibility: Create, Share, Use
10
Docker Basics
*as used on Dockstore
11
What is a container? What is Docker?
12
Container:
A container encapsulates all the software dependencies associated with running a program.
Docker:
A particularly popular brand of container
What kinds of problems are solved by containers?
13
Docker Concepts: Container, Image, Registry
14
Image:
Packaged-up code with all of its dependencies, at rest.
Registry:
Repositories where users can store images privately or publicly in the cloud.
Dockstore itself does not host images, but rather gets them from Image Registries:
Container:
A running image
*Note: the terms container and image are often used interchangeably, but there is a slight distinction.
Docker Ecosystem
15
Docker Daemon
docker pull
docker run
docker build
Docker CLI
Host Machine
Images
Containers
Image Registry
Docker Hub
Quay.io
GCR
Docker Ecosystem
16
docker pull
docker pull
docker run
docker build
Docker Hub
Quay.io
GCR
Docker Daemon
Docker CLI
Host Machine
Images
Containers
Image Registry
Docker Ecosystem
17
docker pull
docker run
docker build
Docker Hub
Quay.io
GCR
docker run
Note: the docker run command will also pull from a registry if the image doesn’t already exist on the machine
Docker Daemon
Docker CLI
Host Machine
Images
Containers
Image Registry
Start Up Instruqt
18
Wait for start to show up
Docker Client (CLI)
A command-line utility for:
19
docker [sub-command] [-flag options] [arguments]
Basic Docker Sub Commands:
20
docker info [OPTIONS]
Display system-wide information about your installation of docker:
docker image [COMMAND]
Managing docker images:
docker container [COMMAND]
Managing docker containers:
docker run [-flags] [registry name]/[path to image repository]:[tag] [arguments]
Running docker containers:
Docker has a whole library of commands, here are some basic examples:
How are containers commonly used?
Run and done
21
How are containers commonly used?
Run and done
Run continuously
22
Your main way of using containers!
How do I ‘run’ a container?
23
docker run [-flag options] [registry name]/[path to image repository]:[tag] [arguments]
base
run command
dockerhub*
quay.io
gcr.io
Generally, the arguments are what gets passed into the container (ex: the command you want to run)
Both official containers and user containers are available
Only specify the registry if it’s not Docker Hub
The ‘version’ of the image you want to run
24
Exercise #1a: Running containers
25
docker run docker/whalesay cowsay "fill me in"
Exercises:
Use the whalesay container from Docker Hub to print a welcome message
docker run [-flag options] [registry name]/[path to image repository]:[tag] [arguments]
Exploring containers interactively
26
docker run -it quay.io/ldcabansay/samtools:latest
Example:
Enter the samtools container and confirm that samtools is installed
docker run [-flag options] [registry name]/[path to image repository]:[tag] [arguments]
-i -t
Sharing data between host and container
Bind mounts (-v) (aka two-way data binding)
27
docker run -v /usr/data:/tmp/data ...
Output stored in the container directory /tmp/data will also be available on the host at /usr/data
docker run -v [ host path ]:[ container path ] ...
HOST
CONTAINER
Note: using absolute paths is highly recommended
Exercise #1b: exploring containers
28
docker run [-flag options] [registry name]/[path to image repository]:[tag] [arguments]
-i -t -v
docker run -it -v /root/bcc2020-training/data:/data quay.io/ldcabansay/samtools:latest
Exercise(s):
Enter the samtools container, but this time bring in some data!
docker run -v [ host path ]:[ container path ] ...
HOST
CONTAINER
docker run -v /root/bcc2020-training/data:/data quay.io/ldcabansay/samtools:latest samtools view -S -b /data/mini.sam -o /data/mini.bam
Convert a sam file to a bam file using the samtools container.
Data binding: In-depth (Extra Reading)
29
Dockerfiles: Custom Images
30
Primer: How is software installed and used?
31
The author of a Dockerfile programmatically details the software installation and any other steps for environment setup.
The image built from the Dockerfile can then be used ‘off-the-shelf’ by others.
Package managers
Executable Files or Binaries (ex: *.jar, *.c, grep, tar, diff, md5sum)
Building or running from source files
Dockerfiles Overview:
A simple text file with instructions to build an image:
32
Dockerfile
FROM
MAINTAINER
RUN
ENV
CMD
base image (start)
commands to:
- install software
- install dependencies
- run scripts
- misc. environment setup
environment variables
command to execute when container starts (optional)
author metadata
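A minimal sketch assembling the instructions above into one file (the base image, package name, paths, and email here are placeholders, not from the slides):

```dockerfile
# base image (start)
FROM ubuntu:18.04
# author metadata (optional)
MAINTAINER Your Name <youremail@research.edu>
# commands to install software and dependencies
RUN apt update && apt install -y <software-package>
# environment variables
ENV PATH="/opt/mytool/bin:${PATH}"
# command to execute when container starts (optional)
CMD ["/bin/bash"]
```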
Dockerfiles - local
33
docker pull
docker run
docker build
Docker Daemon
Docker CLI
Host Machine
Images
Containers
Dockerfile
Configuration to set up a docker image
Dockerfiles - local
34
docker pull
Docker Daemon
Docker CLI
Host Machine
Images
Containers
Dockerfile
docker run
docker build
Configuration to set up a docker image
Example: BWA (via package manager)
35
Dockerfile
#######################################################
# Dockerfile to build a sample container for bwa
#######################################################
# Start with a base image
FROM ubuntu:18.04
# Add file author/maintainer and contact info (optional)
MAINTAINER Louise Cabansay <lcabansa@ucsc.edu>
# set user you want to install packages as
USER root
# update package manager & install dependencies (if any)
RUN apt update
# install analysis software from package manager
RUN apt install -y bwa
Basic Commands: docker build
36
docker build -t bwa:v1.0 .
( -t ) : Build the image and tag it bwa:v1.0 (run from the directory containing the Dockerfile)
docker build -t bwa:v1.0 -f dockerfiles/bwa/Dockerfile .
( -f ) : Build a specific Dockerfile by providing the path to the file (relative to the build context)
docker image ls
View built Docker images
docker build [-flag options] [build context]
-t -f
37
Exercise #2a: Writing your first Dockerfile: tabix
Dockerfiles describe the packaged up environment:
38
Dockerfile
# Start with a base image
FROM { base image name }
# Add file author/maintainer and contact info (optional)
MAINTAINER {your name} <youremail@research.edu>
# set user you want to install packages as
USER root
# update package manager & install dependencies (if any)
RUN apt update
# install analysis software from package manager
RUN apt install -y { software package name }
docker image build -t { name } -f { path to dockerfile } .
Build an image from Dockerfile:
Exercise #2b: Try out your new container!
39
docker image ls
docker run [image id] tabix
Exercises:
1. Verify that your image was built (get the image ID to use in part 2)
2. Use your local image to view the tabix command help
docker container run [-flag options] [registry name]/[path to image repository]:[tag] [args]
Ex: bamstats (executable)
40
Dockerfile
# Start with a base image
FROM ubuntu:14.04
# Add file author/maintainer and contact info (optional)
MAINTAINER Brian OConnor <briandoconnor@gmail.com>
# install software dependencies
USER root
RUN apt-get -m update && apt-get install -y wget unzip \
openjdk-7-jre zip
# manual software installation from source
# get the tool and install it in /usr/local/bin
RUN wget -q http://downloads.sourceforge.net/project/bamstats/BAMStats-1.25.zip
# commands/scripts to finish software setup
RUN unzip BAMStats-1.25.zip && \
rm BAMStats-1.25.zip && \
mv BAMStats-1.25 /opt/
COPY bin/bamstats /usr/local/bin/
RUN chmod a+x /usr/local/bin/bamstats
# switch back to the ubuntu user so this tool (and the files written) are not owned by root
RUN groupadd -r -g 1000 ubuntu && useradd -r -g ubuntu -u 1000 -m ubuntu
USER ubuntu
# command /bin/bash is executed when container starts
CMD ["/bin/bash"]
Example: samtools (compile from source files)
41
Dockerfile
# Start with a base image
FROM ubuntu:18.04
# Add file author/maintainer and contact info (optional)
MAINTAINER Louise Cabansay <lcabansa@ucsc.edu>
# install software dependencies
RUN apt update && apt -y upgrade && apt install -y \
    wget build-essential libncurses5-dev zlib1g-dev \
    libbz2-dev liblzma-dev libcurl3-dev
WORKDIR /usr/src
# get the software source files
RUN wget https://github.com/samtools/samtools/releases/download/1.10/samtools-1.10.tar.bz2
# installation commands to compile source files
RUN tar xjf samtools-1.10.tar.bz2 && \
    rm samtools-1.10.tar.bz2 && \
    cd samtools-1.10 && \
    ./configure --prefix $(pwd) && \
    make
# add newly built executables to path
ENV PATH="/usr/src/samtools-1.10:${PATH}"
Sharing your Dockerfiles and Images
42
A Dockerfile contains the configuration to package up your software into an image
Dockerfile
Source Control
Image Registry
Dockstore recommends storing your Dockerfile in an external repository (Bitbucket, GitHub, GitLab) and then registering your source controlled Dockerfile to an image registry (Docker Hub, Quay.io, Google Container Registry, etc)
Best Practices (Take home reading)
43
What Next?
Docker is great, it tells us how to install software.
However, it doesn’t tell us how to use software.
Descriptor languages are the solution!
44
Break
45
What Next?
Docker is great, it tells us how to install software.
However, it doesn’t tell us how to use software.
Descriptor languages are the solution!
46
Intro to Descriptors (WDL)
47
Components and Concepts shared by Descriptors
48
Descriptor:
A workflow language used to describe how to run your pipeline.
Parameter File (JSON): Supplies the input values for a given run of a tool or workflow.
Container:
Packaged up code with all of its dependencies. This allows for portable software that runs quickly and reliably from one computing environment to another.
CWL: Common Workflow Language
49
CWL: Common Workflow Language
Implementations/Engines:
Analysis Platforms (Launch-with)
Nextflow
50
Nextflow
Running nextflow workflows
WDL: Workflow Description Language
51
WDL: Workflow Description Language
Engines:
Analysis Platforms (Launch-with)
What’s in a WDL? Top-level Components
3 top-level components that are part of the core structure of a WDL script
52
workflow.wdl
Top-level Components - Workflow
Workflow: Code block that defines the overall workflow. You can think of it as an outline.
53
workflow myWorkflowName {
}
workflow.wdl
Top-level Components - Workflow
Workflow: Code block that defines the overall workflow. You can think of it as an outline.
54
workflow myWorkflowName {
}
input {
...
}
workflow.wdl
output {...}
Top-level Components - Call
Call: Component that defines which tasks the workflow will run
55
workflow myWorkflowName {
}
call task_B {
input: ...
}
call task_A
input {
...
}
workflow.wdl
output {...}
Top-level Components - Task
Task: Defines all the information necessary to perform an action.
56
workflow myWorkflowName {
}
task task_A { … }
task task_B { … }
input {
...
}
workflow.wdl
call task_B {
input: ...
}
call task_A
output {...}
What’s in a Task?
Task: Defines all the information necessary to perform an action
57
task doSomething {
}
task.wdl
What’s in a Task? Command
Task: Defines all the information necessary to perform an action
58
task doSomething {
}
task.wdl
command {
echo Hello World!
cat ${myName}
}
What’s in a Task? Inputs
Task: Defines all the information necessary to perform an action
59
task doSomething {
}
command {
echo Hello World!
cat ${myName}
}
input { File myName }
task.wdl
input { File? myName }
input { String? myName = "Foobar" }
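As a sketch (not from the slides) of how an optional input can be handled in WDL 1.0, the standard-library function select_first falls back to a default when the input is not provided in the parameter JSON; the task name here is illustrative:

```wdl
version 1.0
task greet {
  input {
    String? name        # optional: may be omitted in the JSON
  }
  # fall back to a default when 'name' is not provided
  String who = select_first([name, "World"])
  command {
    echo Hello ${who}!
  }
  output { File outFile = stdout() }
}
```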
What’s in a Task? Outputs
Task: Defines all the information necessary to perform an action
60
task doSomething {
}
output {
File outFile = “Hello.txt”
}
command {
echo Hello World! > Hello.txt
cat ${myName} >> Hello.txt
}
input { File myName }
task.wdl
Simple Example: HelloWorld.wdl
61
HelloWorld.wdl
version 1.0
# add and name a workflow block
workflow hello_world {
    call hello
    # important: add output for whole workflow
    output {
        File helloFile = hello.outFile
    }
}
# define the ‘hello’ task
task hello {
    input { File myName }
    command {
        echo Hello World! > Hello.txt
        cat ${myName} >> Hello.txt
    }
    output { File outFile = "Hello.txt" }
}
Parameter JSON (simple):
62
value can be path, string, int, array, etc
"hello_world.hello.myName": "/<usr>/bcc2020/wdl-training/exercise1/name.txt"
workflow
name
task
name
parameter
name
KEY
VALUE
hello.json
{
"hello_world.hello.myName": "/<usr>/bcc2020/wdl-training/exercise1/name.txt"
}
Note: using absolute paths is highly recommended
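The same workflow.task.parameter key pattern repeats once per input. A sketch of a parameter file for a hypothetical workflow with two tasks (all names and paths here are illustrative):

```json
{
  "my_workflow.task_A.input_file": "/absolute/path/to/input.txt",
  "my_workflow.task_B.sample_name": "sample01"
}
```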
63
Exercise #1: Run your first wdl
64
HelloWorld.wdl
version 1.0
workflow hello_world {
call hello
output { File helloFile = hello.outFile }
}
task hello {
input { File myName }
command {
echo Hello World! > Hello.txt
cat ${myName} >> Hello.txt
}
output { File outFile = "Hello.txt" }
}
dockstore workflow launch --local-entry HelloWorld.wdl --json hello.json
Run with DockstoreCLI
Overview: Dockstore CLI
A handy command line resource to help users develop content locally.
65
Example execution with the Dockstore Command Line Interface (CLI):
dockstore workflow launch --local-entry HelloWorld.wdl --json hello.json
What’s in a Task? Runtime
Task: Defines all the information necessary to perform an action
66
task doSomething {
}
output {
File outFile = “Hello.txt”
}
command {
echo Hello World! > Hello.txt
cat ${myName} >> Hello.txt
}
input { File myName }
task.wdl
runtime {
docker: “ubuntu:latest”
memory: “1GB”
}
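Beyond docker and memory, engines such as Cromwell commonly support additional runtime keys; a sketch (the keys and values below are illustrative, and support varies by engine):

```wdl
runtime {
  docker: "ubuntu:latest"       # image the task runs in
  memory: "2GB"                 # requested memory
  cpu: 2                        # requested cores
  disks: "local-disk 10 HDD"    # Cromwell/cloud-specific disk request
}
```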
What’s in a Task? Parameterization
Task: Defines all the information necessary to perform an action
67
task doSomething {
}
output {
File outFile = “${outFile}”
}
command {
echo Hello World! > ${outFile}
cat ${myName} >> ${outFile}
}
input {
File myName
String outFile
}
task.wdl
runtime {
docker: docker_image
memory: “${memory_gb}”
}
But is this always a best practice?
A task can have declarations which are intermediate values rather than inputs.
68
task doSomething {
}
# creating non-input declarations
String myString = "hi " + myName
String outFile = myName + ".out"
task.wdl
# example usage in command
command {
echo ${myString} > ${outFile}
}
input { String myName }
# example usage in output
output {
File outFile = "${outFile}"
}
WDL Standard Library (Take Home)
Built-in functions or methods provided by the core WDL language
69
WDL Standard Library (Simple)
70
task hello {
}
output {
File outFile = “Hello.txt”
}
command {
echo Hello World! > Hello.txt
cat ${myName} >> Hello.txt
}
input { File myName }
task.wdl
task hello {
}
output {
File outFile = stdout()
}
command {
echo Hello World!
cat ${myName}
}
input { File myName }
task.wdl
Output:
“hello.outFile” : “.../stdout”
Output:
“hello.outFile” : “.../Hello.txt”
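A couple of other commonly used standard-library functions, sketched in context (task and file names are illustrative; behavior as defined by the WDL 1.0 spec):

```wdl
version 1.0
task stdlib_demo {
  input { File input_sam }
  # basename() strips the leading path from a File
  String stats = basename(input_sam) + ".metrics"
  command {
    wc -l ${input_sam} > ${stats}
  }
  output {
    File metrics = "${stats}"           # file created by the command
    String count = read_string(stats)   # read a file's contents into a String
  }
}
```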
Primer Exercise #2
For our second exercise we’re going to parameterize a simple workflow.
Goal: generate statistics about an alignment file
There will be multiple ways to solve this assignment. This is a chance for you to apply the things we’ve learned to a real bioinformatics workflow.
71
Exercise #2: Complete metrics.wdl
quay.io/ldcabansay/samtools:latest
72
runtime {
docker: “ubuntu:latest”
memory: “1GB”
}
metrics.wdl
version 1.0
workflow metrics {
call flagstat
output { File align_metrics = flagstat.metrics }
}
task flagstat {
input { File input_sam }
# non-parameterized flagstat command
command {
samtools flagstat mini.sam > mini.sam.metrics
}
output {
File metrics = "mini.sam.metrics"
}
# set some parameterized runtime parameters
runtime {
docker: # set
}
}
Exercise #2: Solution* - Descriptors
73
metrics.wdl
version 1.0
workflow metrics {
call flagstat
output { File align_metrics = flagstat.metrics }
}
task flagstat {
input { File input_sam }
# non-parameterized flagstat command
command {
samtools flagstat mini.sam > mini.sam.metrics
}
output {
File metrics = "mini.sam.metrics"
}
# set some parameterized runtime parameters
runtime {
docker: # set
}
}
metrics.wdl
version 1.0
workflow metrics {
call flagstat
output { File align_metrics = flagstat.metrics }
}
task flagstat {
input { File input_sam }
# slightly parameterized flagstat command
command {
samtools flagstat ${input_sam} > mini.sam.metrics
}
output { File metrics = "mini.sam.metrics" }
# set docker runtime
runtime {
docker: "quay.io/ldcabansay/samtools:latest"
}
}
*note: this is one example solution, multiple are possible
Exercise #2: Solution vs Solution2
74
metrics.wdl
version 1.0
workflow metrics {
call flagstat
output { File align_metrics = flagstat.metrics }
}
task flagstat {
input { File input_sam }
# slightly parameterized flagstat command
command {
samtools flagstat ${input_sam} > mini.sam.metrics
}
output { File metrics = "mini.sam.metrics" }
# set docker runtime
runtime {
    docker: "quay.io/ldcabansay/samtools:latest"
}
}
metrics.wdl
version 1.0
workflow metrics {
call flagstat
output { File align_metrics = flagstat.metrics }
}
task flagstat {
input {
File input_sam
String docker_image
}
# create a string to help parameterize command
String stats = basename(input_sam) + ".metrics"
command {
samtools flagstat ${input_sam} > ${stats}
}
output { File metrics = "${stats}" }
# set a parameterized docker runtime
runtime {
docker: docker_image
}
}
*note: this is one example solution, multiple are possible
Break
75
Multi-task workflows:
76
Example: Multi-task workflow
77
HelloWorld.wdl
version 1.0
workflow hello_world {
call hello
output { File helloFile = hello.outFile }
}
task hello {
input { File myName }
command {
echo Hello World!
cat ${myName}
}
output { File outFile = stdout() }
}
GoodbyeWorld.wdl
version 1.0
workflow goodbye_world {
call goodbye
output { File byeFile = goodbye.outFile }
}
task goodbye {
input { File greeting }
command {
cat ${greeting}
echo See you later!
}
output { File outFile = stdout() }
}
Example: Multi-task workflow - HelloGoodbye.wdl
78
HelloGoodbye.wdl
version 1.0
workflow HelloGoodbye {
call hello
call goodbye {
input: greeting = hello.outFile
}
output { File hello_goodbye = goodbye.outFile }
}
task hello {
input { File myName }
command {
echo Hello World!
cat ${myName}
}
output { File outFile = stdout() }
}
task goodbye {
input { File greeting }
command {
cat ${greeting}
echo See you later!
}
output { File outFile = stdout() }
}
HelloGoodbye.json
{
"HelloGoodbye.hello.myName": "/root/bcc2020-training/wdl-training/exercise3/
hello_examples/name.txt"
}
WDL Imports
A WDL file may contain import statements to include WDL code from other sources.
79
Imports: Concepts
80
workflow
name
task
name
parameter
name
workflow primary {
...
...
}
primary-descriptor.wdl
import "<resource>" as <alias>
task task_A { ... }
call <alias>.taskOne {
input: ...
}
call task_A
"primary.taskOne.param_name": "<value of param or path if file>"
JSON mapping:
Example: No Imports vs Imports
81
HelloGoodbye.wdl
version 1.0
workflow HelloGoodbye {
call hello
call goodbye {
input: greeting = hello.outFile
}
output { File hello_goodbye = goodbye.outFile }
}
task hello {
input { File myName }
command {
echo Hello World!
cat ${myName}
}
output { File outFile = stdout() }
}
task goodbye {
input { File greeting }
command {
cat ${greeting}
echo See you later!
}
output { File outFile = stdout() }
}
HelloGoodbye_imports.wdl
version 1.0
# add import statements to bring in sub-workflows
# if not given, namespace/alias = file minus '.wdl'
import "HelloWorld.wdl"
# otherwise, namespace = <alias>
import "GoodbyeWorld.wdl" as bye
workflow HelloGoodbye {
#call the hello task, syntax: <alias>.taskname
call HelloWorld.hello
#call the goodbye task, syntax: <alias>.taskname
call bye.goodbye {
input: greeting = hello.outFile
}
#same as before, define workflow outputs
output { File hello_goodbye = goodbye.outFile }
}
Example: No Imports vs Imports
82
HelloGoodbye.wdl
version 1.0
workflow HelloGoodbye {
call hello
call goodbye {
input: greeting = hello.outFile
}
output { File hello_goodbye = goodbye.outFile }
}
task hello {
input { File myName }
command {
echo Hello World!
cat ${myName}
}
output { File outFile = stdout() }
}
task goodbye {
input { File greeting }
command {
cat ${greeting}
echo See you later!
}
output { File outFile = stdout() }
}
HelloGoodbye_imports.wdl
version 1.0
# add import statements to bring in sub-workflows
# if not given, namespace/alias = file minus '.wdl'
import "HelloWorld.wdl"
# otherwise, namespace = <alias>
import "GoodbyeWorld.wdl" as bye
workflow HelloGoodbye {
#call the hello task, syntax: <alias>.taskname
call HelloWorld.hello
#call the goodbye task, syntax: <alias>.taskname
call bye.goodbye {
input: greeting = hello.outFile
}
#same as before, define workflow outputs
output { File hello_goodbye = goodbye.outFile }
}
Do we have to change the JSON when running HelloGoodbye using imports?
Example: No Imports vs Imports (no comments)
83
HelloGoodbye.wdl
version 1.0
workflow HelloGoodbye {
call hello
call goodbye {
input: greeting = hello.outFile
}
output { File hello_goodbye = goodbye.outFile }
}
task hello {
input { File myName }
command {
echo Hello World!
cat ${myName}
}
output { File outFile = stdout() }
}
task goodbye {
input { File greeting }
command {
cat ${greeting}
echo See you later!
}
output { File outFile = stdout() }
}
HelloGoodbye_imports.wdl
version 1.0
import "HelloWorld.wdl"
import "GoodbyeWorld.wdl" as bye
workflow HelloGoodbye {
call HelloWorld.hello
call bye.goodbye {
input: greeting = hello.outFile
}
output { File hello_goodbye = goodbye.outFile }
}
Primer for Exercise #3: (if time permits)
84
Ex: BWA Aligner
85
aligner.wdl
version 1.0
workflow alignReads {
call bwa_align
output { File output_sam = bwa_align.output_sam }
}
task bwa_align {
input {
String sample_name
String docker_image
String? bwa_options
Int memory_gb
File read1_fastq
File read2_fastq
File ref_fasta
File ref_fasta_fai
File ref_fasta_amb
File ref_fasta_ann
File ref_fasta_bwt
File ref_fasta_pac
File ref_fasta_sa
}
String output_sam = "${sample_name}" + ".sam"
command {
bwa mem ${bwa_options} ${ref_fasta} \
    ${read1_fastq} ${read2_fastq} > ${output_sam}
}
output { File output_sam = "${output_sam}" }
runtime {
docker: docker_image
memory: "${memory_gb}" + "GB"
}
meta {
author: "Foo Bar"
email: "foobar@university.edu"
}
}
Example: Metrics.wdl (samtools flagstat)
Same as the solution to exercise #2
86
metrics.wdl
version 1.0
workflow metrics {
call Flagstat
output { File align_metrics = Flagstat.metrics }
}
task Flagstat {
input {
File input_sam
String docker_image
}
# create a string to help parameterize command
String stats = basename(input_sam) + ".metrics"
command {
samtools flagstat ${input_sam} > ${stats}
}
output { File metrics = "${stats}" }
# set some parameterized runtime parameters
runtime {
docker: docker_image
}
}
Exercise#3: Writing a multi-task workflow (if time permits)
Create a multi-task workflow: align_and_metrics.wdl
87
Importing workflows: Best Practices & Tips (Take home)
88
Importing workflows: Caveats (Take home)
https://docs.dockstore.org/en/develop/end-user-topics/language-support.html
89
Metadata and Parameter Metadata (Take Home)
Metadata
Parameter Metadata
90
task doSomething {
}
output {
File outFile = “Hello.txt”
}
command {
echo Hello World! > Hello.txt
cat ${myName} >> Hello.txt
}
input { File myName }
task.wdl
More trainings and tutorial content:
91
Summary, Dockstore, and next steps!
92
What is Dockstore?
Dockstore is a free and open source platform for sharing scientific tools and workflows.
Portability
Interoperability
Reproducibility
93
“An app store for bioinformatics”
Dockstore Ecosystem
94
Source Control
Analysis Environments
Store your containers and descriptors on your preferred sites
Dockstore’s launch-with feature enables users to export tools and workflows to a variety of cloud compute platforms
Register these as tools and workflows on Dockstore, allowing for a centralized bioinformatics catalog of resources
Docker Registries
Partner Platforms
Launching Analysis
Structural Variant Calling using Graph Genomes
Contributed by: Jean Monlong and Charles Markello (VG Team, UC Santa Cruz, Genomics Institute)
Launching Analysis - Example
96
Cumulus Workflow: https://dockstore.org/workflows/github.com/klarman-cell-observatory/cumulus/Cumulus:0.15.0?tab=info
AnVIL Organization, Cumulus Collection: https://dockstore.org/organizations/anvil/collections/Cumulus
Contributed by: Bo Li & Yiming Yang (Cumulus Team, Broad Institute)
General Best Practices
97
DOIs
Create snapshots and digital object identifiers for your workflows to permanently capture the state of a workflow for publication
Creating Snapshots and Requesting DOIs — Dockstore documentation
Examples:
Forward: https://doi.org/10.5281/zenodo.3889018
Backward: https://dockstore.org/workflows/github.com/dockstore/hello_world:master?tab=versions
98
Organizations
99
Landing page to showcase tools and workflows
Example: a COVID-19 collection that submits to Nextstrain https://dockstore.org/organizations/BroadInstitute/collections/pgs
Getting Help on Dockstore
User forum at https://discuss.dockstore.org/
100
Documentation and Tutorials
101
Dockstore Ecosystem
102
Dockstore is thankful to its many contributors, users, and partners. This community has pulled together a library of over 700 tools and workflows. In the diagram to the right we’ve highlighted a few select contributors to give a sense of what has been occurring in this space.
The Dockstore Team
103
Louise Cabansay
Natalie Perez
Melaina Legaspi
Charles Reid
Emily Soth
Andy Chen
Benedict Paten
Elnaz Sarbar
Charles Overbeck
Walt Shands
David Steinberg
Nneka Olunwa
Lincoln Stein
Denis Yuen
Andrew Duncan
Gary Luu
Gregory Hogue
Acknowledgements
104
This work was funded by the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-168).
Funded by:
Extra Slides for Q&A
105
Additional Readings
Note: -v has historically been how volumes are mounted; --mount is an equivalent option with a different syntax
106
Exercise #1a: Using Docker
107
docker info
Display system-wide information about your installation of docker:
docker image help
Managing docker images:
docker container help
Managing docker containers:
Docker has a whole library of commands, here are some basic examples:
docker container run hello-world
Run the official hello-world docker container from dockerhub:
Exercise #1b: Explore the Dockstore CLI (Take Home)
108
dockstore workflow convert entry2json --entry [ dockstore identifier ] > [ parameter.json ]
Make a JSON template based off descriptor located remotely on dockstore:
dockstore workflow launch --entry [ dockstore identifier ] --json [ parameter.json ]
Run a descriptor located remotely on dockstore:
dockstore workflow convert wdl2json --wdl hello-task.wdl > convert.json
Make a JSON template based off a local WDL:
Scatter Gather ( take home reading )
Scatter
Gather
Beginner Example - Scatter Gather Pipeline
Advanced Example - Use scatter-gather to joint call genotypes
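As a sketch of the scatter-gather pattern (task and file names are illustrative): the scatter block runs a task once per input, in parallel, and the outputs of the scattered call are implicitly gathered into an array:

```wdl
version 1.0
workflow scatter_metrics {
  input { Array[File] sams }
  # scatter: run flagstat once per input file
  scatter (sam in sams) {
    call flagstat { input: input_sam = sam }
  }
  # gather: outputs from a scattered call are collected into an Array
  output { Array[File] all_metrics = flagstat.metrics }
}
task flagstat {
  input { File input_sam }
  command {
    samtools flagstat ${input_sam} > metrics.txt
  }
  output { File metrics = "metrics.txt" }
  runtime { docker: "quay.io/ldcabansay/samtools:latest" }
}
```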
109
What’s in a WDL? Top-level Summary
Workflow: Code block that defines the overall workflow.
Call: Defines which tasks to run
Task: Defines all the information necessary to perform an action.
110
3 top-level components that are part of the core structure of a WDL script
workflow myWorkflowName {
}
task task_A { … }
task task_B { … }
input {
...
}
workflow.wdl
call task_B {
input: ...
}
call task_A
output {...}
What’s in a Task? Summary
Task: Defines all the information necessary to perform an action in a parameterized way.
111
task doSomething {
}
output {
File outFile = "${outFile}"
}
command {
echo Hello World! > ${outFile}
cat ${myName} >> ${outFile}
}
input {
File myName
String outFile
}
task.wdl
runtime {
docker: docker_image
memory: "${memory_gb}"
}
Summary
A workflow:
A task:
112
Ways to Register to Dockstore
113
Containerized Tool
Workflow:
Tools + Descriptor
Dockstore Registration
External Hosting
Docker image
Build
System
Dockerfile
Descriptor
Register workflow and tool descriptors from external source control
+
Point to docker image(s) on quay or dockerhub
As of 1.9.0, install the Dockstore GitHub App to automatically update Dockstore when a workflow is updated on GitHub
Dockstore Ecosystem
114
Source Control
Analysis Environments
Store your containers and descriptors on your preferred sites
Dockstore’s launch-with feature enables users to export tools and workflows to a variety of cloud compute platforms
Register these as tools and workflows on Dockstore, allowing for a centralized bioinformatics catalog of resources
Docker Registries
Language Support
115