Bioinformatics Workflows and Package Management
Caleb Kibet
Package Management
The alternatives can be categorized into system-wide (Debian-Med, Gentoo Science, BioLinux, and Homebrew) and per-user (EasyBuild, GNU Guix, and BioBuilds) installation mechanisms.
There are alternative slides on package management using Conda.
What is a Bioinformatics Pipeline?
A number of steps used to analyse data
Can be simple or very complex
Example
Bioinformatics pipelines: what’s the problem?
Example:
user@machine:~> bwa mem -t $NSLOTS -M $BWA_INDEX_REF -R "@RG\tID:$PU\tPL:illumina\tPU:$PU\tSM:$SAMPLE" $READS1 $READS2 | samblaster --splitterFile >(samtools view -hSu /dev/stdin | samtools sort -@ $NSLOTS /dev/stdin > $SAMPLE.sr.bam) --discordantFile >(samtools view -hSu /dev/stdin | samtools sort -@ $NSLOTS /dev/stdin > $SAMPLE.disc.bam) | samtools view -hSu /dev/stdin | samtools sort -@ $NSLOTS /dev/stdin > $SAMPLE.raw.bam
What is the problem?
Workflows
Tools for automating bioinformatics analyses
How?
Scalable pipeline components. A pipeline consists of third-party tools, data parsers, and data transformations. (Fjukstad and Bongo, 2017)
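The idea of a pipeline built from parsers and transformations can be sketched in a few lines of Python. This is only a toy illustration, not how any particular workflow framework is implemented: the names `parse_fasta`, `gc_content`, `run_pipeline`, and `EXAMPLE_FASTA` are hypothetical, and the third-party tool step is left out so the sketch runs without external software. In a real pipeline such a step would typically wrap a command-line tool.

```python
from typing import Any, Callable, Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Data parser: turn FASTA-formatted text into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record id.
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line.upper()
    return records


def gc_content(records: Dict[str, str]) -> Dict[str, float]:
    """Data transformation: fraction of G and C bases per non-empty sequence."""
    return {
        name: (seq.count("G") + seq.count("C")) / len(seq)
        for name, seq in records.items()
        if seq
    }


def run_pipeline(data: Any, steps: List[Callable[[Any], Any]]) -> Any:
    """Run the steps in order, feeding each step's output into the next."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical toy input, made up for this sketch.
EXAMPLE_FASTA = """>seq1 toy record
ACGT
ACGT
>seq2
GGCC
"""

result = run_pipeline(EXAMPLE_FASTA, [parse_fasta, gc_content])
print(result)
```

Keeping each component as a separate function with a clear input and output is what lets steps be swapped, reused, or tested on their own, which is the property workflow tools build on.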
Advantages of workflows
Workflow languages use the concept of analysis preservation, which offers several advantages:
Common Workflows and Containers
Singularity
Snakemake
Which framework do you choose?
Hands-on with Snakemake