1 of 68

Bioinformatics Analyses on the OSPool: A BWA Example

OSG Research Facilitation Team

1

2 of 68

Before We Start

We welcome questions! To ask questions, please raise your hand.

Part of this workshop is hands-on! You are welcome to follow along (we will walk through the steps together) or to simply watch.

2

3 of 68

Introductions

3

4 of 68

Research Computing Facilitators are Here to Help!

Showmic Islam

Rachel Lombardi

Mats Rynge

Christina Koch

  • Email: support@osg-htc.org
  • Zoom Office Hours: Tuesday 3-4:30pm CT & Thursday 10:30am-12pm CT

4

5 of 68

Agenda

  • Introduction to OSG/Open Science Pool (OSPool) and High Throughput Computing
  • Using the OSPool
  • Installing Software
  • Burrows-Wheeler Aligner (BWA) Bioinformatics Analysis with Sequencing Samples
  • Exercise: High Throughput BWA Read Mapping
  • Getting Started with your Research Analysis

5

6 of 68

Primary Learning Objective

To understand the principles of running a bioinformatics workflow on the OSPool.

Learning Outcomes:

  • Using software on the OSPool
  • How to keep an organized work environment
  • Other HTCondor submit file options
  • Learn how to convert an existing bioinformatics workflow to run on the OSPool

6

7 of 68

Introduction to the OSG and Open Science Pool

7

8 of 68

What is the OSG?

The OSG Consortium builds and operates a set of pools of shared computing and data capacity for distributed high-throughput computing (HTC).

https://display.opensciencegrid.org

8

10 of 68

Open Science Pool

One of these pools, the Open Science Pool (OSPool), is operated for all US-associated open science.

OSPool

Computer by miracle from NounProject.com

OSPool Access Point

10

11 of 68

Using the OSPool

11

12 of 68

HTCondor Job Flow

OSPool Access Point

Job Components

  • Software
  • Scripts
  • Input Data

HTCondor Submit File

/home/user

12

13 of 68

HTCondor Job Flow

OSPool Access Point

Job Components

  • Software
  • Scripts
  • Input Data

HTCondor Submit File

/home/user

$ condor_submit SubmitFile.submit

13

14 of 68

HTCondor Job Flow

OSPool Access Point

Job Components

  • Software
  • Scripts
  • Input Data

HTCondor Submit File

OSPool Execute Point

/home/user

Job specifications

HTCondor

14

15 of 68

HTCondor Job Flow

OSPool Access Point

Job Components

  • Software
  • Scripts
  • Input Data

HTCondor Submit File

Software

Scripts

Input Data

OSPool Execute Point

/home/user

/condor/scratch

Job specifications

HTCondor

15

16 of 68

HTCondor Job Flow

OSPool Access Point

Job Components

  • Software/Code
  • Scripts
  • Input Data

HTCondor Submit File

Software

Scripts

Input Data

Output Data

Log/Error/Out

OSPool Execute Point

/home/user

/condor/scratch

Output transferred back

HTCondor

16

17 of 68

HTC on the Open Science Pool

The OSPool is a good fit for HTC workloads that can be distributed and open:

  • Jobs are short or resumable
    • Because the OSPool backfills other resources, interruptions are possible.
  • Jobs have a laptop-sized resource profile
    • Best for numerous jobs, each using one (or a few) CPUs and <16GB of memory.
  • Individual jobs need or produce less than 20GB of data
    • Because OSPool resources are distributed across the US (and the world), moving more than ~20GB of data per job can be prohibitive.
  • Job software and data can be ‘open’
    • Open-source software (no restrictive licensing) and unrestricted data (no HIPAA, etc.)

17

18 of 68

What workloads are good for the OSPool*?

| | Ideal Jobs! (1,000s of concurrent jobs) | Still Very Advantageous! (100s of concurrent jobs) | Less-so, but maybe |
|---|---|---|---|
| Cores (GPUs) | 1 (1; non-specific type) | <8 (1; specific GPU type) | >8 (or MPI) (multiple) |
| Walltime | <10 hrs (or checkpointable) | <20 hrs (or checkpointable) | >20 hrs |
| RAM | <few GB | <10s GB | >10s GB |
| Input | <500 MB | <10 GB | >10 GB |
| Output | <1 GB | <10 GB | >10 GB |
| Software | ‘portable’ (pre-compiled binaries, transferable, containerizable, etc.) | most, other than the cases in the next column | Licensed software; non-Linux |

* The “less-so, but maybe” column could still be an HTC workload, but one that would run more effectively on a local, dedicated HTC system instead of the OSG.

18

19 of 68

HTC-Friendly Research Problems*

RNA/DNA sequence alignment

statistical model optimization

parameter sweep

multiple image/sample analysis

*not exhaustive!

DNA by Arafat Uddin from the Noun Project

Image by Shastry from the Noun Project

grid by Nawicon Studio from the Noun Project

Line Graph by Gonzalo Bravo from the Noun Project

19

20 of 68

High-Throughput BWA Read Mapping

https://datacarpentry.org/

20

21 of 68

Background

  • The goal of this demo is to learn how to convert an existing BWA workflow to run on the OSPool.
  • We’ll be using data from a study of experimental evolution using Escherichia coli (E. coli).
  • This data includes both the genome of E. coli and paired-end RNA sequencing reads obtained from a study carried out by Blount et al. published in PNAS.

21

Details about how the data was modified can be found at https://datacarpentry.org/

22 of 68

Sample BWA Workflow

23 of 68

Example Next Generation Sequencing Analysis Workflow

1. Quality Control

2. Align sequenced reads to a reference

  • BWA (Burrows-Wheeler Aligner): a software package that maps sequences to a reference file

3. Alignment cleanup

4. Variant Calling

5. Variant Annotation and Interpretation

23

26 of 68

Sample BWA Workflow

Input

  • PE 1 (Forward Read): SRR263_1.fastq
  • PE 2 (Reverse Read): SRR263_2.fastq
  • Reference File: ecoli_rel606.fasta.gz

Executable: bwa-analysis.sh (runs the bwa executable)

Output

  • “Sequence Alignment/Map” (SAM) Output File: SRR263.aligned.sam

27 of 68

Let’s Get Started!

1. Download the BWA tutorial materials

$ cd

$ pwd

$ tutorial bwa-materials

2. Navigate to tutorial-bwa-materials folder

$ cd tutorial-bwa-materials

3. Explore our work environment

$ ls

27

Our Workspace

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq
        SRR263_2.fastq
    ref_genome/
        ecoli_rel606.fasta.gz
log/
software/

28 of 68

Installing Software

Software/packages/programs only need to be installed once!

  • We have many guides and tutorials:
    • “Using Software on the Open Science Pool”
    • “Use Software Containers”
    • And more!

Many bioinformatics tools are available as ready-to-use Singularity or Docker containers or with Anaconda/conda environments!

  • These containers can be used for your jobs on the OSPool
  • Past workshops: “Using Containerized Software on the Open Science Pool”, more resources on our website.

28

29 of 68

Preparing Software to use in Jobs

  1. Install/set up your software on an OSPool Access Point

  2. Tell HTCondor where to find your software
    • Many resources available (e.g. “Compiling Software to run on the OSPool”)

  3. Compress your software to make transferring it to and from jobs faster

29

31 of 68

Installing BWA

To learn how to install BWA, let’s go to one of BWA’s manual pages:

Steps to install BWA

Image taken from: https://github.com/lh3/bwa

34 of 68

Overview of Installing BWA

BWA Installation:

Choose a location to install bwa

$ cd ~/tutorial-bwa-materials/software

Install BWA (steps taken from the BWA manual)

$ git clone https://github.com/lh3/bwa.git

$ cd bwa

$ make

Tell the system where to find our software

$ export PATH=$PATH:$PWD
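
One way to confirm that the export PATH step took effect is to ask the shell where it finds bwa. The sketch below is not from the original slides: check_tool is a hypothetical helper name, and it relies only on the standard shell built-in command -v.

```shell
#!/bin/bash
# Sketch: report whether a command can be found on the current PATH.
# check_tool is a hypothetical helper, not part of BWA or HTCondor.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

check_tool ls             # a command that exists on any Linux system
check_tool bwa || true    # succeeds only once the bwa directory is on PATH
```

If bwa is reported as not found, re-running the export line from inside the bwa directory is the first thing to try.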

35 of 68

Preparing Software to be Sent in a Job

Once we test our BWA installation, we want to create a compressed tarball of this software so that it is smaller and quicker to transfer to jobs on the OSPool.

To do this, navigate to the directory with the bwa executable:

$ cd ~/tutorial-bwa-materials/software/bwa

$ tar -czvf bwa.tar.gz bwa

Tarball (.tar.gz)

Image: https://www.portasouthjetty.com/articles/workers-clean-up-tar-balls-on-beach/
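
Before sending the tarball along with jobs, it can be worth confirming what it contains. The following is a minimal sketch, not part of the original tutorial. It uses an empty placeholder file named bwa in a temporary directory (an assumption for illustration) so that it runs anywhere; with a real build, the two tar commands would be run inside software/bwa.

```shell
#!/bin/bash
# Sketch: pack a file named "bwa" into bwa.tar.gz, then list the archive contents.
set -e
workdir=$(mktemp -d)               # temporary directory so nothing real is touched
cd "$workdir"
touch bwa && chmod +x bwa          # placeholder for the compiled bwa binary (assumption)

tar -czf bwa.tar.gz bwa            # same command as in the tutorial, without -v
contents=$(tar -tzf bwa.tar.gz)    # -t lists the archive without extracting it
echo "Archive contains: $contents"
```

Because the archive holds the bwa file at its top level, unpacking it in a job places bwa directly in the job's working directory.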

36 of 68

Analyze a Single Biological Sample with BWA (Submit a single HTCondor job)

36

37 of 68

37

#!/bin/bash

echo "Unpacking software"

tar -xzf bwa.tar.gz

echo "Setting PATH for bwa"

export PATH=$_CONDOR_SCRATCH_DIR:$PATH

Executable = bwa-analysis.sh

To analyze one sample (SRR263)

38 of 68

38

#!/bin/bash

echo "Unpacking software"

tar -xzf bwa.tar.gz

echo "Setting PATH for bwa"

export PATH=$_CONDOR_SCRATCH_DIR:$PATH

echo "Indexing E. coli genome"

bwa index ecoli_rel606.fasta.gz

echo "Starting bwa alignment"

bwa mem ecoli_rel606.fasta.gz SRR263_1.fastq SRR263_2.fastq > SRR263.aligned.sam

Executable = bwa-analysis.sh

To analyze one sample (SRR263)

39 of 68

39

#!/bin/bash

echo "Unpacking software"

tar -xzf bwa.tar.gz

echo "Setting PATH for bwa"

export PATH=$_CONDOR_SCRATCH_DIR:$PATH

echo "Indexing E. coli genome"

bwa index ecoli_rel606.fasta.gz

echo "Starting bwa alignment"

bwa mem ecoli_rel606.fasta.gz SRR263_1.fastq SRR263_2.fastq > SRR263.aligned.sam

echo "Cleaning up files generated from genome indexing"

rm ecoli_rel606.fasta.gz.amb

rm ecoli_rel606.fasta.gz.ann

rm ecoli_rel606.fasta.gz.bwt

rm ecoli_rel606.fasta.gz.pac

rm ecoli_rel606.fasta.gz.sa

Executable = bwa-analysis.sh

To analyze one sample (SRR263)
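
By default, HTCondor transfers new files in the job's top-level working directory back to the Access Point, which is one reason the script removes the index files once the alignment is done. The sketch below shows the same cleanup written as a loop. It is an illustration only: the files are empty placeholders created in a temporary directory (an assumption), so no real indexing or alignment happens.

```shell
#!/bin/bash
# Sketch: remove the five BWA index files with one loop instead of five rm lines.
set -e
workdir=$(mktemp -d)      # temporary directory standing in for the job's scratch space
cd "$workdir"
ref=ecoli_rel606.fasta.gz

# Placeholder files (assumption): empty stand-ins for the reference, its index, and the output.
touch "$ref" SRR263.aligned.sam
for ext in amb ann bwt pac sa; do
    touch "$ref.$ext"
done

# Cleanup step: rm -f does not fail if a file is already gone.
for ext in amb ann bwt pac sa; do
    rm -f "$ref.$ext"
done

ls    # lists what is left in the directory
```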

40 of 68

Prepare Submit File

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq
        SRR263_2.fastq
    ref_genome/
        ecoli_rel606.fasta.gz
log/
software/
    bwa/
        bwa.tar.gz

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

40

Executable = bwa-analysis.sh

41 of 68

Prepare Submit File

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

Reminder: We need to transfer the bwa.tar.gz file, the reference genome, and the .fastq files.

41

Prepare Submit File

42 of 68

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

42

Prepare Submit File

43 of 68

Queue One Job

Queue one job to analyze one sample with BWA

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue 1

43

44 of 68

Queue One Job

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue 1

44

We are ready to submit, but before we do, let’s think about our BWA output files!

46 of 68

Scaling Up & Keeping Organized

Current Workspace

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq
        SRR263_2.fastq
    ref_genome/
        ecoli_rel606.fasta.gz
log/
    bwa_test.log
software/
    bwa/
        bwa.tar.gz
SRR263.aligned.sam

Desired Workspace

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq
        SRR263_2.fastq
        ...
    ref_genome/
        ecoli_rel606.fasta.gz
log/
    ...
software/
    bwa/
        bwa.tar.gz
results/
    SRR263.aligned.sam
    ...

46

47 of 68

Use HTCondor’s Submit File to Organize Files

Syntax:

transfer_output_remaps = "file1.out=path/to/file1.out; file2.out=path/to/renamedFile2.out"

Purpose:

Used to save output files in a specific path and under a specific name

Features:

- Used to save output files to a specific folder

- Used to rename output files to avoid writing over existing files

You must create the folder that you want output files saved to before submitting the job.
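
Creating the destination folder is a one-line step. The sketch below is an illustration rather than part of the original slides; it runs in a temporary directory (an assumption, standing in for the tutorial folder) so that it can be tried safely.

```shell
#!/bin/bash
# Sketch: create the results/ folder before running condor_submit.
set -e
workdir=$(mktemp -d)     # stand-in for ~/tutorial-bwa-materials (assumption)
cd "$workdir"

mkdir -p results         # -p: no error if the folder already exists
mkdir -p results         # so repeating the command is harmless
ls -d results
```

In the tutorial folder itself, only the mkdir -p results line would be needed.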

47

48 of 68

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq
        SRR263_2.fastq
    ref_genome/
        ecoli_rel606.fasta.gz
log/
software/
    bwa/
        bwa.tar.gz
results/
    SRR263.aligned.sam

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

transfer_output_remaps = "SRR263.aligned.sam = results/SRR263.aligned.sam"

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue 1

48

Use transfer_output_remaps

Create results/ directory before submitting job

49 of 68

Let’s Analyze One Biological Sample!

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

transfer_output_remaps = "SRR263.aligned.sam = results/SRR263.aligned.sam"

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue 1

Prepare submit file for analyzing one sample with BWA.

49

50 of 68

Analyze Many Biological Samples using a Single Submit File

50

51 of 68

Queue Multiple Jobs

| Syntax | List of Values | Variable Name |
|---|---|---|
| queue N | Integers: 0 through N-1 | $(ProcID) |
| queue Var matching pattern* | List of values that match the wildcard pattern. | $(Var) |
| queue Var in (item1 item2 …) | List of values within parentheses. | $(Var) |
| queue Var from list.txt | List of values from list.txt where each value is on its own line. | $(Var) |

If no variable name is provided, the default is $(Item).
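
As a rough analogy only, the loop below imitates how a queue statement that reads from a list produces one job per value. HTCondor performs this expansion itself at submit time; expand_jobs is a hypothetical helper, and the variable name sample and the paired file names are assumptions chosen to match the examples later in this deck.

```shell
#!/bin/bash
# Sketch: a shell loop that imitates "queue sample from list.txt".
# This is not how HTCondor is implemented; it only illustrates the idea.
expand_jobs() {
    local proc_id=0 sample
    while read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue      # skip blank lines
        echo "ProcID=$proc_id sample=$sample inputs=${sample}_1.fastq,${sample}_2.fastq"
        proc_id=$((proc_id + 1))
    done
}

# Three sample names, one per line, as they would appear in a list file.
jobs_preview=$(printf 'SRR263\nSRR266\nSRR244\n' | expand_jobs)
echo "$jobs_preview"
```

Each line of the list becomes one job, and the value on that line is substituted wherever the variable appears.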

51

52 of 68

First, Create the List of Inputs

Make a file called samples.txt containing the names of the samples we want to analyze:

$ pwd

../tutorial-bwa-materials/data/fastq/

$ ls *.fastq | cut -f 1 -d '_' | uniq > samples.txt

Contents of samples.txt:

SRR244

SRR263

SRR266
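
To see what each stage of this pipeline does, it can be tried on empty placeholder files. The sketch below is an illustration under that assumption: it creates six empty .fastq files in a temporary directory and builds samples.txt from them.

```shell
#!/bin/bash
# Sketch: build samples.txt from paired FASTQ file names.
set -e
workdir=$(mktemp -d)      # temporary directory standing in for data/fastq (assumption)
cd "$workdir"

# Empty placeholder files named like the tutorial's paired-end reads.
for s in SRR263 SRR266 SRR244; do
    touch "${s}_1.fastq" "${s}_2.fastq"
done

# ls lists the files in sorted order, cut keeps the text before the first "_",
# and uniq collapses the two identical names produced by each pair.
ls *.fastq | cut -f 1 -d '_' | uniq > samples.txt
cat samples.txt
```

Note that uniq only removes adjacent duplicates, which is sufficient here because ls prints the names in sorted order.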

52

53 of 68

Submit File to Queue One Job

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

transfer_output_remaps = "SRR263.aligned.sam = results/SRR263.aligned.sam"

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue 1

53

54 of 68

Edit the Queue Statement to use Variables

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

# arguments =

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/SRR263_1.fastq, data/fastq/SRR263_2.fastq

transfer_output_remaps = "SRR263.aligned.sam = results/SRR263.aligned.sam"

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue sample from data/fastq/samples.txt

54

55 of 68

Replace Changing Values with Variables

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

arguments = $(sample)

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/$(sample)_1.fastq, data/fastq/$(sample)_2.fastq

transfer_output_remaps = "$(sample).aligned.sam = results/$(sample).aligned.sam"

log = log/bwa_test.log

output = log/bwa_test.out

error = log/bwa_test.error

queue sample from data/fastq/samples.txt

55

56 of 68

Use Variables with log/error/out Files

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

arguments = $(sample)

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/$(sample)_1.fastq, data/fastq/$(sample)_2.fastq

transfer_output_remaps = "$(sample).aligned.sam = results/$(sample).aligned.sam"

log = log/bwa_$(sample).log

output = log/bwa_$(sample).out

error = log/bwa_$(sample).error

queue sample from data/fastq/samples.txt

56

57 of 68

Edit the Executable to use Variables

We need to edit the executable to use variables so that we can pass different sample names to it as arguments.

57

Currently, running ./bwa-analysis.sh analyzes just one sample (SRR263).

We will edit our executable so that we can run:

./bwa-analysis.sh SRR263

./bwa-analysis.sh SRR266

./bwa-analysis.sh SRR244

Sample Name/ID
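
Inside a bash script, the first command-line argument is available as $1. The sketch below shows one way to read it and to stop with a message when it is missing. It is an optional addition rather than part of the tutorial script, and get_sample is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: read the sample name from the first argument, or fail with a usage message.
# get_sample is a hypothetical helper; the tutorial script simply uses sample=$1.
get_sample() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: bwa-analysis.sh <sample-name>" >&2
        return 1
    fi
    echo "$1"
}

sample=$(get_sample SRR263)
echo "Reads:  ${sample}_1.fastq ${sample}_2.fastq"
echo "Output: ${sample}.aligned.sam"
```

A check like this makes a job fail early with a clear message if the arguments line is accidentally left out of the submit file.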

58 of 68

58

#!/bin/bash

echo "Unpacking software"

tar -xzf bwa.tar.gz

echo "Setting PATH for bwa"

export PATH=$_CONDOR_SCRATCH_DIR:$PATH

echo "Indexing E. coli genome"

bwa index ecoli_rel606.fasta.gz

echo "Starting bwa alignment"

bwa mem ecoli_rel606.fasta.gz SRR263_1.fastq SRR263_2.fastq > SRR263.aligned.sam

echo "Cleaning up files generated from genome indexing"

rm ecoli_rel606.fasta.gz.amb

rm ecoli_rel606.fasta.gz.ann

rm ecoli_rel606.fasta.gz.bwt

rm ecoli_rel606.fasta.gz.pac

rm ecoli_rel606.fasta.gz.sa

Executable = bwa-analysis.sh

60 of 68

60

#!/bin/bash

echo "Unpacking software"

tar -xzf bwa.tar.gz

echo "Setting PATH for bwa"

export PATH=$_CONDOR_SCRATCH_DIR:$PATH

echo "Indexing E. coli genome"

bwa index ecoli_rel606.fasta.gz

echo "Define variable"

sample=$1

echo "Starting bwa alignment"

bwa mem ecoli_rel606.fasta.gz ${sample}_1.fastq ${sample}_2.fastq > ${sample}.aligned.sam

echo "Cleaning up files generated from genome indexing"

rm ecoli_rel606.fasta.gz.amb

rm ecoli_rel606.fasta.gz.ann

rm ecoli_rel606.fasta.gz.bwt

rm ecoli_rel606.fasta.gz.pac

rm ecoli_rel606.fasta.gz.sa

Executable = bwa-analysis.sh

Let’s make these changes now

61 of 68

Prepare Submit File to Analyze Many Sequencing Files

Prepare submit file for a full workload submission to analyze many .fastq files

# submit file name: bwa-analysis.submit

executable = bwa-analysis.sh

arguments = $(sample)

transfer_input_files = software/bwa/bwa.tar.gz, data/ref_genome/ecoli_rel606.fasta.gz, data/fastq/$(sample)_1.fastq, data/fastq/$(sample)_2.fastq

transfer_output_remaps = "$(sample).aligned.sam = results/$(sample).aligned.sam"

log = log/bwa_$(sample).log

output = log/bwa_$(sample).out

error = log/bwa_$(sample).error

queue sample from data/fastq/samples.txt
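
Before submitting a full workload, it may help to confirm that every sample named in samples.txt has both of its read files in place. The following is a minimal sketch under stated assumptions: check_inputs is a hypothetical helper, and the demonstration uses empty placeholder files in a temporary directory, with one file left out on purpose.

```shell
#!/bin/bash
# Sketch: report any sample whose paired FASTQ files are missing.
# check_inputs is a hypothetical helper, not an HTCondor command.
check_inputs() {
    local list=$1 dir=$2 missing=0 sample n
    while read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue              # skip blank lines
        for n in 1 2; do
            if [ ! -f "$dir/${sample}_${n}.fastq" ]; then
                echo "Missing: $dir/${sample}_${n}.fastq"
                missing=1
            fi
        done
    done < "$list"
    [ "$missing" -eq 0 ]                          # non-zero exit status if anything is missing
}

# Demonstration with placeholder files (assumption): SRR266_2.fastq is deliberately absent.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p data/fastq
printf 'SRR263\nSRR266\n' > data/fastq/samples.txt
touch data/fastq/SRR263_1.fastq data/fastq/SRR263_2.fastq data/fastq/SRR266_1.fastq

report=$(check_inputs data/fastq/samples.txt data/fastq) || true
echo "$report"
```

In the tutorial folder, the equivalent call would be check_inputs data/fastq/samples.txt data/fastq, run just before condor_submit.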

61

62 of 68

Our New Project Directory

Organized Workflow

bwa-analysis.submit
bwa-analysis.sh
data/
    fastq/
        SRR263_1.fastq  SRR266_1.fastq  SRR244_1.fastq
        SRR263_2.fastq  SRR266_2.fastq  SRR244_2.fastq
    ref_genome/
        ecoli_rel606.fasta.gz
software/
    bwa/
        bwa.tar.gz
results/
    SRR263.aligned.sam  SRR266.aligned.sam  SRR244.aligned.sam
log/
    bwa_SRR263.log    bwa_SRR266.log    bwa_SRR244.log
    bwa_SRR263.error  bwa_SRR266.error  bwa_SRR244.error
    bwa_SRR263.out    bwa_SRR266.out    bwa_SRR244.out
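
With one output file per sample in results/, it becomes straightforward to find samples that have not finished. The sketch below is an illustration, not part of the original deck: list_unfinished is a hypothetical helper, and the demonstration uses placeholder files in a temporary directory, with one result deliberately missing.

```shell
#!/bin/bash
# Sketch: list samples that do not yet have a non-empty .aligned.sam file.
# list_unfinished is a hypothetical helper, not an HTCondor command.
list_unfinished() {
    local list=$1 results=$2 sample
    while read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue                  # skip blank lines
        # -s is true only if the file exists and is not empty
        [ -s "$results/${sample}.aligned.sam" ] || echo "$sample"
    done < "$list"
}

# Demonstration with placeholder files (assumption): SRR244 has no output yet.
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
printf 'SRR263\nSRR266\nSRR244\n' > samples.txt
echo "placeholder" > results/SRR263.aligned.sam
echo "placeholder" > results/SRR266.aligned.sam

unfinished=$(list_unfinished samples.txt results)
echo "Samples without output: $unfinished"
```

The resulting names could be written to a new list file and used with a queue statement to rerun only those samples.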

62

63 of 68

Key Takeaways

We have learned how to:

Install software to use in jobs

Convert an existing bioinformatics workflow to run on the OSPool

Keep an organized workflow using HTCondor submit file options

63

65 of 68

OSG Documentation Website

OSG User Documentation: https://portal.osg-htc.org

Information about:

  • Requesting an account
  • Documentation and resources for getting started using the OSPool
  • Upcoming trainings, resources from past trainings
  • OSG User School information
  • And more!

We also have information on getting started with other bioinformatics tools! (BLAST, SAMtools)

65

66 of 68

Acknowledgements

This material is based upon work supported by the National Science Foundation under Cooperative Agreement OAC-2030508 as part of the PATh Project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

66

67 of 68

Questions?

67
