1 of 50

Introduction to QIIME

Yoshiki Vázquez-Baeza

University of California, San Diego

2 of 50

What is QIIME?

  • A set of scripts and tools that process data, compute microbial ecology statistics, and produce publication-quality plots
  • All scripts run from the command line
  • Integration of >25 other programs
  • Common sets of steps in ‘workflow’ scripts

3 of 50

Getting help with QIIME

Every QIIME script documents its required inputs and expected outputs, gives usage examples, and is named according to its function. To see this information for a script, use the ‘-h’ option, e.g.:

$ count_seqs.py -h

Script index: http://scripts.qiime.org

Forum: http://forum.qiime.org

Additional resources: http://qiime.org/genindex.html

Example files: /path/to/qiime/qiime_test_data/name_of_script/

4 of 50

5 of 50

These options are required for the script to function correctly

These arguments are optional: you can use them or leave them out. Some default values are explained here.

http://scripts.qiime.org

6 of 50

Useful shortcuts for the terminal

Ctrl+C: halt and kill a command

Ctrl+Z: stop a command (doesn't kill it)

Ctrl+A: go to the beginning of the line in a terminal window

Ctrl+E: go to the end of the line in a terminal window

This applies to any operating system.

7 of 50

Magic

Press Tab to autocomplete

8 of 50

Tutorial

9 of 50

Moving Pictures of the Human Microbiome

  • Two subjects sampled daily, one for six months, one for 18 months
  • Four body sites: tongue, palm of left hand, palm of right hand, and gut (via fecal swabs).

Caporaso JG et al. (2011) Moving pictures of the human microbiome. Genome Biology 12: R50.

10 of 50

Moving Pictures of the Human Microbiome: QIIME tutorial

  • A small subset of the full data set to facilitate short run time: ~0.1% of the full sequence collection.
  • Sequenced across six Illumina GAIIx lanes, with a subset of the samples also sequenced on 454.

11 of 50

12 of 50

Key QIIME files

Mapping File (metadata)

BIOM Table

(OTU counts)

13 of 50

Purple slides mean that you can copy and paste the command.

Note: long commands are split across lines with ‘\’ characters. The backslashes are not required in general; they are used here so that a command can span multiple lines and still be copied and pasted into the terminal.

Black slides mean that you have to figure out the command on your own.

14 of 50

Getting started

# download the data

wget ftp://ftp.microbio.me/qiime/tutorial_files/moving_pictures_tutorial-1.9.0.tgz

# extract the archive and go to the illumina folder

tar -xzvf moving_pictures_tutorial-1.9.0.tgz

cd moving_pictures_tutorial-1.9.0/illumina/

15 of 50

Mapping file

16 of 50

Mapping file

= required field

http://qiime.org/documentation/file_formats.html#mapping-file-overview
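As a rough illustration of what "required field" means, here is a minimal Python sketch that checks only the header row of a mapping file. It assumes the QIIME 1 layout in which #SampleID, BarcodeSequence, and LinkerPrimerSequence come first and Description comes last. The function name and the two-line example file are made up for this illustration, and the sketch is not a substitute for validate_mapping_file.py, which checks far more.

```python
import csv
import io

# Header columns a QIIME 1 mapping file is expected to start and end with.
REQUIRED_LEADING = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence"]
REQUIRED_LAST = "Description"


def check_mapping_header(text):
    """Return a list of problems found in the header of a tab-separated mapping file."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    for position, name in enumerate(REQUIRED_LEADING):
        if len(header) <= position or header[position] != name:
            problems.append(f"column {position + 1} should be {name}")
    if not header or header[-1] != REQUIRED_LAST:
        problems.append(f"last column should be {REQUIRED_LAST}")
    return problems


# Hypothetical one-sample mapping file, invented for this example.
example = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n"
    "gut1\tAGCTGACTAGTC\tGTGCCAGCMGCCGCGGTAA\tgut\tfirst gut sample\n"
)
print(check_mapping_header(example))  # prints [] when the header is well formed
```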

17 of 50

Validating mapping file

Validate your mapping file:

$ validate_mapping_file.py

Hint: the path to your mapping file is map.tsv

Hint2: use the -o option as well (prevents polluting your directory with lots of output)

18 of 50

Validating a bad mapping file

Validate your mapping file:

$ validate_mapping_file.py

Hint: the path to your mapping file is map-bad.tsv

Hint2: use the -o option as well (prevents polluting your directory with lots of output)

19 of 50

Have Index Reads File   Paired ends   Demultiplexed   Scripts to join and get ready for QIIME
---------------------   -----------   -------------   ---------------------------------------
Yes                     Yes           Yes             multiple_join_paired_ends.py then multiple_split_libraries_fastq.py
Yes                     Yes           No              join_paired_ends.py then split_libraries_fastq.py
Yes                     No            Yes             multiple_split_libraries_fastq.py
Yes                     No            No              split_libraries_fastq.py
No                      Yes           Yes             multiple_join_paired_ends.py then multiple_extract_barcodes.py then multiple_split_libraries_fastq.py
No                      Yes           No              join_paired_ends.py then extract_barcodes.py then split_libraries_fastq.py
No                      No            Yes             extract_barcodes.py then multiple_split_libraries_fastq.py
No                      No            No              extract_barcodes.py then split_libraries_fastq.py
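The decision table on this slide can also be written as a lookup, which makes it easy to check which scripts apply to a given run. This is only a sketch: the dictionary copies the eight rows as they appear on the slide, and the names SCRIPTS and scripts_to_run are made up for this illustration.

```python
# (have index reads file, paired ends, demultiplexed) -> scripts, in run order.
# The rows are copied from the slide; nothing here is computed by QIIME.
SCRIPTS = {
    (True, True, True): ["multiple_join_paired_ends.py",
                         "multiple_split_libraries_fastq.py"],
    (True, True, False): ["join_paired_ends.py", "split_libraries_fastq.py"],
    (True, False, True): ["multiple_split_libraries_fastq.py"],
    (True, False, False): ["split_libraries_fastq.py"],
    (False, True, True): ["multiple_join_paired_ends.py",
                          "multiple_extract_barcodes.py",
                          "multiple_split_libraries_fastq.py"],
    (False, True, False): ["join_paired_ends.py", "extract_barcodes.py",
                           "split_libraries_fastq.py"],
    (False, False, True): ["extract_barcodes.py",
                           "multiple_split_libraries_fastq.py"],
    (False, False, False): ["extract_barcodes.py", "split_libraries_fastq.py"],
}


def scripts_to_run(have_index_reads, paired_ends, demultiplexed):
    """Return the ordered list of scripts for one row of the table."""
    return SCRIPTS[(bool(have_index_reads), bool(paired_ends), bool(demultiplexed))]


# Index reads file available, single-end reads, not yet demultiplexed.
print(" then ".join(scripts_to_run(True, False, False)))  # prints split_libraries_fastq.py
```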

20 of 50

A common setup

21 of 50


22 of 50

Demultiplexing and QCing your reads

$ split_libraries_fastq.py \

-i forward_reads.fastq.gz \

-m map.tsv \

-b barcodes.fastq.gz \

-o slout

23 of 50

Before pick_otus_*.py

Each sequence identifier is the sample name, an underscore, and a unique number.

Note: this is how files look after demultiplexing and quality control in QIIME.
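To make the identifier convention concrete, here is a small Python sketch that recovers the sample name from each sequence identifier and counts sequences per sample. The FASTA header lines are invented for this illustration, and the function name is hypothetical; real headers in seqs.fna may carry extra fields after the identifier, which this sketch ignores.

```python
from collections import Counter


def sample_id_from_sequence_id(sequence_id):
    """Strip the trailing _<number> from an identifier such as 'gut1_42'."""
    sample_id, separator, counter = sequence_id.rpartition("_")
    if not separator or not counter.isdigit():
        raise ValueError(f"unexpected sequence identifier: {sequence_id!r}")
    return sample_id


# Hypothetical FASTA header lines, made up for this example.
headers = [">gut1_0 extra text", ">tongue1_1", ">gut1_2", ">left.palm1_3"]

counts = Counter()
for header in headers:
    # The identifier is the first whitespace-separated token after '>'.
    sequence_id = header[1:].split()[0]
    counts[sample_id_from_sequence_id(sequence_id)] += 1

print(dict(counts))  # prints {'gut1': 2, 'tongue1': 1, 'left.palm1': 1}
```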

24 of 50

OTU Picking

$ pick_closed_reference_otus.py \

-i slout/seqs.fna \

-o closedref

Note: By default this uses the Greengenes reference database clustered at 97% sequence identity. Other reference databases can be used.

25 of 50

OTU Picking – Closed Reference

[Diagram: Experimental Sequences are compared with Reference Sequences. Sequences that hit a reference are grouped into OTUs; sequences that failed to hit are not assigned to an OTU.]
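The closed-reference idea on this slide can be sketched in a few lines of Python. This is a toy under stated assumptions: identity is the fraction of matching positions between equal-length sequences, the 0.97 threshold mirrors the 97% clustering mentioned earlier, and the read and reference names are invented. QIIME itself delegates this step to dedicated sequence search tools, so this is not how pick_closed_reference_otus.py is implemented.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_closed_reference(reads, references, threshold=0.97):
    """Assign each read to its best reference, or record it as a failure."""
    otus = {}
    failures = []
    for read_id, sequence in reads.items():
        best_ref = max(references,
                       key=lambda ref_id: identity(sequence, references[ref_id]))
        if identity(sequence, references[best_ref]) >= threshold:
            otus.setdefault(best_ref, []).append(read_id)
        else:
            failures.append(read_id)
    return otus, failures


# Sequences taken from the slide; the names are hypothetical.
references = {"ref_A": "CTGGGCCGTGTCTCAGTCCCAA", "ref_B": "TTGGAAGATGTCTCAGTTCCAG"}
reads = {
    "gut1_0": "CTGGGCCGTGTCTCAGTCCCAA",
    "gut1_1": "TTGGAAGATGTCTCAGTTCCAG",
    "gut1_2": "TTGGGCCGTATGTCAGTCCCTA",  # matches neither reference at 97%
    "gut1_3": "CTGGGCCGTGTCTCAGTCCCAA",
}

otus, failures = pick_closed_reference(reads, references)
print(otus)      # prints {'ref_A': ['gut1_0', 'gut1_3'], 'ref_B': ['gut1_1']}
print(failures)  # prints ['gut1_2']
```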

26 of 50

OTU Picking – de-novo

[Diagram: Experimental Sequences are passed to a Clustering Algorithm, which groups the Clustered Sequences into OTUs (OTU1, OTU2, OTU3).]
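De novo picking needs no reference: the reads are clustered against each other. The sketch below uses a simple greedy scheme in which the first read of each new cluster becomes its seed. That scheme, the position-by-position identity measure, and the read names are assumptions chosen for brevity; the clustering algorithm QIIME actually runs is not specified on the slide.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_de_novo(reads, threshold=0.97):
    """Greedy clustering: join the first close-enough seed, else start a new OTU."""
    seeds = []  # (OTU name, seed sequence), in order of creation
    otus = {}
    for read_id, sequence in reads.items():
        for otu_id, seed in seeds:
            if identity(sequence, seed) >= threshold:
                otus[otu_id].append(read_id)
                break
        else:
            # No existing seed was close enough, so this read seeds a new OTU.
            otu_id = f"OTU{len(seeds) + 1}"
            seeds.append((otu_id, sequence))
            otus[otu_id] = [read_id]
    return otus


# The three distinct sequences from the slide, each seen twice; names are hypothetical.
reads = {
    "gut1_0": "CTGGGCCGTGTCTCAGTCCCAA",
    "gut1_1": "TTGGAAGATGTCTCAGTTCCAG",
    "gut1_2": "TTGGGCCGTATGTCAGTCCCTA",
    "gut1_3": "CTGGGCCGTGTCTCAGTCCCAA",
    "gut1_4": "TTGGAAGATGTCTCAGTTCCAG",
    "gut1_5": "TTGGGCCGTATGTCAGTCCCTA",
}

for otu_id, members in pick_de_novo(reads).items():
    print(otu_id, members)
```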

27 of 50

OTU Picking – Open Reference

[Diagram: Experimental Sequences are compared with Reference Sequences. Sequences that hit a reference are grouped into OTUs; sequences that failed to hit are passed to a Clustering Algorithm and grouped into additional OTUs (OTU1 to OTU6 in the example).]
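Open-reference picking combines the two previous approaches: each read is first tried against the reference, and reads that miss are clustered among themselves instead of being dropped. The Python sketch below is a toy under the same assumptions as before (equal-length sequences, position-by-position identity, greedy seeding); the "denovo" OTU names and the read names are invented for this illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_open_reference(reads, references, threshold=0.97):
    """Try the reference first, then cluster the misses among themselves."""
    otus = {}
    new_seeds = {}  # OTU name -> seed sequence for reads that missed the reference
    for read_id, sequence in reads.items():
        # Step 1: look for a reference sequence that is close enough.
        hit = next((ref_id for ref_id, ref in references.items()
                    if identity(sequence, ref) >= threshold), None)
        # Step 2: otherwise look among the new clusters built so far.
        if hit is None:
            hit = next((otu_id for otu_id, seed in new_seeds.items()
                        if identity(sequence, seed) >= threshold), None)
        # Step 3: otherwise this read seeds a new cluster.
        if hit is None:
            hit = f"denovo{len(new_seeds) + 1}"
            new_seeds[hit] = sequence
        otus.setdefault(hit, []).append(read_id)
    return otus


# Sequences from the slide; reference and read names are hypothetical.
references = {"ref_A": "CTGGGCCGTGTCTCAGTCCCAA"}
reads = {
    "gut1_0": "CTGGGCCGTGTCTCAGTCCCAA",
    "gut1_1": "TTGGAAGATGTCTCAGTTCCAG",
    "gut1_2": "TTGGGCCGTATGTCAGTCCCTA",
    "gut1_3": "TTGGAAGATGTCTCAGTTCCAG",
}

print(pick_open_reference(reads, references))
# prints {'ref_A': ['gut1_0'], 'denovo1': ['gut1_1', 'gut1_3'], 'denovo2': ['gut1_2']}
```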

28 of 50

OTU Table

              taxon1   taxon2   taxon3   taxon4   taxon5
gut1              42        0       37       99        1
left.palm1        12        1       22       88        0
right.palm1       25        3       23       86        0
tongue1            0        0       87       12        0
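An OTU table is just a grid of counts, so it can be held in a plain Python dictionary. The sketch below stores the example table from this slide and adds up the counts per sample, which is the kind of figure a table summary reports. The variable names are made up for this illustration, and this is not how the BIOM format stores data internally.

```python
TAXA = ["taxon1", "taxon2", "taxon3", "taxon4", "taxon5"]

# Sample -> counts, in the same order as TAXA (values copied from the slide).
OTU_TABLE = {
    "gut1": [42, 0, 37, 99, 1],
    "left.palm1": [12, 1, 22, 88, 0],
    "right.palm1": [25, 3, 23, 86, 0],
    "tongue1": [0, 0, 87, 12, 0],
}

counts_per_sample = {sample: sum(counts) for sample, counts in OTU_TABLE.items()}
for sample, total in counts_per_sample.items():
    print(sample, total)
# prints:
# gut1 179
# left.palm1 123
# right.palm1 137
# tongue1 99
```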

29 of 50

BIOM format

  • HDF5 files are optimized for large datasets.
  • Generic contingency table format.
  • It has a Python and an R interface.

30 of 50

Summarize your table

Compute the summary of your biom table and print the result to the screen:

$ biom summarize-table

Hint: you need to pass your biom table

Hint 2: use --help

31 of 50

Filtering the table

Remove OTUs with a total count of fewer than 25

$ filter_otus_from_otu_table.py

Note: to be able to copy/paste commands later in the tutorial, name the output file filtered-table.biom
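To see what this kind of filter does, here is a Python sketch that applies a minimum total count of 25 to the small example OTU table shown earlier in these slides. It assumes the threshold applies to each OTU's count summed over all samples; the function and variable names are hypothetical, and the real script works on BIOM files rather than dictionaries.

```python
TAXA = ["taxon1", "taxon2", "taxon3", "taxon4", "taxon5"]
OTU_TABLE = {
    "gut1": [42, 0, 37, 99, 1],
    "left.palm1": [12, 1, 22, 88, 0],
    "right.palm1": [25, 3, 23, 86, 0],
    "tongue1": [0, 0, 87, 12, 0],
}


def filter_low_count_otus(table, taxa, min_count):
    """Keep only the OTUs whose count summed over all samples is >= min_count."""
    totals = [sum(counts[i] for counts in table.values()) for i in range(len(taxa))]
    keep = [i for i, total in enumerate(totals) if total >= min_count]
    kept_taxa = [taxa[i] for i in keep]
    kept_table = {sample: [counts[i] for i in keep] for sample, counts in table.items()}
    return kept_table, kept_taxa


filtered_table, filtered_taxa = filter_low_count_otus(OTU_TABLE, TAXA, 25)
print(filtered_taxa)           # prints ['taxon1', 'taxon3', 'taxon4']
print(filtered_table["gut1"])  # prints [42, 37, 99]
```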

32 of 50

Filtering the table

You can verify the filtering by using:

$ biom summarize-table

Hint: you will need --observations

Hint 2: you may want to pipe the output to less

33 of 50

Diversity Analysis

… computing alpha and beta diversity and statistical treatments of the data

34 of 50

Rarefy Your Data

Rarefy your filtered OTU table

$ single_rarefaction.py \

-i filtered-table.biom \

-o filtered-table.even1000.biom \

-d 1000

You can verify it by using:

$ biom summarize-table
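Rarefying means drawing the same number of sequences at random from every sample so that samples are compared at equal depth. The Python sketch below subsamples without replacement and leaves out any sample that has fewer sequences than the requested depth. It uses the small example OTU table from earlier in these slides with a depth of 100 rather than 1000, and the function name and seed are arbitrary choices for this illustration.

```python
import random

OTU_TABLE = {
    "gut1": [42, 0, 37, 99, 1],
    "left.palm1": [12, 1, 22, 88, 0],
    "right.palm1": [25, 3, 23, 86, 0],
    "tongue1": [0, 0, 87, 12, 0],
}


def rarefy(counts, depth, rng):
    """Subsample one sample to `depth` sequences without replacement.

    Returns None when the sample has fewer than `depth` sequences.
    """
    # One entry per observed sequence, labelled with the index of its OTU.
    pool = [otu for otu, count in enumerate(counts) for _ in range(count)]
    if len(pool) < depth:
        return None
    rarefied = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
rarefied_table = {}
for sample, counts in OTU_TABLE.items():
    result = rarefy(counts, 100, rng)
    if result is not None:
        rarefied_table[sample] = result

# tongue1 has only 99 sequences, so it is not in the rarefied table.
print(sorted(rarefied_table))  # prints ['gut1', 'left.palm1', 'right.palm1']
```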

35 of 50

What’s in the samples?

Summarize taxa using the following command:

$ summarize_taxa_through_plots.py

Hint: you only need the required options for this task, so read only their descriptions.

36 of 50

Measuring Alpha Diversity

$ alpha_diversity.py \

-i filtered-table.even1000.biom \

-o alpha.txt \

-t closedref/97_otus.tree

Hint: You could add other metrics with -m and see the list of metrics with -s
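Alpha diversity describes the diversity within a single sample. As a sketch of two common measures, the Python below counts the observed OTUs and computes the Shannon index for one row of counts. The base-2 logarithm is an assumption made here, the function names are made up, and tree-based measures (the reason a tree is passed with -t) are not covered by this sketch.

```python
import math


def observed_otus(counts):
    """Number of OTUs seen at least once in the sample."""
    return sum(1 for count in counts if count > 0)


def shannon(counts, base=2):
    """Shannon index: -sum(p * log(p)) over the OTUs present in the sample."""
    total = sum(counts)
    proportions = [count / total for count in counts if count > 0]
    return -sum(p * math.log(p, base) for p in proportions)


gut1 = [42, 0, 37, 99, 1]  # the gut1 row of the example OTU table in these slides
print(observed_otus(gut1))      # prints 4
print(round(shannon(gut1), 3))  # higher values indicate a more even, richer sample
```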

37 of 50

Add Diversity Information to Your Mapping File

$ add_alpha_to_mapping_file.py \

-i alpha.txt \

-m map.tsv \

-o map.alpha.tsv

38 of 50

Beta Diversity Through Plots

$ beta_diversity_through_plots.py

Hint: You’ll need to pass the flag --color_by_all_fields

Hint2: you need to pass the tree.

Hint3: You’ll likely see a RuntimeWarning; you can ignore it.

39 of 50

To view the plot

40 of 50

Ordinations

Bray, J Roger, and John T Curtis. "An ordination of the upland forest communities of southern Wisconsin." Ecological monographs 27.4 (1957): 325-349.
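An ordination starts from a matrix of pairwise distances between samples. The paper cited above also gives its name to one such measure, the Bray-Curtis dissimilarity, sketched below in Python for two samples from the example OTU table in these slides. This is shown only to make "distance between samples" concrete; the tutorial commands pass a tree, which points to phylogenetic distances that this sketch does not compute.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("both samples must list the same OTUs")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


gut1 = [42, 0, 37, 99, 1]
tongue1 = [0, 0, 87, 12, 0]

# 0 means identical composition, 1 means no shared OTUs.
print(round(bray_curtis(gut1, tongue1), 3))  # prints 0.647
```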

41 of 50

42 of 50

PERMANOVA and ANOSIM

Assess statistical significance.

Within cluster distance to between cluster distance ratio.

rb is the mean rank of the distances between groups and rw is the mean rank of the distances within groups.

PERMANOVA

ANOSIM

compare_categories.py
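The ANOSIM statistic can be sketched directly from its definition, R = (rb - rw) / (N(N - 1)/4), where all pairwise distances are ranked and N is the number of samples. The Python below computes R only; the permutation step that turns R into a p-value is left out. The sample names, group labels, and distances are invented for this illustration, and this is not the code that compare_categories.py runs.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values starting at 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def anosim_r(distances, groups):
    """R statistic from distances keyed by frozenset({a, b}) and a sample -> group map."""
    samples = sorted(groups)
    pairs = list(combinations(samples, 2))
    ranks = average_ranks([distances[frozenset(pair)] for pair in pairs])
    between = [r for r, (a, b) in zip(ranks, pairs) if groups[a] != groups[b]]
    within = [r for r, (a, b) in zip(ranks, pairs) if groups[a] == groups[b]]
    n = len(samples)
    r_b = sum(between) / len(between)
    r_w = sum(within) / len(within)
    return (r_b - r_w) / (n * (n - 1) / 4)


# Hypothetical distances: samples from the same body site are close together.
groups = {"gut1": "gut", "gut2": "gut", "tongue1": "tongue", "tongue2": "tongue"}
distances = {
    frozenset({"gut1", "gut2"}): 0.10,
    frozenset({"tongue1", "tongue2"}): 0.20,
    frozenset({"gut1", "tongue1"}): 0.80,
    frozenset({"gut1", "tongue2"}): 0.90,
    frozenset({"gut2", "tongue1"}): 0.70,
    frozenset({"gut2", "tongue2"}): 0.85,
}

print(anosim_r(distances, groups))  # prints 1.0, the value for fully separated groups
```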

43 of 50

One script to rule them all

core_diversity_analyses.py

  • Rarefaction
  • Alpha diversity
  • Beta diversity
  • Taxonomic summaries
  • Statistical testing

44 of 50

45 of 50

QIIME Forum

Search the forum

http://forum.qiime.org

We try to answer within 24 hours


46 of 50

QIIME Forum

47 of 50

Questions?

48 of 50

Commands

validate_mapping_file.py -m bad-map.tsv -o bad-validated

validate_mapping_file.py -m map.tsv -o validated

split_libraries_fastq.py -i forward_reads.fastq.gz -m map.tsv -b barcodes.fastq.gz -o slout

pick_closed_reference_otus.py -i slout/seqs.fna -o closedref

biom summarize-table -i closedref/otu_table.biom

filter_otus_from_otu_table.py -i closedref/otu_table.biom -o filtered-table.biom -n 25

biom summarize-table -i filtered-table.biom --observations

single_rarefaction.py -i filtered-table.biom -o filtered-table.even1000.biom -d 1000

summarize_taxa_through_plots.py -i filtered-table.even1000.biom -o summaries

alpha_diversity.py -i filtered-table.even1000.biom -o alpha.txt -t closedref/97_otus.tree

add_alpha_to_mapping_file.py -i alpha.txt -m map.tsv -o map.alpha.tsv

beta_diversity_through_plots.py -i filtered-table.even1000.biom -m map.alpha.tsv -t closedref/97_otus.tree --color_by_all_fields -o beta

49 of 50

Acknowledgments

  • Will VanTreuren
  • John Chase
  • Jesse Stombaugh
  • Justin Kuczynski
  • Nicholas Bokulich
  • The Knight Laboratory
  • The Caporaso Laboratory
  • Team of QIIME Developers
  • Many others
  • Rob Knight
  • Kyle Bittinger
  • Antonio Gonzalez
  • Justine Debelius
  • Luke Ursell
  • Jose Clemente
  • Daniel McDonald
  • Greg Caporaso
  • Yoshiki Vázquez-Baeza
  • Jackson Chen

50 of 50

License and contact information

(read this if you’re interested in re-using these slides)

This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Feel free to use or modify these slides, but please credit the QIIME developers by placing the following attribution information where you feel that it makes sense: QIIME 2, https://qiime2.org.

These slides were created and arranged by Greg Caporaso, Antonio Gonzalez, and other members of the QIIME development group.

For more bioinformatics educational content, see An Introduction to Applied Bioinformatics (IAB) and Dr. Caporaso’s teaching and lab websites. For updates on IAB, scikit-bio, QIIME, and related projects, follow @gregcaporaso and @KnightLabNews on Twitter.