Introduction to QIIME
Yoshiki Vázquez-Baeza
University of California, San Diego
What is QIIME?
Getting help with QIIME
Every QIIME script is named according to its function, and its help text describes the required inputs, the expected outputs, and usage examples. To view the help for a script, use the ‘-h’ option, e.g.:
$ count_seqs.py -h
Script index: http://scripts.qiime.org
Forum: http://forum.qiime.org
Additional resources: http://qiime.org/genindex.html
Example files: /path/to/qiime/qiime_test_data/name_of_script/
These options are required for the script to function correctly
These arguments are optional; you can use them or leave them out. Some default values are explained here.
Useful shortcuts for the terminal
Ctrl+C: halt and kill a command
Ctrl+Z: stop (suspend) a command without killing it
Ctrl+A: go to the beginning of the line in a terminal window
Ctrl+E: go to the end of the line in a terminal window
This applies to any operating system.
Magic
Tab: autocomplete
Tutorial
Moving Pictures of the Human Microbiome
Caporaso JG et al. (2011) Moving pictures of the human microbiome. Genome biology 12: R50.
Moving Pictures of the Human Microbiome: QIIME tutorial
Key QIIME files
Mapping File (metadata)
BIOM Table
(OTU counts)
Purple slides mean that you can copy and paste the command.
Note: the commands are separated by ‘\’ characters. This is not required in general, just used here to allow them to be on multiple lines and copy/pastable into the terminal.
Black slides mean that you have to figure out the command on your own.
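The note about ‘\’ characters can be checked directly in a shell. The following is a minimal sketch using printf as a stand-in command (the words passed to it are arbitrary placeholders, not QIIME options); it shows that a command split across lines with trailing backslashes behaves like the same command typed on one line.

```shell
# Both forms run the same single command; the trailing '\' only
# tells the shell that the command continues on the next line.
one_line=$(printf '%s %s %s\n' split libraries fastq)
multi_line=$(printf '%s %s %s\n' \
    split \
    libraries \
    fastq)
echo "$one_line"
echo "$multi_line"
```

The backslash must be the very last character on its line; a space after it breaks the continuation.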
Getting started
# download the data
wget ftp://ftp.microbio.me/qiime/tutorial_files/moving_pictures_tutorial-1.9.0.tgz
# open the file and go to the illumina folder
tar -xzvf moving_pictures_tutorial-1.9.0.tgz
cd moving_pictures_tutorial-1.9.0/illumina/
Mapping file
= required field
http://qiime.org/documentation/file_formats.html#mapping-file-overview
Validating mapping file
Validate your mapping file:
$ validate_mapping_file.py
Hint: the path to your mapping file is map.tsv
Hint2: use the -o option as well (prevents polluting your directory with lots of output)
Validating a bad mapping file
Validate your mapping file:
$ validate_mapping_file.py
Hint: the path to your mapping file is map-bad.tsv
Hint2: use the -o option as well (prevents polluting your directory with lots of output)
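To give a feel for what validation looks at, here is a rough sketch of a header check on a toy mapping file. It assumes the usual QIIME 1 convention that the header starts with #SampleID, BarcodeSequence, and LinkerPrimerSequence and ends with Description; the file name and sample values are made up for illustration. It is far less thorough than validate_mapping_file.py and is not a replacement for it.

```shell
# Toy mapping file (hypothetical contents, tab-separated).
workdir=$(mktemp -d)
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tBodySite\tDescription\n' \
    > "$workdir/toy-map.tsv"
printf 'gut1\tAGCTGACTAGTC\tGTGCCAGCMGCCGCGGTAA\tgut\tfirst_gut_sample\n' \
    >> "$workdir/toy-map.tsv"

# Check only the header line: the first three columns and the last column.
header_status=$(awk -F'\t' 'NR == 1 {
    ok = ($1 == "#SampleID" && $2 == "BarcodeSequence" &&
          $3 == "LinkerPrimerSequence" && $NF == "Description")
    print (ok ? "header looks fine" : "header problem")
}' "$workdir/toy-map.tsv")
echo "$header_status"
```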
Have Index Reads File | Paired ends | Demultiplexed | Scripts to join and get ready for QIIME |
Yes | Yes | Yes | multiple_join_paired_ends.py then multiple_split_libraries_fastq.py |
Yes | Yes | No | join_paired_ends.py then split_libraries_fastq.py |
Yes | No | Yes | multiple_split_libraries_fastq.py |
Yes | No | No | split_libraries_fastq.py |
No | Yes | Yes | multiple_join_paired_ends.py then multiple_extract_barcodes.py then multiple_split_libraries_fastq.py |
No | Yes | No | join_paired_ends.py then extract_barcodes.py then split_libraries_fastq.py |
No | No | Yes | extract_barcodes.py then multiple_split_libraries_fastq.py |
No | No | No | extract_barcodes.py then split_libraries_fastq.py |
A common setup
Demultiplexing and QCing your reads
$ split_libraries_fastq.py \
-i forward_reads.fastq.gz \
-m map.tsv \
-b barcodes.fastq.gz \
-o slout
Before pick_otus_*.py
Each sequence identifier is the sample name, an underscore, and a unique number.
Note: this is how the sequence file looks after demultiplexing and quality control in QIIME.
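Because each identifier has the form sample name, underscore, number, the reads per sample can be tallied with standard tools. The sketch below builds a tiny stand-in file (the file name, read labels, and barcodes are invented for illustration; real output lives in slout/seqs.fna) and counts the sequences assigned to each sample.

```shell
# Toy demultiplexed FASTA file (hypothetical contents).
workdir=$(mktemp -d)
cat > "$workdir/toy-seqs.fna" <<'EOF'
>gut1_0 read_a orig_bc=AAAA
CTGGGCCGTGTCTCAGTCCCAA
>gut1_1 read_b orig_bc=AAAA
TTGGAAGATGTCTCAGTTCCAG
>tongue1_2 read_c orig_bc=CCCC
TTGGGCCGTATGTCAGTCCCTA
EOF

# Take the first word of each header, strip the leading '>' and the
# trailing '_<number>', then count how often each sample name occurs.
per_sample=$(grep '^>' "$workdir/toy-seqs.fna" \
    | awk '{ id = substr($1, 2); sub(/_[0-9]+$/, "", id); n[id]++ }
           END { for (s in n) print s, n[s] }' \
    | sort)
echo "$per_sample"
```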
OTU Picking
$ pick_closed_reference_otus.py \
-i slout/seqs.fna \
-o closedref
Note: By default this uses the Greengenes reference database clustered at 97% similarity. Other databases can be used.
OTU Picking – Closed Reference
[Diagram. Panels: Experimental Sequences, Reference Sequences, Sequences that hit a reference, Sequences that failed to hit, OTUs.]
OTU Picking – de-novo
[Diagram. Panels: Experimental Sequences, Clustering Algorithm, Clustered Sequences, OTUs (OTU1, OTU2, OTU3).]
OTU Picking – Open Reference
[Diagram. Panels: Experimental Sequences, Reference Sequences, Sequences that hit a reference, Sequences that failed to hit, Clustering Algorithm, OTUs (OTU1 to OTU6).]
OTU Table
| taxon1 | taxon2 | taxon3 | taxon4 | taxon5 |
gut1 | 42 | 0 | 37 | 99 | 1 |
left.palm1 | 12 | 1 | 22 | 88 | 0 |
right.palm1 | 25 | 3 | 23 | 86 | 0 |
tongue1 | 0 | 0 | 87 | 12 | 0 |
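As a small worked example, the table above can be written to a plain text file and the total count per sample computed with awk. This is only a sketch on the toy numbers shown (the file name is hypothetical, and real tables are stored in BIOM format rather than plain text); the totals are the same kind of counts-per-sample figure that a table summary reports.

```shell
# The toy OTU table from above, whitespace-separated (samples as rows).
workdir=$(mktemp -d)
cat > "$workdir/toy-otu-table.txt" <<'EOF'
sample taxon1 taxon2 taxon3 taxon4 taxon5
gut1 42 0 37 99 1
left.palm1 12 1 22 88 0
right.palm1 25 3 23 86 0
tongue1 0 0 87 12 0
EOF

# Sum the counts in each row, skipping the header line.
sample_totals=$(awk 'NR > 1 {
    total = 0
    for (i = 2; i <= NF; i++) total += $i
    print $1, total
}' "$workdir/toy-otu-table.txt")
echo "$sample_totals"
```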
BIOM format
Summarize your table
Compute the summary of your biom table and print the result to the screen:
$ biom summarize-table
Hint: you need to pass your biom table
Hint 2: use --help
Filtering the table
Remove OTUs with fewer than 25 total counts
$ filter_otus_from_otu_table.py
Note: to be able to copy/paste commands later in the tutorial, name the output file filtered-table.biom
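The idea behind this filter can be illustrated on the toy OTU table from earlier. The sketch below is not the QIIME command; it assumes a plain text table with samples as rows (hypothetical file name) and simply keeps the taxa whose total count across all samples is at least 25.

```shell
# Toy OTU table, whitespace-separated (samples as rows, taxa as columns).
workdir=$(mktemp -d)
cat > "$workdir/toy-otu-table.txt" <<'EOF'
sample taxon1 taxon2 taxon3 taxon4 taxon5
gut1 42 0 37 99 1
left.palm1 12 1 22 88 0
right.palm1 25 3 23 86 0
tongue1 0 0 87 12 0
EOF

# Add up each column, then report the taxa that reach the minimum count.
kept=$(awk -v min=25 '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    { for (i = 2; i <= NF; i++) total[i] += $i }
    END { for (i = 2; i <= ncol; i++) if (total[i] >= min) print name[i], total[i] }
' "$workdir/toy-otu-table.txt")
echo "$kept"
```

Here taxon2 (4 counts) and taxon5 (1 count) fall below the threshold and would be dropped.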
Filtering the table
You can verify the filtering by using:
$ biom summarize-table
Hint: you will need --observations
Hint: you may want to pipe to less
Diversity Analysis
… computing alpha and beta diversity and statistical treatments of the data
Rarefy Your Data
Rarefy your filtered OTU table
$ single_rarefaction.py \
-i filtered-table.biom \
-o filtered-table.even1000.biom \
-d 1000
You can verify it by using:
$ biom summarize-table
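Rarefying means drawing the same number of reads from every sample, at random and without replacement. The sketch below illustrates the idea for a single toy sample with 179 reads and an arbitrary depth of 100; it assumes GNU shuf is available (it ships with coreutils on Linux), and the counts are invented for illustration. It is not how single_rarefaction.py is implemented.

```shell
# Counts for one toy sample (taxon name, number of reads): 179 reads in total.
counts="taxon1 42
taxon3 37
taxon4 99
taxon5 1"
depth=100

# Expand the counts into one line per read, draw 'depth' reads at random
# without replacement, then tally the reads drawn per taxon.
rarefied=$(printf '%s\n' "$counts" \
    | awk '{ for (i = 0; i < $2; i++) print $1 }' \
    | shuf -n "$depth" \
    | sort | uniq -c)
echo "$rarefied"

# Every rarefied sample ends up with exactly 'depth' reads.
total=$(printf '%s\n' "$rarefied" | awk '{ s += $1 } END { print s }')
echo "$total"
```

The per-taxon counts change from run to run, but the total is always the chosen depth. A sample with fewer reads than the depth cannot be subsampled this way, which is why such samples drop out of a rarefied table.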
What’s in the samples?
Summarize taxa using the following command:
$ summarize_taxa_through_plots.py
Hint: you only need the required options for this task, so read only their descriptions.
Measuring Alpha Diversity
$ alpha_diversity.py \
-i filtered-table.even1000.biom \
-o alpha.txt \
-t closedref/97_otus.tree
Hint: You could add other metrics with -m and see the list of metrics with -s
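One of the simplest alpha diversity measures is the number of OTUs observed in a sample. As a sketch of what such a metric computes, the following counts the non-zero entries per sample in the toy OTU table from earlier (plain text, hypothetical file name); tree-based metrics such as phylogenetic diversity additionally need the tree passed with -t.

```shell
# Toy OTU table, whitespace-separated (samples as rows, taxa as columns).
workdir=$(mktemp -d)
cat > "$workdir/toy-otu-table.txt" <<'EOF'
sample taxon1 taxon2 taxon3 taxon4 taxon5
gut1 42 0 37 99 1
left.palm1 12 1 22 88 0
right.palm1 25 3 23 86 0
tongue1 0 0 87 12 0
EOF

# Count how many taxa have a non-zero count in each sample.
observed=$(awk 'NR > 1 {
    n = 0
    for (i = 2; i <= NF; i++) if ($i > 0) n++
    print $1, n
}' "$workdir/toy-otu-table.txt")
echo "$observed"
```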
Add Diversity Information to Your Mapping File
$ add_alpha_to_mapping_file.py \
-i alpha.txt \
-m map.tsv \
-o map.alpha.tsv
Beta Diversity Through Plots
$ beta_diversity_through_plots.py
Hint: You’ll need to pass the flag --color_by_all_fields
Hint2: you need to pass the tree.
Hint3: You’ll likely see a RuntimeWarning; you can ignore it.
To view the plot
Download the output … or alternatively go to these links:
http://emperor.microbio.me/hu-cfar/unweighted_unifrac_emperor_pcoa_plot/
http://emperor.microbio.me/hu-cfar/weighted_unifrac_emperor_pcoa_plot/
Ordinations
Bray, J Roger, and John T Curtis. "An ordination of the upland forest communities of southern Wisconsin." Ecological monographs 27.4 (1957): 325-349.
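An ordination starts from a matrix of pairwise distances between samples. The plots linked above are based on UniFrac distances, which require the tree; as a simpler illustration that needs only counts, the sketch below computes the Bray-Curtis dissimilarity (sum of absolute differences divided by the sum of all counts in both samples) between two samples of the toy OTU table. The function name and file name are hypothetical.

```shell
# Toy OTU table, whitespace-separated (samples as rows, taxa as columns).
workdir=$(mktemp -d)
cat > "$workdir/toy-otu-table.txt" <<'EOF'
sample taxon1 taxon2 taxon3 taxon4 taxon5
gut1 42 0 37 99 1
left.palm1 12 1 22 88 0
right.palm1 25 3 23 86 0
tongue1 0 0 87 12 0
EOF

# bray_curtis SAMPLE_A SAMPLE_B TABLE_FILE
# Prints sum(|a_i - b_i|) / sum(a_i + b_i) with four decimals.
bray_curtis() {
    awk -v a="$1" -v b="$2" '
        $1 == a { for (i = 2; i <= NF; i++) x[i] = $i; n = NF }
        $1 == b { for (i = 2; i <= NF; i++) y[i] = $i }
        END {
            for (i = 2; i <= n; i++) {
                d = x[i] - y[i]; if (d < 0) d = -d
                num += d; den += x[i] + y[i]
            }
            printf "%.4f\n", num / den
        }' "$3"
}

gut_vs_tongue=$(bray_curtis gut1 tongue1 "$workdir/toy-otu-table.txt")
gut_vs_palm=$(bray_curtis gut1 left.palm1 "$workdir/toy-otu-table.txt")
echo "gut1 vs tongue1:    $gut_vs_tongue"
echo "gut1 vs left.palm1: $gut_vs_palm"
```

Values close to 0 mean the two samples have similar composition; values close to 1 mean they share little.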
PERMANOVA and ANOSIM
Assess statistical significance.
Within cluster distance to between cluster distance ratio.
In ANOSIM, rb is the mean rank of the between-group distances and rw is the mean rank of the within-group distances.
PERMANOVA
ANOSIM
compare_categories.py
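For reference, the two test statistics are usually written as follows. This assumes the standard definitions, with symbols that the slides do not define: N is the total number of samples, a is the number of groups, SS_A and SS_W are the between-group and within-group sums of squares computed from the distance matrix, and the r terms are mean ranks of between-group (B) and within-group (W) distances.

```latex
% PERMANOVA pseudo-F: between-group variation relative to within-group variation
F = \frac{SS_A / (a - 1)}{SS_W / (N - a)}

% ANOSIM R: difference of mean ranks, scaled to lie between -1 and 1
R = \frac{\bar{r}_B - \bar{r}_W}{N(N - 1)/4}
```

An R close to 1 indicates that samples within a group are more similar to each other than to samples from other groups; an R close to 0 indicates no separation. In both tests, significance is assessed by recomputing the statistic after randomly permuting the group labels.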
One script to rule them all
core_diversity_analyses.py
Modified from http://bio.sacnas.org/uploads/Judges/we_need_you.jpg
QIIME Forum
Questions?
Commands
validate_mapping_file.py -m map-bad.tsv -o bad-validated
validate_mapping_file.py -m map.tsv -o validated
split_libraries_fastq.py -i forward_reads.fastq.gz -m map.tsv -b barcodes.fastq.gz -o slout
pick_closed_reference_otus.py -i slout/seqs.fna -o closedref
biom summarize-table -i closedref/otu_table.biom
filter_otus_from_otu_table.py -i closedref/otu_table.biom -o filtered-table.biom -n 25
biom summarize-table -i filtered-table.biom --observations
single_rarefaction.py -i filtered-table.biom -o filtered-table.even1000.biom -d 1000
summarize_taxa_through_plots.py -i filtered-table.even1000.biom -o summaries
alpha_diversity.py -i filtered-table.even1000.biom -o alpha.txt -t closedref/97_otus.tree
add_alpha_to_mapping_file.py -i alpha.txt -m map.tsv -o map.alpha.tsv
beta_diversity_through_plots.py -i filtered-table.even1000.biom -m map.alpha.tsv -t closedref/97_otus.tree --color_by_all_fields -o beta
Acknowledgments
License and contact information
(read this if you’re interested in re-using these slides)
This work is licensed under the Creative Commons Attribution 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Feel free to use or modify these slides, but please credit the QIIME developers by placing the following attribution information where you feel that it makes sense: QIIME 2, https://qiime2.org.
These slides were created and arranged by Greg Caporaso, Antonio Gonzalez, and other members of the QIIME development group.
For more bioinformatics educational content, see An Introduction to Applied Bioinformatics (IAB) and Dr. Caporaso’s teaching and lab websites. For updates on IAB, scikit-bio, QIIME, and related projects, follow @gregcaporaso and @KnightLabNews on Twitter.