DNA Sequencing Technologies
Data Analysis in Genome Biology
GEN242
Thomas Girke
April 10, 2018
Outline
What Are We Sequencing?
Traditional DNA Sequencing Technologies
Next Generation Sequencing Methods
Research Applications
References and Books
Usually We Prefer to Sequence DNA
DNA Libraries
A DNA library consists of cloned DNA fragments that can represent the entire genome of an organism (genomic DNA library) or its transcriptome (cDNA library).
Genomic library
cDNA library
Why Is it Helpful to Have Both?
Workflow of a Genomic Sequencing Project
Annotation of Functional Features
Submit to GenBank
Synthesis of Common Genomic Libraries
Plasmid Library
λ Phage Library
Synthesis of cDNA Library in λ Phage Vector
1. mRNA to cDNA
2. cDNA Cloning into λ
Cloning Vectors for Libraries
Plasmid Library [ Genomic & cDNA ]
λ Phage Library [ Genomic & cDNA ]
Cosmid Library [ Genomic ]
BAC Library [ Genomic ]
YAC Library [ Genomic ]
Many Additional Library Types
What are EST Sequences?
History of DNA Sequencing
Chemical Sequencing by Maxam & Gilbert
Chemical DNA Degradation
Gel Electrophoresis
Illustration of Sanger Sequencing
Sequencing Principle
Radioactive Labeling
Fluorescence Labeling
Processing of Sequencing Raw Data
Common Synonyms
Overview: 454, SOLiD and Illumina
From review article: Medini et al 2008
Similarities and Differences of NGS Technologies
Common components
Differences
NGS Sequencing Methods
Reversible Terminator Methods (e.g. Illumina/Solexa)
Single Molecule Methods (e.g. Helicos, PacBio)
Pyrosequencing Methods (e.g. 454)
Supported Oligonucleotide Ligation Methods (e.g. SOLiD, Complete Genomics)
Example: Illumina/Solexa Technology
Illumina HiSeq 2500 Sequencer
Flow Cell
Basic Steps of Illumina Sequencing
Flow Cell Loading
Sequencing Cycles
Compare with illustration on next 3 slides!
Flow Cell Loading
Illumina HiSeq 2500 Sequencer
Flow Cell
Sequencing Cycles
Details of Sequencing Reaction
Illustration shows the sequencing cycles for a single template molecule!
Single End, Paired End and Mate Pair Sequencing
Single End
Paired End
Mate Pair
AP1/AP2: flow cell adapters; SP1/SP2: sequencing primers
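The relationship between single-end and paired-end reads can be sketched in Python. This is a toy model, not Illumina's chemistry: it assumes the common forward-reverse convention in which read 2 is reported as the reverse complement of the far end of the fragment. The function names and the example fragment are hypothetical. Mate pair libraries, which span larger distances through a different library preparation, are not modeled.

```python
from typing import Tuple

# Toy model of single-end vs. paired-end reads from one DNA fragment.
# Assumption: read 2 is the reverse complement of the fragment's far end.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def single_end_read(fragment: str, read_length: int) -> str:
    """Sequence only one end of the fragment."""
    return fragment[:read_length]


def paired_end_reads(fragment: str, read_length: int) -> Tuple[str, str]:
    """Sequence both ends of the fragment, each read pointing inward."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment[-read_length:])
    return read1, read2


fragment = "AACCGGTTAGCATGCC"  # hypothetical 16 bp insert
print(single_end_read(fragment, 5))   # AACCG
print(paired_end_reads(fragment, 5))  # ('AACCG', 'GGCAT')
```

Because both reads come from the same fragment, their expected distance and orientation on the reference are known in advance, which is the extra information paired-end data adds over single-end data.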
Paired End Chemistry: Step I
Single End
Paired End
Grafted Flow Cell
Cluster Generation: Initial Extension
Linearization: periodate two different enzymes
Paired End Chemistry: Step II
Cluster Generation: Amplification
Paired End Chemistry: Step III
Cluster Generation: Linearization
Paired End Chemistry: Step IV
Sequencing
Processing of Illumina Sequencing Data
A single sequencing run with 2x 100 cycles can generate ∼ 3 billion sequences and 32TB of image data.
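As a rough back-of-the-envelope check of these numbers, the following Python sketch estimates the base yield of such a run and the amount of image data per called base. It assumes that the ∼3 billion sequences count individual 100-cycle reads and that TB means 10^12 bytes; neither is stated on the slide, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic for one 2x 100 cycle run.
reads_per_run = 3_000_000_000   # ~3 billion sequences (from the slide)
cycles_per_read = 100           # one base call per cycle
image_bytes = 32 * 10**12       # 32 TB, assuming decimal terabytes

# Assumption: each counted sequence is a single 100-base read.
total_bases = reads_per_run * cycles_per_read
bytes_per_base = image_bytes / total_bases

print(f"bases per run: {total_bases:.1e}")            # 3.0e+11
print(f"image bytes per base: {bytes_per_base:.0f}")  # 107
```

Under these assumptions, the run produces about 300 billion base calls and on the order of 100 bytes of image data for each of them.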
Sequence Format: FASTQ
@SRR446037.238 length=75
TCAGCCTTGCGACCATACTCCCCCCGGAACCCAAAAACTTTGATTTCTCATAAGGTGCCAGCGGAGTCCTATAAG
+SRR446037.238 length=75
IIIIIIIIIIIGIIHIHIIIIIIIIDHDIIIIIFHDGDEFHCCGHHHHHCDDD@?6?@A@?A;??@2@BA@BBB@
... millions of entries ...
FASTQ format has 4 lines per sequence
Example of partial FASTQ file
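The 4-line layout shown above can be read with a few lines of Python. The sketch below is a minimal illustration, not a replacement for an established parser: `read_fastq`, `FastqRecord` and `phred_scores` are hypothetical names, the toy record is made up, and the quality offset of 33 is an assumption. That offset is consistent with the example entry, whose quality string contains characters such as `2` and `6` that lie below `@` in ASCII.

```python
from io import StringIO
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header text after the leading "@"
    sequence: str  # base calls
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as 4 lines per sequence."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (an empty line also stops the parser)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to Phred scores (offset 33 is assumed)."""
    return [ord(char) - offset for char in quality]


# Hypothetical toy record, used instead of a real file.
example = StringIO("@read1\nACGT\n+\nII5!\n")
for record in read_fastq(example):
    print(record.name, record.sequence, phred_scores(record.quality))
    # read1 ACGT [40, 40, 20, 0]
```

With offset 33, the character `I` that dominates the start of the example quality string corresponds to a Phred score of 40, and the lower-valued characters toward the end reflect the usual drop in quality along a read.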
Helicos: Single Molecule Sequencing
454/Roche Sequencing Steps
Pyrosequencing Methods (e.g. 454)
For more details see: 454 Web Site (http://www.454.com)
454 Pyrosequencing
SOLiD: Sequencing by Supported Oligonucleotide Ligation and Detection
PacBio: Sequencing with Single Polymerase Molecule
Publication: Eid et al 2009
Nanopore Sequencing
Comparison of Methods
Method | Read Length | Sequences per Run | Utility |
Sanger | 500-1500bp | 384 | de novo and low throughput |
454/Roche | 300-600bp | ~2×10^6 | de novo and medium throughput |
PacBio | 0.5-20kbp | ~1-5×10^6 | de novo and medium throughput |
Illumina | 50-150bp (1-2x) | ~1.6×10^9 | de novo and high throughput |
All numbers are estimates and apply to the situation in Dec. 2015!
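To make the throughput differences in the table concrete, the following Python sketch multiplies read length by sequences per run to get an approximate yield range per run. It uses only the table's estimates. The dictionary layout and the `yield_range` helper are illustrative, and the Illumina row is treated as single-end (1x), so a paired-end (2x) run would roughly double it.

```python
# Approximate yield per run = read length x sequences per run.
# Values are the table's estimates, stored as (low, high) ranges.
platforms = {
    "Sanger":    {"read_bp": (500, 1500),  "reads": (384, 384)},
    "454/Roche": {"read_bp": (300, 600),   "reads": (2e6, 2e6)},
    "PacBio":    {"read_bp": (500, 20000), "reads": (1e6, 5e6)},
    "Illumina":  {"read_bp": (50, 150),    "reads": (1.6e9, 1.6e9)},  # 1x only
}


def yield_range(read_bp, reads):
    """Return the (low, high) number of bases per run."""
    return read_bp[0] * reads[0], read_bp[1] * reads[1]


for name, values in platforms.items():
    low, high = yield_range(values["read_bp"], values["reads"])
    print(f"{name:10s} {low:.1e} - {high:.1e} bp per run")
```

On these estimates a Sanger run yields well under a million bases, while a single Illumina run yields on the order of 10^11 bases.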
Applications of NGS Methods
NGS technologies provide vast opportunities for genomics, comparative genome biology, medical diagnostics, etc. The following slides list only a few examples.
Applications
Application: De Novo Sequencing and Assembly
Application: DNA-Protein Interactions with ChIP-Seq
Reference for ChIP-Seq data analysis: Jothi et al 2008
Application: Methylome Profiling with BS-Seq
Application: RNA-Seq Gene Expression Profiling
Application: Digital Gene Expression (DGE) Profiling
Sequencing
Targeted Sequencing for Large Genomes
Targeted sequencing using DNA capture microarrays or beads
10X Genomics: Linked-Read Sequencing
Resolves many challenges inherent to short read sequencing
From Zheng et al (2016)
Database: Sequence Read Archive from NCBI
References
Albert, T J, Molla, M N, Muzny, D M, Nazareth, L, Wheeler, D, Song, X, Richmond, T A, Middle, C M, Rodesch, M J, Packard, C J, Weinstock, G M, Gibbs, R A (2007) Direct selection of human genomic loci by microarray hybridization. Nat Methods, 4: 903-905. URL http://www.hubmed.org/display.cgi?uids=17934467
Eid, J et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science, 323: 133-138. URL http://www.hubmed.org/display.cgi?uids=19023044
Holt, RA, Jones, SJ (2008) The new paradigm of flow cell sequencing. Genome Res, 18: 839-846. URL http://www.hubmed.org/display.cgi?uids=18519653
Jothi, R, Cuddapah, S, Barski, A, Cui, K, Zhao, K (2008) Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res, 36: 5221-5231. URL http://www.hubmed.org/display.cgi?uids=18684996
Medini, D, Serruto, D, Parkhill, J, … , C, Moxon, R, Falkow, S, Rappuoli, R (2008) Microbiology in the post-genomic era. Nat Rev Microbiol, 6: 419-430. URL http://www.hubmed.org/display.cgi?uids=18475305
Zheng, GXY et al. (2016) Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol, 34: 303-311.