Introduction to the Human Genome
Applied Computational Genomics, Lecture 03
https://github.com/quinlan-lab/applied-computational-genomics
Aaron Quinlan
Departments of Human Genetics and Biomedical Informatics
USTAR Center for Genetic Discovery
University of Utah
quinlanlab.org
http://labiotech.eu/history-of-biotech-25-years-of-the-human-genome-project/
First genetic map
UU graduate review committee at Alta
1952 - Hershey/Chase
1953 - DNA structure
1978 - Alta meeting - use "markers" to map disease genes
The early concept of "linkage mapping"
Mark Skolnick (U. of Utah)
Kerry Kravitz (grad student with Skolnick)
David Botstein
1980 - Idea: use RFLPs to build a human linkage map
Ray Gesteland, PhD and Ray White, PhD, Eccles Institute of Human Genetics Building Construction (1989)
http://labiotech.eu/history-of-biotech-25-years-of-the-human-genome-project/
1983 - Wexler and Gusella map the Huntington's disease gene (later named huntingtin, HTT) to chromosome 4 using markers
http://labiotech.eu/history-of-biotech-25-years-of-the-human-genome-project/
1983 - PCR invented by Kary Mullis (first published in 1985)
How many genomes exist in a human cell?
One nuclear genome - *most of the time
https://www.khanacademy.org/science/biology/structure-of-a-cell/prokaryotic-and-eukaryotic-cells/a/intro-to-eukaryotic-cells
Also many, many mitochondrial genomes
Corticospinal neurons can be several feet in length. Synapses have high ATP demands - many mitochondria!
The scale of DNA in our body is staggering.
♂
♀
ATCGGGTACCATCCAATCATTACC
Humans are diploid.
ATCGGGAACCATCCAATCATTACC
♂
♀
Our genome is composed of a paternal and a maternal "haplotype". Together, they form our "genotype".
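The two example sequences above differ at a single base, which is what makes this individual heterozygous at that site. A minimal sketch of the comparison follows; the function name `heterozygous_sites` and the labels `hap_1` and `hap_2` are illustrative only, and the slide does not say which copy is paternal or maternal.

```python
def heterozygous_sites(hap_a: str, hap_b: str):
    """Return (0-based position, allele_a, allele_b) wherever two haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(hap_a, hap_b)) if a != b]


# The two haplotype sequences shown on the slide (parental labels are arbitrary here).
hap_1 = "ATCGGGTACCATCCAATCATTACC"
hap_2 = "ATCGGGAACCATCCAATCATTACC"

sites = heterozygous_sites(hap_1, hap_2)
print(sites)  # [(6, 'T', 'A')]
```

At 0-based position 6 the genotype is T/A; every other site is homozygous in this toy example.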
Our genome: mini quiz
How many distinct chromosomes are there in the human genome?
24: the autosomes (chromosomes 1-22) and the sex chromosomes (X and Y)
How many chromosomes exist in a (typical) haploid human genome?
23: the autosomes (chromosomes 1-22) and one sex chromosome (X or Y)
How many chromosomes exist in a (typical) diploid human genome?
46: two haploid genomes - one from mother and one from father
The human genome
from a macro to micro scale
https://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg
The human genome - basic stats
http://uswest.ensembl.org/Homo_sapiens/Location/Genome
The human karyotype
Parental haploid copy 1
Parental haploid copy 2
Male
The basic structure of a chromosome
The role of the centromere.
Centromeres are required for chromosome separation during cell division. The centromeres are attachment points for microtubules, which are protein fibers that pull duplicate chromosomes toward opposite ends of the cell before it divides. This separation ensures that each daughter cell will have a full set of chromosomes.
Telomeres
http://learn.genetics.utah.edu/content/basics/readchromosomes/
Centromere positioning
http://learn.genetics.utah.edu/content/basics/readchromosomes/
Centromere position can be described three ways: metacentric, submetacentric or acrocentric.
In metacentric chromosomes, the centromere lies near the center of the chromosome. Human chromosomes 1, 3, 16, 19, and 20 are metacentric.
Submetacentric chromosomes have a centromere that is off-center, so that one chromosome arm is longer than the other. The short arm is designated "p" (for petite), and the long arm is designated "q" (because it follows the letter "p"). Human chromosomes 2, 4-12, 17, 18, and X are submetacentric.
In acrocentric chromosomes, the centromere is very near one end. Human chromosomes 13, 14, 15, 21, 22, and Y are acrocentric.
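The three lists above can be captured as a small lookup table, which also makes it easy to check that every one of the 24 distinct chromosomes is assigned to exactly one class. This is only a sketch; the names `CENTROMERE_CLASS` and `CLASS_BY_CHROM` are illustrative.

```python
# Centromere classes exactly as listed in the text above.
CENTROMERE_CLASS = {
    "metacentric": ["1", "3", "16", "19", "20"],
    "submetacentric": ["2"] + [str(n) for n in range(4, 13)] + ["17", "18", "X"],
    "acrocentric": ["13", "14", "15", "21", "22", "Y"],
}

# Invert to a chromosome -> class lookup.
CLASS_BY_CHROM = {
    chrom: cls for cls, chroms in CENTROMERE_CLASS.items() for chrom in chroms
}

# The 24 distinct human chromosomes: autosomes 1-22 plus X and Y.
ALL_CHROMS = [str(n) for n in range(1, 23)] + ["X", "Y"]
unassigned = [c for c in ALL_CHROMS if c not in CLASS_BY_CHROM]

print(CLASS_BY_CHROM["21"])  # acrocentric
print(unassigned)            # [] (every chromosome is classified)
```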
Chromosome Giemsa banding (G-banding)
https://en.wikipedia.org/wiki/G_banding, https://ghr.nlm.nih.gov/chromosome/1#ideogram
Sequencing a reference human genome. Not the human genome.
Why sequence a reference genome?
Sequencing the first human genome: Sanger method
Key points:
1) sequencing by synthesis (not degradation)
2) radioactive primers hybridize to DNA
3) polymerase + dNTPs + ddNTP terminators at low concentration
4) Add one ddNTP base per reaction/lane, visually interpret ladder
Strengths over chemical sequencing (Gilbert):
1) easier & faster
2) no nasty chemicals
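To make the "one ddNTP per lane, read the ladder" idea concrete, here is a toy simulation. It is a deliberately idealized sketch: it ignores the primer length, assumes a terminated fragment is produced at every position, and the function names `run_lanes` and `read_ladder` are made up for illustration.

```python
def run_lanes(synthesized: str) -> dict:
    """For each ddNTP lane, list the lengths of fragments terminated by that base.

    A fragment of length k ends in lane X when base k of the newly
    synthesized strand is X (the ddXTP was incorporated there).
    """
    return {
        base: [i + 1 for i, b in enumerate(synthesized) if b == base]
        for base in "ACGT"
    }


def read_ladder(lanes: dict) -> str:
    """Read bands from shortest to longest fragment across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = run_lanes("GATTACA")
print(lanes)               # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_ladder(lanes))  # GATTACA
```

Because shorter fragments migrate farther in the gel, reading the bands in order of increasing length across the four lanes spells out the synthesized strand.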
How to sequence a human genome: Lee Hood automation
before
after
read lengths: ~500bp
Sanger sequencing: technological advances
1977: Fred Sanger
1 hardworking technician
= 700 bases per day
= 118,000 years to sequence the human genome
1985: ABI 370 (first automated sequencer)
5000 bases per day
= 16,000 years
1995: ABI 377 (Bigger gels, better chemistry & optics, more sensitive dyes, faster computers)
19,000 bases per day
= 4,400 years
1999: ABI 3700 (96 capillaries, 96 well plates, fluid handling robots)
400,000 bases per day
= 205 years
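The year figures above can be approximately reproduced with simple arithmetic. Note the assumption: the numbers only work out if about 30 billion bases must be read in total, for example a 3 Gb genome at roughly 10x redundancy. That coverage factor is inferred from the figures, not stated in the lecture, so treat this as a back-of-the-envelope sketch.

```python
GENOME_SIZE_BP = 3_000_000_000   # ~3 Gb human genome
ASSUMED_COVERAGE = 10            # ASSUMPTION: ~10x redundancy, inferred from the slide's numbers
DAYS_PER_YEAR = 365


def years_to_sequence(bases_per_day: float,
                      genome_size: int = GENOME_SIZE_BP,
                      coverage: int = ASSUMED_COVERAGE) -> float:
    """Years needed for one instrument (or technician) at a given daily throughput."""
    return genome_size * coverage / (bases_per_day * DAYS_PER_YEAR)


# Daily throughputs as given in the text above.
THROUGHPUT = {
    "1977 manual Sanger": 700,
    "ABI 370": 5_000,
    "ABI 377": 19_000,
    "ABI 3700": 400_000,
}

for name, rate in THROUGHPUT.items():
    print(f"{name}: {years_to_sequence(rate):,.0f} years")
```

Under this assumption the results land close to the figures quoted above (roughly 117,000, 16,000, 4,300, and 205 years), differing only by rounding.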
Shotgun genome sequencing (Sanger, 1979)
1) Fragment the genome (or large BAC clones)
2) Clone 2-10kb fragments into plasmids; pick lots of colonies; purify DNA from each
3) Use a primer that anneals to the plasmid to sequence into the genomic DNA insert
4) Assemble the genome from overlapping “reads”
ATCGCCGTACTAGCGAGCTTGCGAT
GCTTGCGATAACGCTTCCGTCGAGCCGTAAATCGGCTCGAG
TCGGCTCGAGAAGCTGCTTGCGAAAGCTGT
ATCGCCGTACTAGCGAGCTTGCGATAACGCTTCCGTCGAGCCGTAAATCGGCTCGAGAAGCTGCTTGCGAAAGCTGT
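The three reads above can be stitched into the longer sequence by finding suffix/prefix overlaps. The following is a minimal sketch, not a real assembler: it assumes the reads are already in the correct left-to-right order, are error-free, and come from the same strand. The function names and the `min_len` threshold are illustrative choices.

```python
def overlap(left: str, right: str, min_len: int = 5) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def merge_in_order(reads, min_len: int = 5) -> str:
    """Merge reads that are already ordered left to right into one contig."""
    contig = reads[0]
    for read in reads[1:]:
        n = overlap(contig, read, min_len)
        if n == 0:
            raise ValueError("no sufficient overlap between contig and next read")
        contig += read[n:]  # append only the non-overlapping tail
    return contig


reads = [
    "ATCGCCGTACTAGCGAGCTTGCGAT",
    "GCTTGCGATAACGCTTCCGTCGAGCCGTAAATCGGCTCGAG",
    "TCGGCTCGAGAAGCTGCTTGCGAAAGCTGT",
]
contig = merge_in_order(reads)
print(contig)
```

Here the first two reads share a 9-base overlap (GCTTGCGAT) and the last two share a 10-base overlap (TCGGCTCGAG), and the merged contig matches the assembled sequence shown above. Real assemblers must also handle unknown read order, both strands, sequencing errors, and repeats.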
1977: Bacteriophage φX174 (5 kb)
1995: H. influenzae (1.8 Mb)
1996: Yeast (12 Mb)
2000: Drosophila (165 Mb)
2001: Human (3 Gb, draft)
The competing human genome projects (this was war)
Public (Universities)
1990-2001 (2003)
3 billion dollars
Celera Corporation
1999-2001 (2003)
300 million dollars
ca. 150kb segments amplified in BACs
Note: "Much of the sequence (>70%) of the reference genome produced by the public HGP came from a single anonymous male donor from Buffalo, New York (RP11)."
https://en.wikipedia.org/wiki/Human_Genome_Project
A first map of the human genome ("build 1")
http://www.nature.com/nature/journal/v409/n6822/full/409860a0.html
CGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTCTTTTCGTTAACTTGCCGTCAGCCTTTTCTTTGACCTCTTCTTTCTGTTCATGTGTATTTGCTGTCTCTTAGCCAGACTTCCCGTGTCCTTTCCACCGGGCCTTTGAGAGGTCACAGGGTCTTGATGCTGTGGTCTTCATCTGCAGGTGTCTGACTTCCAGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCACAACCAGGCATAGGGGAAAGATTGGAGGAAAGATGAGTGAGAGCATCAACTTCTCTCACAACCTAGGCCAGTAAGTAGTGCTTGTGCTCATCTCCTTGGCTGTGATACGTGGCCGGCCCTCGCTCCAGCAGCTGGACCCCTACCTGCCGTCTGCTGCCATCGGAGCCCAAAGCCGGGCTGTGACTGCTCAGACCAGCCGGCTGGAGGGAGGGGCTCAGCAGGTCTGGCTTTGGCCCTGGGAGAGCAGGTGGAAGATCAGGCAGGCCATCGCTGCCACAGAACCCAGTGGATTGGCCTAGGTGGGATCTCTGAGCTCAACAAGCCCTCTCTGGGTGGTAGGTGCAGAGACGGGAGGGGCAGAGCCGCAGGCACAGCCAAGAGGGCTGAAGAAATGGTAGAACGGAGCAGCTGGTGATGTGTGGGCCCACCGGCCCCAGGCTCCTGTCTCCCCCCAGGTGTGTGGTGATGCCAGGCATGCCCTTCCCCAGCATCAGGTCTCCAGAGCTGCAGAAGACGACGGCCGACTTGGATCACACTCTTGTGAGTGTCCCCAGTGTTGCAGAGGTGAGAGGAGAGTAGACAGTGAGTGGGAGTGGCGTCGCCCCTAGGGCTCTACGGGGCCGGCGTCTCCTGTCTCCTGGAGAGGCTTCGATGCCCCTCCACACCCTCTTGATCTTCCCTGTGATGTCATCTGGAGCCCTGCTGCTTGCGGTGGCCTATAAAGCCTCCTAGTCTGGCTCCAAGGCCTGGCAGAGTCTTTCCCAGGGAAAGCTACAAGCAGCAAACAGTCTGCATGGGTCATCCCCTTCACTCCCAGCTCAGAGCCCAGGCCAGGGGCCCCCAAGAAAGGCTCTGGTGGAGAACCTGTGCATGAAGGCTGTCAACCAGTCCATAGGCAAGCCTGGCTGCCTCCAGCTGGGTCGACAGACAGGGGCTGGAGAAGGGGAGAAGAGGAAAGTGAGGTTGCCTGCCCTGTCTCCTACCTGAGGCTGAGGAAGGAGAAGGGGATGCACTGTTGGGGAGGCAGCTGTAACTCAAAGCCTTAGCCTCTGTTCCCACGAAGGCAGGGCCATCAGGCACCAAAGGGATTCTGCCAGCATAGTGCTCCTGGACCAGTGATACACCCGGCACCCTGTCCTGGACACGCTGTTGGCCTGGATCTGAGCCCTGGTGGAGGTCAAAGCCACCTTTGGTTCTGCCATTGCTGCTGTGTGGAAGTTCACTCCTGCCTTTTCCTTTCCCTAGAGCCTCCACCACCCCGAGATCACATTTCTCACTGCCTTTTGTCTGCCCAGTTTCACCAGAAGTAGGCCTCTTCCTGACAGGCAGCTGCACCACTGCCTGGCGCTGTGCCCTTCCTTTGCTCTGCCCGCTGGAGACGGTGTTTGTCATGGGCCTGGTCTGCAGGGATCCTGCTACAAAGGTGAAACCCAGGAGAGTGTGGAGTCCAGAGTGTTGCCAGGACCCAGGCACAGGCATTAGTGCCCGTTGGAGAAAACAGGGGAATCCCGAAGAAATGGTGGGTCCTGGCCATCCGTGAGATCTTCCCAGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACAGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGC
ACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCTGGAGGGAAAAGGCTGAGTGAGGGTGGTTGGTGGGAAACCCTGGTTCCCCCAGCCCCCGGAGACTTAAATACAGGAAGAAAAAGGCAGGACAGAATTACAAGGTGCTGGCCCAGGGCGGGCAGCGGCCCTGCCTCCTACCCTTGCGCCTCATGACCGGAGCCATAGCCCAGGCAGGAGGGCTGAGGACCTCTGGTGGCGGCCCAGGGCTTCCAGCATGTGCCCTAGGGGAAGCAGGGGCCAGCTGGCAAGAGCAGGGGGTGGGCAGAAAGCACCCGGTGGACTCAGGGCTGGAGGGGAGGAGGCGATCTTGCCCAAGGCCCTCCGACTGCAAGCTCCAGGGCCCGCTCACCTTGCTCCTGCTCCTTCTGCTGCTGCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTTGCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGCCAGCCACCGGAGGGGTCAACCACTTCCC
Gene content
http://www.nature.com/nature/journal/v409/n6822/full/409860a0.html
"There appear to be about 30,000-40,000 protein-coding genes in the human genome -- only about twice as many as in worm or fly. However, the genes are more complex, with more alternative splicing generating a larger number of protein products." (Over time this has evolved to an estimate of approximately 20,000 protein coding genes, which reflects roughly the number of genes in fly and worm)
Only about 2% of the human genome encodes proteins.
https://genome.ucsc.edu
Roughly half of the human genome is composed of repeats
http://www.nature.com/nrg/journal/v10/n10/pdf/nrg2640.pdf
Retrotransposons use a "copy/paste" mechanism
DNA transposons use a "cut/paste" mechanism
McClintock's "jumping genes" in maize
Repetitive DNA not driven by retrotransposition (e.g., ATATATATATATATATAT…)
GC content varies dramatically in the genome
http://www.nature.com/nrg/journal/v10/n10/pdf/nrg2640.pdf
Region from chromosome 1
GC content
Each point is 20kb
Each point is 2kb
Each point is 200 bp
Why are there no points here?
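A plot like the one described above can be built by computing GC content in fixed-size windows. The sketch below is illustrative only: the function name `gc_content_windows` and the toy sequence are made up. It returns `None` for windows with no called bases, which reflects one likely (but here assumed) reason for a stretch with no points: unsequenced gaps in the reference, stored as runs of N.

```python
def gc_content_windows(seq: str, window: int):
    """GC fraction per non-overlapping window; None if the window has no A/C/G/T."""
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(b) for b in "ACGT")  # ignore N and other symbols
        if called == 0:
            results.append(None)  # nothing to plot for this window
        else:
            gc = chunk.count("G") + chunk.count("C")
            results.append(gc / called)
    return results


# Toy sequence: GC-rich, AT-rich, a gap of Ns, then a mixed window.
toy = "GGCC" + "ATAT" + "NNNN" + "GCAT"
print(gc_content_windows(toy, 4))  # [1.0, 0.0, None, 0.5]
```

Smaller windows (200 bp versus 20 kb) give noisier values, which is why the same region looks different at each resolution.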
CpG islands - clusters of CG dinucleotides
(The "p" represents the phosphate bond between adjacent nucleotides on the same strand. It is needed to distinguish a CpG dinucleotide from the hydrogen-bonded C-G base pair across complementary DNA strands.)
http://www.nature.com/nrg/journal/v10/n10/pdf/nrg2640.pdf
http://missinglink.ucsf.edu/lm/genes_and_genomes/methylation.html
ATGTCGTAATCTCGAA
m
Methylated cytosine
Unmethylated cytosine
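Counting CpG dinucleotides in a sequence is straightforward, as the sketch below shows for the short example sequence above. The observed/expected ratio it also computes is a commonly used way to flag CpG-rich regions; it is not taken from this lecture, and the function name `cpg_stats` is illustrative.

```python
def cpg_stats(seq: str) -> dict:
    """Count CpG dinucleotides and compute a simple observed/expected ratio.

    Expected CpG count assumes C and G occur independently:
    expected = (#C * #G) / length.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # C followed by G on the same strand
    obs_exp = (n_cpg * len(seq)) / (n_c * n_g) if n_c and n_g else 0.0
    return {
        "cpg_count": n_cpg,
        "gc_fraction": (n_c + n_g) / len(seq) if seq else 0.0,
        "obs_exp": obs_exp,
    }


stats = cpg_stats("ATGTCGTAATCTCGAA")
print(stats["cpg_count"])    # 2
print(stats["gc_fraction"])  # 0.375
```

A 16-base toy sequence is far too short to call an island; in practice such statistics are computed over windows of a few hundred bases or more.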
CpG island content throughout the genome
http://www.nature.com/nrg/journal/v10/n10/pdf/nrg2640.pdf
Chromosome 19 is the most gene-dense chromosome in the human genome
The human reference genome continues to change.