Bioinformatics Minor
Maria Poptsova
Lecture 1
Semester 2
Human genome
mitochondria
The Human Genome Project
Read more: https://www.genome.gov/about-genomics/educational-resources/fact-sheets/human-genome-project
Most of the original human genome sequence came from volunteers living in Buffalo, New York. Researchers at the Roswell Park Cancer Institute, located in Buffalo, were experts at preparing the DNA in a form that could be used for sequencing the human genome.
Human Genome Project
Sanger Sequencing
https://www.youtube.com/watch?v=KTstRrDTmWI
Nitrogenous Bases
Nucleosides
Nucleotides
Phosphodiester Bond
Sanger sequencing: chain termination or dideoxy method
[Figure: numbered steps 1 to 5 of the Sanger sequencing workflow, including gel electrophoresis]
Dideoxy (Sanger) Method
Dideoxy Method
A sequencing gel
This picture is an autoradiograph. The darkness of each band is
proportional to the radioactivity from 32P-labeled adenosine
in the synthesized DNA sample.
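How a sequence is read off such a gel can be illustrated with a small simulation. This is a minimal sketch, not a model of the real chemistry: the template is hypothetical, the primer is ignored, and every possible termination point is assumed to produce a visible band. Each of the four ddNTP reactions (lanes) yields fragments that end at the positions of one base, and reading the bands from shortest to longest recovers the newly synthesized strand.

```python
def complement_strand(template):
    """Return the newly synthesized strand (5'->3') for a template written 3'->5'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in template)


def terminated_fragments(new_strand):
    """For each ddNTP lane, list the fragment lengths at which synthesis stops."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        # A ddNTP of this base can end the chain exactly at this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest fragment to recover the sequence."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


template = "TACGGATC"  # hypothetical template, written 3'->5'
new_strand = complement_strand(template)
lanes = terminated_fragments(new_strand)
print(lanes)            # fragment lengths per lane
print(read_gel(lanes))  # sequence read from the simulated gel
```

Shorter fragments migrate further in the gel, so sorting by fragment length corresponds to reading the gel from the bottom up.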
Automated Version of the Dideoxy Method
http://www.youtube.com/watch?v=JHv7IxxgxW4
Vol 318, Issue 5858, 21 December 2007
Equipped with faster, cheaper technologies for sequencing DNA and assessing variation in genomes on scales ranging from one to millions of bases, researchers are finding out how truly different we are from one another.
Single Nucleotide Polymorphisms (SNPs)
Human genetics: terminology
Slide courtesy of Sven Cichon
If allele G is associated with risk for disease, it is the risk allele.
That makes allele A the protective allele.
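One common way to quantify such an association is an allelic odds ratio computed from case and control allele counts. The slide does not name a statistic, so this is an illustrative choice, and all counts below are hypothetical. It is a minimal sketch: a real study would also need a significance test and correction for confounders.

```python
def allele_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds of carrying the candidate risk allele in cases versus controls."""
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical allele counts at one SNP (two alleles per person).
cases = {"G": 600, "A": 400}
controls = {"G": 450, "A": 550}

odds_ratio = allele_odds_ratio(cases["G"], cases["A"], controls["G"], controls["A"])
print(round(odds_ratio, 2))  # → 1.83
```

An odds ratio above 1 means allele G is more frequent among cases, which is consistent with G being the risk allele and A the protective allele.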
SNPs may / may not alter protein structure
SNPs act as gene markers
SNP Maps
Structural Variation
What is next-generation sequencing?
How to find genetic variation with next generation sequencing
(Meyerson, Nat Review Genet, 2010)
How many variants exist in each individual?
Populations compared: European, Asian, South American/Hispanic, African
Each individual carries 4-5 million variants that differ from the reference genome
>99.9% of those variants are SNPs or short indels
(1000 Genomes Project Consortium, Nature, 2015)
Frequency of variants
(1000 Genomes Project Consortium, Nature, 2015)
1000 Genomes Project
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
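The totals in the abstract can be checked with a few lines of arithmetic. The sketch below only uses the three counts quoted above (in millions). The dictionary keys are labels chosen here for illustration.

```python
# Variant counts from the abstract, in millions.
variants_millions = {
    "SNPs": 84.7,
    "short indels": 3.6,
    "structural variants": 0.06,  # 60,000
}

total = sum(variants_millions.values())
shares = {kind: count / total for kind, count in variants_millions.items()}

print(f"total: {total:.2f} million")
for kind, share in shares.items():
    print(f"{kind}: {share:.2%}")
```

The sum is about 88.36 million, which matches "over 88 million variants", and SNPs make up the large majority of the catalog.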
2015
100,000 Genomes Project
1 million
A typical genome
The majority of variants in the data set are rare
Putatively functional variation
Single Nucleotide Polymorphism
SNP:
C T T A G C T T    94%
C T T A G T T T    6%
Mutation:
C T T A G C T T    99.9%
C T T A G T T T    0.1%
From Kun-Mao Chao, National Taiwan University
Mutations and SNPs
[Figure: timeline from a common ancestor to the present; the observed genetic variations consist of mutations and SNPs]
Single Nucleotide Polymorphism
A C T T A G C T T    94%    T: major allele
A C T T A G C T C    6%     C: minor allele
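The major and minor alleles follow directly from allele counts at a site. Below is a minimal sketch, assuming a hypothetical sample of 100 chromosomes chosen to reproduce the 94% / 6% split on the slide. The function names are illustrative.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return the relative frequency of each allele observed at one site."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def major_and_minor(frequencies):
    """Return (major allele, minor allele), ranked by frequency."""
    ranked = sorted(frequencies, key=frequencies.get, reverse=True)
    return ranked[0], ranked[-1]


# Hypothetical sample: the last base of the sequence in 100 chromosomes.
observed = ["T"] * 94 + ["C"] * 6

frequencies = allele_frequencies(observed)
major, minor = major_and_minor(frequencies)
print(frequencies)   # → {'T': 0.94, 'C': 0.06}
print(major, minor)  # → T C
```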
Haplotypes
Sequence                SNP1  SNP2  SNP3
-A C T T A G C T T-     C     A     T      Haplotype 1
-A A T T T G C T C-     A     T     C      Haplotype 2
-A C T T T G C T C-     C     T     C      Haplotype 3
Haplotypes
A haplotype is a set of linked single-nucleotide polymorphism (SNP) alleles that tend to be inherited together.
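The haplotypes of the three example sequences can be recovered by keeping only the columns where the sequences differ. This is a minimal sketch that assumes the sequences are already aligned and of equal length. It treats every variable column as a SNP, which real data would not justify without frequency information.

```python
def variable_positions(sequences):
    """Return 0-based indices of columns where the aligned sequences differ."""
    length = len(sequences[0])
    return [i for i in range(length) if len({seq[i] for seq in sequences}) > 1]


def haplotypes(sequences):
    """Return, for each sequence, its alleles at the variable positions."""
    positions = variable_positions(sequences)
    return ["".join(seq[i] for i in positions) for seq in sequences]


# The three aligned sequences from the slide.
sequences = ["ACTTAGCTT", "AATTTGCTC", "ACTTTGCTC"]

print(variable_positions(sequences))  # → [1, 4, 8]
print(haplotypes(sequences))          # → ['CAT', 'ATC', 'CTC']
```

The three strings correspond to Haplotypes 1, 2 and 3 on the slide.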
Genotype