FoCS Breadth: Overview of Bioinformatics
Niema Moshiri
UC San Diego SPIS 2022
What is Bioinformatics?
My Definition of Bioinformatics
Biology
Computer Science
Computational Biology
Bioinformatics
Chemistry? Physics? Statistics?
The Central Dogma
The Central Dogma of Biology
DNA → RNA: Transcription
RNA → Protein: Translation
DNA → DNA: Replication
RNA → DNA: Reverse Transcription
RNA → RNA: Replication
Transcription
DNA → RNA: Transcription
Figure labels: TF (transcription factor), Pol (RNA polymerase)
Transcription: Summary
DNA: GAGCTGATGGCTACTACACATATTGCCAGTTGATGGGTT
RNA: GAGCUGAUGGCUACUACACAUAUUGCCAGUUGAUGGGUU
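The RNA string on this slide is the DNA string with every T replaced by U, which corresponds to reading the DNA as the coding strand. A minimal sketch of that string-level view follows; the function name `transcribe` is a hypothetical helper, and the sketch ignores the molecular machinery (TF, Pol) entirely.

```python
def transcribe(dna: str) -> str:
    """Return the RNA string for a coding-strand DNA string (T becomes U)."""
    dna = dna.upper()
    # Reject anything that is not one of the four DNA bases.
    if set(dna) - set("ACGT"):
        raise ValueError("DNA may only contain A, C, G, and T")
    return dna.replace("T", "U")


dna = "GAGCTGATGGCTACTACACATATTGCCAGTTGATGGGTT"
print(transcribe(dna))  # GAGCUGAUGGCUACUACACAUAUUGCCAGUUGAUGGGUU
```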
Translation
mRNA → Protein: Translation
Translation: Mechanism
Translation: Summary
RNA: GAGCUGAUGGCUACUACACAUAUUGCCAGUUGAUGGGUU
Protein: MATTHIAS
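The build-up above reads the RNA three bases at a time, starting at the first AUG and stopping at a stop codon. The sketch below reproduces that with the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical, and the sketch assumes the input contains only A, C, G, and U and that translation starts at the first AUG it finds.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Step through complete codons only.
    for i in range(start, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = "GAGCUGAUGGCUACUACACAUAUUGCCAGUUGAUGGGUU"
print(translate(rna))  # MATTHIAS
```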
Protein Structure
The Central Dogma: Summary
DNA: GAGCTGATGGCTACTACACATATTGCCAGTTGATGGGTT
RNA: GAGCUGAUGGCUACUACACAUAUUGCCAGUUGAUGGGUU
Protein: MATTHIAS
Transcription
Translation
Natural Selection
Natural Selection: Example
Generation 0: 6, 5, 2
Generation 1: 5, 5, 3
Generation 2: 3, 6, 4
Generation 3: 2, 5, 6
If a trait is essential to an organism’s survival, it will be conserved in the population
Sequence Alignment
Pairwise Sequence Alignment
AGTACGTACGT
ACGTACGTAAT
Pairwise Sequence Alignment
A-GTACGTACGT
ACGTACGTAA-T
Pairwise Sequence Alignment: Scoring Function
Given an alignment, a gap penalty σ, and a scoring matrix M, the score of the alignment is the sum of the scores of its positions. A position scores σ if either sequence has a gap there; otherwise it scores M(c,c’), where c is the symbol at that position in one sequence and c’ is the symbol at that position in the other.
We want to maximize this scoring function
A-GTACGTACGT
ACGTACGTAA-T
Score: 6
M | A | C | G | T |
A | +1 | -1 | -1 | -1 |
C | -1 | +1 | -1 | -1 |
G | -1 | -1 | +1 | -1 |
T | -1 | -1 | -1 | +1 |
σ = -1
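The scoring function can be sketched directly from its definition. Here `column_scores` is a hypothetical helper, and the matrix M is represented as a dictionary keyed by symbol pairs, which is an assumption about representation only; the values are the +1/-1 matrix and σ = -1 shown above.

```python
from itertools import accumulate


def column_scores(top, bottom, M, sigma, gap="-"):
    """Score each alignment column: sigma if either row has a gap, else M[c, c']."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return [sigma if gap in (c, d) else M[c, d] for c, d in zip(top, bottom)]


# +1 on the diagonal, -1 elsewhere, as in the matrix above.
M = {(a, b): (1 if a == b else -1) for a in "ACGT" for b in "ACGT"}
sigma = -1

scores = column_scores("A-GTACGTACGT", "ACGTACGTAA-T", M, sigma)
print(list(accumulate(scores)))  # [1, 0, 1, 2, 3, 4, 5, 6, 7, 6, 5, 6]
print(sum(scores))               # 6
```

The running totals show how the score moves column by column before settling at 6.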
The Global Alignment Problem
Given two strings s and t, a gap penalty σ, and a scoring matrix M, return a maximum-scoring alignment of s and t
s = AGTACGTACGT
t = ACGTACGTAAT
Alignment:
A-GTACGTACGT
ACGTACGTAA-T
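One common way to solve this problem is dynamic programming in the style of Needleman-Wunsch; the slides do not name an algorithm, so this choice is an assumption. The sketch below is a minimal version that replaces the full matrix M with two hypothetical parameters, `match` and `mismatch`, and uses a linear gap penalty `gap` in the role of σ.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for one optimal global alignment."""
    n, m = len(s), len(t)
    # dp[i][j] = best score for aligning s[:i] with t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                           dp[i - 1][j] + gap,       # gap in t
                           dp[i][j - 1] + gap)       # gap in s
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = global_align("AGTACGTACGT", "ACGTACGTAAT")
print(score)  # 6
print(a)
print(b)
```

When several alignments tie for the best score, the traceback order decides which one is returned, so the printed rows may differ from the alignment shown on the slide while having the same score.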
The Local Alignment Problem
Given two strings s and t, a gap penalty σ, and a scoring matrix M, return a maximum-scoring alignment of
a substring of s and a substring of t
s = AGTACGTACGT
t = ACGTACGTAAT
Alignment:
ACGTACGT
ACGTACGT
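Local alignment is often solved with a Smith-Waterman style dynamic program; the slides do not name an algorithm, so that choice is an assumption. The sketch differs from a global alignment table in two ways: cell scores never drop below 0, and the traceback starts at the highest-scoring cell. The parameters `match`, `mismatch`, and `gap` are hypothetical stand-ins for M and σ.

```python
def local_align(s, t, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for one best local alignment."""
    n, m = len(s), len(t)
    # dp[i][j] = best score of a local alignment ending at s[i-1], t[j-1]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(0,                        # start a new alignment here
                           dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
            if dp[i][j] > best:
                best, best_i, best_j = dp[i][j], i, j
    # Trace back from the best cell until the score returns to 0.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and dp[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if dp[i][j] == dp[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("AGTACGTACGT", "ACGTACGTAAT"))  # (8, 'ACGTACGT', 'ACGTACGT')
```

Under +1 for a match and -1 for a mismatch or gap, the example strings share the exact substring ACGTACGT, which scores 8.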
The Multiple Sequence Alignment Problem
Given multiple strings, a gap penalty σ, and a scoring matrix M, return a maximum-scoring alignment of the strings
Variant Calling
ACATACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACATACGTTCGT
ACGTACGTACGT
ACGTACGTACGT
ACATACGTACGT
ACGTACGTACGT
ACGTACGTTCGT
Variant Calling
ACAGCAGCAGCAGTT
ACAGCAGTT
ACAGTT
ACAGCAGCAGTT
SNV Calling: General Approach
Reads:
ACTTACGT
GTACGTAC
TACGTACG
CTTACGTA
CGTACTTA
REF: ...ACGTACGTACGTACGTACGTACGT...
Bases observed at the variant position: 50% G, 50% T
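The general approach can be sketched as a pileup: place each read on the reference, count the bases seen at every position, and report positions where a non-reference base is common enough. Everything below is illustrative. The read offsets are hypothetical (in practice they come from read alignment, and the slide does not give them), and `call_snvs`, `min_fraction`, and `min_depth` are assumed names and thresholds, not a real variant caller's interface.

```python
from collections import Counter


def call_snvs(reference, placed_reads, min_fraction=0.2, min_depth=2):
    """Naive pileup caller.

    placed_reads is a list of (offset, read) pairs with no indels.
    Returns {position: base counts} for sites with a frequent non-reference base.
    """
    pileup = [Counter() for _ in reference]
    for offset, read in placed_reads:
        for k, base in enumerate(read):
            pileup[offset + k][base] += 1
    calls = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        has_alt = any(base != reference[pos] and n / depth >= min_fraction
                      for base, n in counts.items())
        if has_alt:
            calls[pos] = dict(counts)
    return calls


reference = "ACGTACGTACGTACGTACGTACGT"
# Hypothetical placements chosen so the mismatches line up at position 6.
placed_reads = [(4, "ACTTACGT"), (2, "GTACGTAC"), (3, "TACGTACG"),
                (5, "CTTACGTA"), (1, "CGTACTTA")]
print(call_snvs(reference, placed_reads))
```

With these assumed placements, the only reported site is reference position 6 (0-based), where three reads carry T and two carry the reference base G.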
SNV Calling: Challenges
Population Genetics
Differential Expression Analysis
Differential Expression Analysis: RNA-Seq
RNA → DNA: Reverse Transcription
Differential Expression Analysis
Gene | Sample 1 Count | Sample 2 Count |
A | ### | ### |
B | ### | ### |
C | ### | ### |
Differential Expression Analysis
Gene | Sample 1 FPKM | Sample 2 FPKM |
A | ### | ### |
B | ### | ### |
C | ### | ### |
Differential Expression Analysis
Gene | Sample 1 FPKM | Sample 2 FPKM | Log-2 Ratio | p |
A | ### | ### | ### | ### |
B | ### | ### | ### | ### |
C | ### | ### | ### | ### |
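The three tables move from raw counts to FPKM to a log-2 ratio. A minimal sketch of the first two steps follows, using the usual definition of FPKM (fragments per kilobase of transcript per million mapped fragments). All numbers are made up for illustration, the gene lengths are hypothetical, and the ratio direction (Sample 2 over Sample 1) and the pseudocount are assumptions. The p column needs replicate samples and a statistical test, so it is not computed here.

```python
import math


def fpkm(counts, lengths_bp):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    total = sum(counts.values())
    return {gene: n * 1e9 / (lengths_bp[gene] * total)
            for gene, n in counts.items()}


def log2_ratios(fpkm_1, fpkm_2, pseudocount=1.0):
    """log2(sample 2 / sample 1); the pseudocount avoids division by zero."""
    return {gene: math.log2((fpkm_2[gene] + pseudocount) /
                            (fpkm_1[gene] + pseudocount))
            for gene in fpkm_1}


# Made-up example data (not from the slides).
lengths_bp = {"A": 1000, "B": 2000, "C": 500}
sample_1 = {"A": 100, "B": 200, "C": 700}
sample_2 = {"A": 400, "B": 200, "C": 400}

f1, f2 = fpkm(sample_1, lengths_bp), fpkm(sample_2, lengths_bp)
for gene, ratio in log2_ratios(f1, f2).items():
    print(gene, round(f1[gene]), round(f2[gene]), round(ratio, 2))
```

In this toy data gene A comes out roughly four times higher in Sample 2 (log-2 ratio near 2), gene B is unchanged (ratio 0), and gene C is lower (negative ratio).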
Genome Assembly
Genome Assembly
...ATACAGTGGAACACCATCTG...
Genome Assembly
ATACAG
CAGTGG
GGAACA
CACCAT
CCATCT
...ATACAGTGGAACACCATCTG...
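One simple way to see how overlapping reads can be stitched back together is a greedy merge: repeatedly join the two sequences with the longest suffix-to-prefix overlap. This is a toy sketch under strong assumptions (error-free reads, a single strand, no repeats, no read contained inside another); it is not presented as the method real assemblers use, and `overlap` and `greedy_assemble` are hypothetical helper names.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads):
    """Merge the best-overlapping pair until a single sequence remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    if not contigs:
        return ""
    while len(contigs) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs[0]


reads = ["ATACAG", "CAGTGG", "GGAACA", "CACCAT", "CCATCT"]
print(greedy_assemble(reads))  # ATACAGTGGAACACCATCT
```

The result covers the part of the original sequence that the five reads span; the final G of the slide's sequence is not covered by any read.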
Phylogenetics
Figure labels: Present-Day Species, Ancestors (extinct), Evolutionary Time
Models of Evolution
Phylogenetic Inference
Summary