Special Topics
Announcements
Counting Sorts
Comparison-Based Sorts: Runtime
New Approach: Counting Sort
From Counting Sorts to Radix Sorts
MSD Radix Sort - L*(N + R) or N + R
LSD Radix Sort - L*(N + R)
Bucket Sort
Bucket Sort (N2 + K/R)
Bucket Sort (N2 + K/R)
Genome Informatics: Global Sequence Alignment
All credit to Dr. Alison Feder for the following slides!
Brief Terminology
What is Sequence Alignment?
Our Patient Scenario
Understanding Alignment
Reference:
To be aligned:
TGGAAGGGCTAATTCACTCCCAACGAAGACAAGA
CACCCCAA
Understanding Alignment
Some Possibilities:
TGGAAGGGCTAATTCACTCCCAACGAAGACAAGA
CACTCCCA
CAC-CCCAA
CTCCCAAC
To Align: CACCCCAA
How do we know what works best? We need a cost function!
Cost Matrix!
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
Gap penalty = -4
Based on the pairing you use, indicates if it is ‘good’ or ‘bad’.
Sequence 1
Sequence 2
Practice: Cost Matrix
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
Gap penalty = -4
C | A | C | T | C | C | C | A |
C | A | C | C | C | C | A | A |
10 | 10 | 10 | 0 | 10 | 10 | -5 | 10 |
C | A | C | T | C | C | C | A |
C | A | C | - | C | C | C | A |
10 | 10 | 10 | -4 | 10 | 10 | 10 | 10 |
By these rules, this is the better alignment
Issues with Scale
Duplicates!
C | A | C | T | C | C | C | A |
C | A | C | C | C | C | A | A |
10 | 10 | 10 | 0 | 10 | 10 | -5 | 10 |
C | A | C | T | C | C | C | A |
C | A | C | - | C | C | C | A |
10 | 10 | 10 | -4 | 10 | 10 | 10 | 10 |
Issues with Scale
Duplicates!
C | A | C | T | C | C | C | A |
C | A | C | C | C | C | A | A |
10 | 10 | 10 | 0 | 10 | 10 | -5 | 10 |
C | A | C | T | C | C | C | A |
C | A | C | - | C | C | C | A |
10 | 10 | 10 | -4 | 10 | 10 | 10 | 10 |
What tool can we use if we have a repeated subproblem that we reference to find the final solution to some problem?
Dynamic Programming
We can move in 3 different directions.
| | 0 | 1 |
| | | G |
0 | | 0 | |
1 | C | -4 | -8 |
-G
C-
| | 0 | 1 |
| | | G |
0 | | 0 | -4 |
1 | C | | -8 |
G-
-C
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
Gap penalty = -4
Dynamic Programming
We can move in 3 different directions.
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
Gap penalty = -4
| | 0 | 1 |
| | | G |
0 | | 0 | -4 |
1 | C | -4 | -5 |
G
C
For a given cell in our matrix, we choose the option with the lowest cost (largest gain). We then track that cost and the direction we took.
Dynamic Programming
| | 0 | 1 | 2 | 3 | 4 | 5 |
| | | G | A | A | T | C |
0 | | | | | | | |
1 | C | | | | | | |
2 | A | | | | | | |
3 | T | | | | | | |
4 | A | | | | | | |
5 | C | | | | | | |
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
Gap penalty = -4
We can move in 3 different directions.
Dynamic Programming
| | 0 | 1 | 2 | 3 | 4 | 5 |
| | | G | A | A | T | C |
0 | | 0 | -4 | -8 | -12 | -16 | -20 |
1 | C | -4 | -5 | -9 | -13 | -12 | -6 |
2 | A | -8 | -4 | 5 | 1 | -3 | -7 |
3 | T | -12 | -8 | 1 | 0 | 11 | 7 |
4 | A | -16 | -12 | 2 | 11 | 7 | 6 |
5 | C | -20 | -16 | -2 | 7 | 11 | 17 |
| A | C | G | T |
A | 10 | -5 | 0 | -5 |
C | -5 | 10 | -5 | 0 |
G | 0 | -5 | 10 | -5 |
T | -5 | 0 | -5 | 10 |
GAAT-C
-CATAC
GA-ATC
CATA-C
Dynamic Programming
Needleman-Wunsch Algorithm
n | Exhaustive enumeration | Dynamic programming |
5 | 1.7 * 103 | 7.5 * 101 |
10 | 8.1 * 106 | 3.0 * 102 |
20 | 2.6 * 1014 | 1.2 * 103 |
30 | 9.6 * 1021 | 2.7 * 103 |
40 | 3.8 * 1029 | 4.8 * 103 |
Local Alignment: Smith-Waterman
Thanks! See you all Wednesday! ~
Today’s Song Rec:
“Cities”
Throttle