Comparison and alignment of networks
Oct 31st, Nov 5th 2024
BMI/CS 775 Computational Network Biology�Fall 2024
Sushmita Roy
Plan for the semester
2
When | What |
Week 2-Week 4 | Representing and learning networks from data |
Week 5-Week 7 | Supervised deep learning in network biology |
Week 8-Week 9 | Graph topology and modules |
Week 9-Week 10 | Graph alignment and comparison |
Week 11-Week 13 | Network-based integration and interpretation |
Week 14-Week 15 | Project presentations |
Plan for this section
Goals for today
Broad definition of graph alignment
Applications of network alignment
Alignment of scRNA-seq datasets
Alignment of molecular networks
Distinguishing the two alignment tasks
Goals for today
How are these organisms related?
Toh et al, Nature, 2011
Zoonomia project
A phylogenetic tree for ~200 mammals
Organisms can be compared at multiple levels
Some terminology
How do sequences change between organisms?
T H I S S E Q U E N C E
T H A T S E Q U E N C E
Sequence 2
Sequence 1
T H I S I S A S E Q U E N C E
T H I S _ _ _S E Q U E N C E
Sequence 2
Sequence 1
mismatch
gap
_ _ _ _S E Q U E N C E
T H I S S E Q U E N C E
Sequence 2
Sequence 1
gap
Alignment problems
Sequence Alignment Examples
T H I S S E Q U E N C E
T H A T S E Q U E N C E
T H + + S E Q U E N C E
Sequence 2
Sequence 1
T H I S - - - S E Q U E N C E
T H A T I S A S E Q U E N C E
T H + + S E Q U E N C E
Sequence 2
Sequence 1
Gaps and mismatches in network alignment
Network alignment example
The network alignment problem of finding conserved paths is equivalent to finding a high scoring path in a new network called the “network alignment” graph.
Each vertex in the alignment graph is represented by a pair of vertices and a corresponding numeric score (BLAST E-value) that specifies the sequence similarity
Each edge corresponds to to “interactions” in the original graph
Why is network alignment across species important?
What makes network alignment difficult?
Different network alignment problems
Local network alignment: Find locally similar subnetworks
Network query: Find instances of a small subnetwork in a larger network
Global network alignment: Align all nodes in one network to all nodes in the second network
Nir Atias and Roded Sharan, May 2012, ACM Communications
Different types of network alignment problems
Problem type | Description | Methods |
Local alignment | Align small parts of the network | PathBLAST, LocalAli |
Global alignment | Align the entire network | FUSE, IsoRank |
Pairwise alignment | Align two networks | PathBLAST |
Multiple network alignment | Align more than two networks | FUSE, IsoRank, LocalAli |
NetworkQuery | Search for a small network in a larger network | NetGrep, QNet, QPath |
Adapted from Nir Atias and Roded Sharan, May 2012, ACM Communications
Goals for today
Matrix factorization
Stein-O’Brien et al, Trends in Genetics 2018
R
W
H
Many different variants of MF
Non-negative matrix factorization (NMF)
25
Minimize
Lee and Seung Adv. Neur. In. 2001
•
Slide credit Erika Da-Inn Lee
Non-negative matrix factorization (NMF)
26
5 | | | 5 | | | |
4 | | 4 | 4 | | 5 | 4 |
4 | | 4 | 4 | | 3 | 4 |
3 | 1 | 4 | 3 | | 2 | 4 |
| 4 | 2 | | 3 | | 1 |
| 4 | | | 4 | 5 | |
| 3 | | | | 4 | |
| 2 | | | 3 | 2 | |
X = ℝn×m
Slide credit Erika Da-Inn Lee
Non-negative matrix factorization (NMF)
27
5 | | | 5 | | | |
4 | | 4 | 4 | | 5 | 4 |
4 | | 4 | 4 | | 3 | 4 |
3 | 1 | 4 | 3 | | 2 | 4 |
| 4 | 2 | | 3 | | 1 |
| 4 | | | 4 | 5 | |
| 3 | | | | 4 | |
| 2 | | | 3 | 2 | |
X = ℝn×m
| |
| |
| |
| |
| |
| |
| |
| |
| | | | | | |
| | | | | | |
W = ℝn×k
H = ℝk×m
k << n, m
Slide credit Erika Da-Inn Lee
NMF can be used for bi-clustering
| |
| |
| |
| |
| |
| |
| |
| |
| | | | | | |
| | | | | | |
H = ℝk×m
W = ℝn×k
Row clusters
Column clusters
Incorporating prior knowledge to constrain the factorization
29
5 | | | 5 | | | |
4 | | 4 | 4 | | 5 | 4 |
4 | | 4 | 4 | | 3 | 4 |
3 | 1 | 4 | 3 | | 2 | 4 |
| 4 | 2 | | 3 | | 1 |
| 4 | | | 4 | | |
| 3 | | | | | |
| 2 | | | 3 | 2 | |
| | | | | | |
| | | | | | |
Slide credit Erika Da-Inn Lee
Graph regularization to incorporate domain prior knowledge
30
Minimize
3 | -1 | -1 | -1 | -1 | 0 | 0 |
-1 | 4 | -1 | -1 | -1 | 0 | 0 |
-1 | -1 | 3 | -1 | -1 | 0 | 0 |
-1 | -1 | -1 | 4 | -1 | 0 | 0 |
-1 | -1 | -1 | -1 | 4 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 1 | -1 |
0 | 0 | 0 | 0 | 0 | -1 | 1 |
Slide credit Erika Da-Inn Lee
Non-negative matrix tri-factorization (NMTF)
Minimize
•
•
k1
k2
NMTF for biclustering
Cluster indicator matrix
Objects
Clusters
Example for 5 objects in 2 clusters
NMTF with constraints
Semi-Supervised Clustering via Matrix Factorization, Wang et al, 2008
Two types of objects: people and movies
Movies can be grouped based on actors, characters, titles
People can be grouped based on their hobbies and jobs
R12
X1
X2
Blue: Must link
Red: Cannot link
NMTF for constrained bi-clustering
Trade-off between maintaining intra-type constraints and estimating Rij
Goals for today
Motivation of FUSE
FUSE multiple network alignment
FUSE Notation
Overview of FUSE
Fuse step
NMTF for Global network alignment
NMTF for k entity types
This objective can be solved using an iterative multiplicative update algorithm from Wang 2008
Pictorial illustration for five species
Minimizing the objective NMTF
Overview of FUSE
Global network alignment step
Create a k-partite weighted graph
A 3-partite weighted graph
Species 1
Species 2
Species 3
a
b
c
α
β
γ
δ
W
X
Y
Z
Algorithm to find the best matching
Heuristic algorithm to find a maximum k-partite matching
K-partite graph
Graph merge step
Bi-partite matching example
Species i
a
b
α
β
γ
δ
c
Species j
Species i
a
b
α
β
γ
δ
Species j
Find a (maximal) matching (Fij)
c
Graph merge from matching
a
b
α
β
γ
δ
c
Merged graph of species 1, 2
The merged nodes inherit the edges of the constituent nodes
New graph with all species
a
b
c
α
β
γ
δ
W
X
Y
Z
a
b
α
β
γ
δ
c
W
X
Y
Z
Note, this is not a matching
Results
Statistics of PPI networks used
Species
NMTF induces new and reconstructs existing associations between proteins
Do the new similarities make sense?
Compute the cumulative number of associations between annotated proteins and the percentage of them sharing GO term (Biological Process (BP) and Molecular Function (MF) annotations separately).
Comparison against other algorithms
Did not finish in time
FUSE produces the largest number of good clusters
Fraction of blue is highest for clusters and proteins for FUSE.
Good: inclusion of as many species as possible
FUSE produces functionally consistent clusters
A cluster is said to be functionally consistent, if all its annotated proteins have at least one GO term in common.
References