Network-based integration of multiple networks
Nov 26th, 2024
BMI/CS 775 Computational Network Biology, Fall 2024
Anthony Gitter
Topics in this section
Why BIONIC?
Appeal of representation learning on graphs
Recurring theme: unique information captured in different omics assays
Image: TCGA, Gligorevic et al., Proteomics 2015
Have discussed this in terms of node information
Also true for edge information
Three types of biological networks
Images: Fout et al. NIPS 2017, Jung Choi 2013, van Leeuwen 2017
Protein-protein interaction
Gene co-expression
Genetic interaction
BIONIC goal
How can we perform graph representation learning across diverse biological networks?
Improve node embeddings
BIONIC
Sample-sample networks
Share information across disjoint networks
SNF
Bo Wang
Cluster samples
Gene-gene networks
Node (gene) embeddings
Cluster genes
Predict gene function
BIONIC versus SNF co-expression
Figure: gene similarity networks and gene similarity matrices (genes × genes) for SNF co-expression and BIONIC co-expression
Genetic interactions
Image: van Leeuwen 2017
Pairwise phenotype unexpected based on individual phenotypes
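One way to make "unexpected" concrete is to compare the double-mutant phenotype against an expectation built from the two single mutants. The sketch below assumes a multiplicative expectation on fitness, which the slide does not specify; the function name and the fitness values are hypothetical.

```python
def genetic_interaction_score(f_a: float, f_b: float, f_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation.

    Assumption: fitness is measured relative to wild type (1.0) and the
    expected double-mutant fitness is f_a * f_b.
    """
    expected = f_a * f_b
    return f_ab - expected


# Hypothetical fitness values, for illustration only.
negative = genetic_interaction_score(0.9, 0.8, 0.30)  # sicker than expected
positive = genetic_interaction_score(0.9, 0.8, 0.85)  # fitter than expected
```

A score near zero means the pair behaves as expected from the single mutants; scores far from zero in either direction are the edges of a genetic interaction network.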
Restating the BIONIC goal
Input: multiple biological networks
Optional node labels
Output: informative node embeddings
Top Hat question
BIONIC algorithm overview
Image: Forster 2022
Multiple input networks
Pass each graph through multiple layers of a graph attention network
Special way to combine node embeddings
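To illustrate what one graph attention layer computes for a single input network, here is a minimal dense, single-head sketch in NumPy. It follows the generic graph attention formulation, not necessarily the exact configuration used in BIONIC; the function name, argument names, and toy inputs are assumptions.

```python
import numpy as np


def gat_layer(A, H, W, a, negative_slope=0.2):
    """Single-head graph attention layer (dense, for illustration).

    A: (n, n) adjacency, nonzero = edge; H: (n, d_in) node features;
    W: (d_in, d_out) weight matrix; a: (2 * d_out,) attention vector.
    Returns the updated features and the attention coefficients.
    """
    n = A.shape[0]
    Z = H @ W                                   # transformed features, (n, d_out)
    d_out = Z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]) splits into a source and a target term.
    src = Z @ a[:d_out]
    dst = Z @ a[d_out:]
    e = src[:, None] + dst[None, :]
    e = np.where(e > 0, e, negative_slope * e)
    # Attend only over neighbors, plus a self-loop so every row is non-empty.
    mask = (A != 0) | np.eye(n, dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # stabilize the softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha


# Toy path graph 0 - 1 - 2 with random features (hypothetical data).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
a = rng.normal(size=4)
H_out, alpha = gat_layer(A, H, W, a)
```

Each row of `alpha` is a probability distribution over a node's neighbors, so nodes 0 and 2, which are not connected, exchange no information in a single layer. Stacking layers widens the neighborhood each node can see.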
Reconstruct a single graph from the combined node embeddings
Combine graph reconstruction error with respect to each original graph
Learning the node embeddings
Image: Forster 2022
Learned scaling parameter
Graph-specific embedding
Mask value for node i in graph j
Graph reconstruction error
linear layer =>
Reconstruct a single graph
Loss function sums over each original graph
Binary mask vector of nodes in graph j
Reduce dimension
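The annotations above describe two steps: a scaled, masked sum of the graph-specific embeddings, and a reconstruction error summed over the original graphs. The sketch below is one reading of those labels, since the slide equations themselves are not reproduced here. All function and variable names are assumptions, and the linear layer that reduces dimension is left out: `F` stands for its output.

```python
import numpy as np


def combine_embeddings(H_list, scales, masks):
    """Scaled, masked sum of graph-specific embeddings.

    H_list: list of k arrays of shape (n, d); scales: (k,) scaling parameters;
    masks: (k, n), masks[j, i] is the mask value for node i in graph j.
    """
    combined = np.zeros_like(H_list[0], dtype=float)
    for H_j, s_j, m_j in zip(H_list, scales, masks):
        combined += s_j * m_j[:, None] * H_j
    return combined


def reconstruction_loss(F, A_list, present):
    """Masked squared error between F F^T and each original graph, summed over graphs.

    F: (n, d) node embeddings; A_list: list of (n, n) adjacency matrices;
    present: (k, n) binary, present[j, i] = 1 if node i occurs in graph j.
    """
    n = F.shape[0]
    A_hat = F @ F.T                       # the single reconstructed graph
    loss = 0.0
    for A_j, b_j in zip(A_list, present):
        pair_mask = np.outer(b_j, b_j)    # node pairs that both occur in graph j
        loss += np.sum(pair_mask * (A_hat - A_j) ** 2)
    return loss / n**2


# Toy example with two graphs over two nodes (hypothetical values).
H1, H2 = np.ones((2, 2)), 2.0 * np.ones((2, 2))
combined = combine_embeddings(
    [H1, H2], np.array([0.5, 0.5]), np.array([[1.0, 1.0], [1.0, 0.0]])
)
loss_exact = reconstruction_loss(np.eye(2), [np.eye(2)], np.array([[1, 1]]))
loss_partial = reconstruction_loss(np.eye(2), [np.zeros((2, 2))], np.array([[1, 0]]))
```

Masking by node presence means a graph is penalized only for node pairs it actually contains, which is what lets the input networks cover different node sets.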
Extension to semi-supervised learning
Use node embeddings to predict labels
Combined loss
Predict labels for all genes
Sigmoid function
Weight matrix
Binary label mask
Loss function sums over nodes and graphs
True node label
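The semi-supervised extension can be sketched as a sigmoid classifier on the node embeddings whose loss counts only labeled nodes, combined with the reconstruction loss. This is a minimal sketch under assumptions: binary cross-entropy as the label loss and a convex weight `lam` are illustrative choices, and all names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def masked_label_loss(F, W, Y, label_mask, eps=1e-12):
    """Binary cross-entropy averaged over labeled nodes only.

    F: (n, d) node embeddings; W: (d, c) weight matrix; Y: (n, c) true 0/1 labels;
    label_mask: (n,) binary label mask, 1 where a node has labels.
    """
    Y_hat = sigmoid(F @ W)                        # predictions for all genes
    bce = -(Y * np.log(Y_hat + eps) + (1 - Y) * np.log(1 - Y_hat + eps))
    per_node = bce.sum(axis=1)
    loss = np.sum(label_mask * per_node) / max(label_mask.sum(), 1)
    return loss, Y_hat


def combined_loss(recon_loss, label_loss, lam=0.5):
    """Weighted sum of the two objectives; lam is a hypothetical trade-off weight."""
    return lam * recon_loss + (1.0 - lam) * label_loss


# Toy example: three genes, one label class, the third gene unlabeled.
F = np.zeros((3, 2))
W = np.zeros((2, 1))
Y = np.array([[1.0], [0.0], [1.0]])
label_mask = np.array([1.0, 1.0, 0.0])
label_loss, Y_hat = masked_label_loss(F, W, Y, label_mask)
total = combined_loss(2.0, 4.0, lam=0.5)
```

Predictions are produced for every gene, but the mask keeps unlabeled genes out of the loss, so they still receive label predictions without influencing training through the classification term.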
BIONIC implementation details
Evaluating BIONIC
Predicting protein modules
Image: Forster 2022
Intrinsic: use node embeddings directly
Extrinsic: train SVM on node embeddings
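As a small illustration of the intrinsic style of evaluation, the sketch below scores embeddings directly by comparing cosine similarity for gene pairs inside the same module against pairs in different modules. The scoring scheme, function names, and toy data are assumptions, not the evaluation protocol of Forster 2022.

```python
import numpy as np


def cosine_similarity_matrix(E):
    """Pairwise cosine similarity between rows of E, shape (n, d)."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    unit = E / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def within_vs_between(E, modules):
    """Mean cosine similarity for same-module pairs and different-module pairs."""
    S = cosine_similarity_matrix(E)
    modules = np.asarray(modules)
    same = modules[:, None] == modules[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each unordered pair once
    return S[same & upper].mean(), S[~same & upper].mean()


# Hypothetical embeddings for four genes in two modules.
E = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
within, between = within_vs_between(E, [0, 0, 1, 1])
```

The extrinsic alternative feeds the same embeddings to a separate classifier such as an SVM, so it measures how useful the embeddings are as features rather than how well their geometry matches the modules.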
Predicting LSM2-7 protein complex
Image: Forster 2022
Complex interactions missing from individual networks
Wait… really?
Semi-supervised learning
Image: Forster 2022
GeneMANIA: label propagation on association networks
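For intuition about the baseline, here is a generic label propagation sketch on a single association network. It uses the common iterative update with a symmetrically normalized adjacency matrix, which is an assumption for illustration and not GeneMANIA's exact formulation; the function name, `alpha`, and the toy graph are hypothetical.

```python
import numpy as np


def label_propagation(A, y, alpha=0.8, n_iter=100):
    """Iterate f <- alpha * S f + (1 - alpha) * y, with S = D^{-1/2} A D^{-1/2}.

    A: (n, n) symmetric non-negative association matrix; y: (n,) seed labels.
    """
    deg = A.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)            # avoid dividing by zero
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe_deg), 0.0)
    S = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    y = np.asarray(y, dtype=float)
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (S @ f) + (1.0 - alpha) * y
    return f


# Path graph 0 - 1 - 2 - 3 with a single labeled gene at node 0.
A = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)
scores = label_propagation(A, np.array([1.0, 0.0, 0.0, 0.0]))
```

Scores decay with network distance from the labeled gene, so unlabeled genes are ranked by how closely they are associated with known members of the function.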
BIONIC scalability
BIONIC summary
Network-based data integration summary
Problem | Goal | Input | Algorithm | Output
Prioritizing candidate disease genes | Rank candidates using global network relationships | PPI network, known disease genes, candidates | Random walk with restart or graph diffusion | Ranked list of candidates |
Identifying disease gene subnetworks | Find connected subnetworks with mutated genes that span many patients | PPI network, mutations per patient | Heat kernel diffusion, identify subnetworks, assess significance | Subnetworks, significance, patients with mutations in the subnetworks |
Relating multiple omic measurements of one process | Select edges connecting important nodes of multiple types | PPI network, edge costs, node scores | Prize-collecting Steiner forest | Selected edges and nodes |
Clustering samples using multiple omic measurements | Jointly use all data types to inform the clustering | Multiple types of omic data for each sample | Form similarity matrices, update iteratively across data types, spectral clustering | Consensus sample-sample similarities and sample clusters |
Combining complementary graphs | Learn a consensus node embedding from all graphs | Multiple biological networks, optional node labels | Multi-graph autoencoder | Node embeddings |