Network-based integration with few samples
Nov 19th, 2024
BMI/CS 775 Computational Network Biology�Fall 2024
Anthony Gitter
Includes slides created by Prof. Sushmita Roy
Topics in this section
Biological data is of many different types
Image credit: TCGA, Gligorevic et al., Proteomics 2015
Need for systematic approaches for data integration
Network-based approaches for integrating data
Subnetwork selection problem definition
Running example: host cell responses in KSHV and mock infections
RNA-seq
Proteomics
Phosphoproteomics
What happens after viral infection?
Variety of subnetwork selection methods
Steiner tree problem definition
Steiner tree observations
Steiner tree examples
A
B
C
E
Terminals
S={A, B, C}
A
B
C
E
D
G
F
Input
Solution
1
1
1
1
1
2
2
2
Prize-collecting Steiner tree objective function
Score for proteins in KSHV infection
Mock KSHV
Mock KSHV
High TF score
Low TF score
PPI network costs for edges
SLN1-YPD1:
Cost = 1 – confidence score
Visualizing the Steiner tree problem
Protein-protein interactions
KSHV change
No change
These slides derived from examples by Ernest Fraenkel
Visualizing the Steiner tree problem
Node prize�doesn’t justify the �edge costs
Steiner node
KSHV change
No data
Prize-collecting Steiner forest objective
3. Collect prizes while minimizing edge costs
Controls tradeoff between prizes and costs
Prizes of excluded vertices (nodes)
Costs of included edges
Minimize
Find forest (subnetwork)
Forest is a collection of disconnected trees
Top Hat question
Constructing a probabilistic approximation algorithm
msgsteiner algorithm Bailly-Bechet et al. PNAS 2011
How many possible subgraphs?
How can we perform inference in that state space?
Transform global tree constraint into local constraints
R
Add Root node (R) and Null node (*)
*
Transform global tree constraint into local constraints
R
Create two variables for every node i:
pai – parent node
di – depth from root
*
Translate omitted prize cost into edge cost
R
Valid tree
pai = *
di = ∞
ci,* = λ p(vi)
Node excluded from tree incurs cost equal to its prize
*
Probabilistic optimization
Boltzmann-Gibbs distribution
H is the “energy” of the state
Statistical mechanics: cavity method�Probabilistic graphical models: belief propagation
Iterative optimization with belief propagation
Node i receives messages from neighbors
Node i updates state of pai and di
J. Ortiz, T. Evans, A. Davison. A visual introduction to Gaussian Belief Propagation, 2021. https://gaussianbp.github.io/
Combining forests into pathways
=
+
…
+
Latent KSHV “pathway”
Zooming in on regions of interest
Peroxisome subnetwork
Peroxisome validation
Peroxisome counts increase upon infection
Knocking down peroxisomal enzymes increases cell death only in KSHV-infected cells
Integer linear programs (ILPs) for subnetwork selection
ILP for viral replication subnetwork
Chasman et al., 2014 PLOS Computational Biology
Key idea in ILPs
Variables in viral replication ILP
Chasman et al., 2014 PLOS Computational Biology
Set up for complex reasoning about directionality of paths and signs of effects (increase or decrease in viral replication)
Variables in viral replication ILP
Chasman et al., 2014 PLOS Computational Biology
Is path number 9 selected?
Is edge IV selected? Is it activating?
Is node C selected? Is it up during replication?
Is edge I selected? Activating? Directed?
Is edge III selected? Is it inhibiting?
Viral replication ILP objective
N nodes in the network
NH subset of nodes that are hits
score(n) diffusion kernel score from hits
yn binary variable, node present in any path
Viral replication ILP constraints
EX edges connecting to virus
xe edge selected
γ constant
P(e) all paths for edge
σp enumerated path
ae edge is activating
he edge is inhibiting
Information flow problem definition
Information flow between sink to source nodes
Source nodes
Sink (target) nodes
Intermediate nodes
Transmit flow
Information flow-based methods
Network-based interpretation and integration summary
Web and software resources
Information flow details
Information flow notation
Flow in a graph G
Capacity constraint
Conservation of flow
An example flow network
s
t
v3
v4
v1
v2
16
12
13
4
10
9
14
7
20
4
s
t
v3
v4
v1
v2
11(16)
12(12)
8(13)
1(4)
4(9)
11(14)
7(7)
15(20)
4(4)
Flow network G
A flow of 19 on G
Only positive flows are shown
(10)
Max-flow problem
Variation: Min cost flow
Min cost flow