Graph diffusion for network-based applications
Nov 6th, 2025
Original slides created by Prof. Sushmita Roy
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter
Topics in this section
Goals for today
Gene prioritization
Computational tools for prioritizing candidate genes: boosting disease gene discovery. Moreau and Tranchevent, Nature Reviews Genetics 2012
Why gene prioritization?
Linkage disequilibrium
Single nucleotide variants along the genome
Heatmap of correlations
Why gene prioritization?
Overview of approaches for gene prioritization
Network-based prioritization
Supervised network-based gene prioritization task definition
Supervised network-based gene prioritization
Slide credit: Debbie Chasman
Seed genes
(known genes)
Neighborhoods on graphs
Local network neighborhood-based prioritization
Known members
Novel members
Predictions of novel pathway components from MouseNET
Local network prioritization fails
Cowen et al., Nature Reviews Genetics 2017
Known disease gene
New disease gene we are trying to find
False positives
Local network prioritization fails
Cowen et al., Nature Reviews Genetics 2017
Known disease gene
New disease genes we are trying to find
False positives
Desired global network prioritization
Cowen et al., Nature Reviews Genetics 2017
Known disease gene
New disease genes we are trying to find
Successfully rank the remaining disease genes above all other genes
Global network neighborhood-based prioritization
Goals for today
Motivation of GeneWanderer
S. Köhler, S. Bauer, D. Horn, and P. N. Robinson, "Walking the interactome for prioritization of candidate disease genes." American Journal of Human Genetics 2008
Overview of GeneWanderer
Motivation of using global distance
Known gene: x
Candidate gene: y
Global graph distances used in GeneWanderer
Random walk on graphs
Random walk on graphs
Transition matrix of a graph

A: Adjacency matrix (rows and columns ordered a, b, c, d, e)

      a  |  b  |  c  |  d  |  e
a  |  0  |  1  |  0  |  1  |  0
b  |  1  |  0  |  1  |  0  |  0
c  |  0  |  1  |  0  |  1  |  1
d  |  1  |  0  |  1  |  0  |  0
e  |  0  |  0  |  1  |  0  |  0

Column-normalized adjacency matrix (each column sums to 1)

      a   |  b  |  c  |  d  |  e
a  |  0   | 0.5 |  0  | 0.5 |  0
b  | 0.5  |  0  | 1/3 |  0  |  0
c  |  0   | 0.5 |  0  | 0.5 |  1
d  | 0.5  |  0  | 1/3 |  0  |  0
e  |  0   |  0  | 1/3 |  0  |  0

Row-normalized adjacency matrix (each row sums to 1)

      a   |  b  |  c  |  d  |  e
a  |  0   | 0.5 |  0  | 0.5 |  0
b  | 0.5  |  0  | 0.5 |  0  |  0
c  |  0   | 1/3 |  0  | 1/3 | 1/3
d  | 0.5  |  0  | 0.5 |  0  |  0
e  |  0   |  0  |  1  |  0  |  0

(Figure: example network with nodes a–e, adapted from IsoRank paper)
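The two normalizations above can be computed directly with NumPy; this is a sketch using the example graph's edges (a–b, a–d, b–c, c–d, c–e):

```python
import numpy as np

# Adjacency matrix for the example graph (nodes ordered a, b, c, d, e)
A = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

deg = A.sum(axis=0)  # node degrees: [2, 2, 3, 2, 1]

# Column-normalized: divide each column by that node's degree,
# so each column sums to 1 (a random walker's out-probabilities)
W_col = A / deg[np.newaxis, :]

# Row-normalized: divide each row by that node's degree
W_row = A / deg[:, np.newaxis]
```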
Random Walk with Restarts (RWR)
Prior probability of starting at a node
RWR continued
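The RWR iteration, as defined in the GeneWanderer paper, can be written as:

```latex
p^{t+1} = (1 - r)\, W p^{t} + r\, p^{0}
```

where $W$ is the column-normalized adjacency matrix, $p^{0}$ is the initial probability vector (uniform over the seed genes), and $r$ is the restart probability.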
RWR example

A: Adjacency matrix (rows and columns ordered a, b, c, d, e)

      a  |  b  |  c  |  d  |  e
a  |  0  |  1  |  0  |  1  |  0
b  |  1  |  0  |  1  |  0  |  0
c  |  0  |  1  |  0  |  1  |  1
d  |  1  |  0  |  1  |  0  |  0
e  |  0  |  0  |  1  |  0  |  0

W: Column-normalized adjacency matrix

      a   |  b  |  c  |  d  |  e
a  |  0   | 0.5 |  0  | 0.5 |  0
b  | 0.5  |  0  | 1/3 |  0  |  0
c  |  0   | 0.5 |  0  | 0.5 |  1
d  | 0.5  |  0  | 1/3 |  0  |  0
e  |  0   |  0  | 1/3 |  0  |  0

(Figure: example network with nodes a–e, adapted from IsoRank paper)
RWR example continued (restart probability r = 0.7, seeds {a, b})

Node | Iteration 0 | Iteration 1 | Iteration 2 | Iteration 6 (convergence)
  a  |    0.5     |   0.4250    |   0.4250    |   0.4239
  b  |    0.5     |   0.4250    |   0.4213    |   0.4212
  c  |     0      |   0.0750    |   0.0750    |   0.0761
  d  |     0      |   0.0750    |   0.0713    |   0.0712
  e  |     0      |      0      |   0.0075    |   0.0076
RWR example continued (restart probability r = 0.5, seeds {a, b})

Node | Iteration 0 | Iteration 1 | Iteration 2 | Iteration 8 (convergence)
  a  |    0.5     |   0.3750    |   0.3750    |   0.3696
  b  |    0.5     |   0.3750    |   0.3646    |   0.3641
  c  |     0      |   0.1250    |   0.1250    |   0.1304
  d  |     0      |   0.1250    |   0.1146    |   0.1141
  e  |     0      |      0      |   0.0208    |   0.0217
Effect of restart probability

Converged RWR probabilities for two restart probabilities:

Node | r = 0.5 (Iteration 8, convergence) | r = 0.7 (Iteration 6, convergence)
  a  |              0.3696               |              0.4239
  b  |              0.3641               |              0.4212
  c  |              0.1304               |              0.0761
  d  |              0.1141               |              0.0712
  e  |              0.0217               |              0.0076

A larger restart probability keeps more probability mass on the seed genes.
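Both convergence results above can be reproduced with a short NumPy sketch of the RWR iteration, restarting uniformly over the seed genes {a, b}:

```python
import numpy as np

# Adjacency matrix for the example graph (nodes ordered a, b, c, d, e)
A = np.array([
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)
W = A / A.sum(axis=0)  # column-normalized adjacency matrix

def rwr(W, p0, r, tol=1e-10, max_iter=1000):
    """Random walk with restarts: iterate p = (1-r) W p + r p0 to a fixed point."""
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - r) * W @ p + r * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

p0 = np.array([0.5, 0.5, 0.0, 0.0, 0.0])  # restart uniformly over seeds {a, b}

p_r05 = rwr(W, p0, r=0.5)  # ≈ [0.3696, 0.3641, 0.1304, 0.1141, 0.0217]
p_r07 = rwr(W, p0, r=0.7)  # ≈ [0.4239, 0.4212, 0.0761, 0.0712, 0.0076]
```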
Top Hat question
Global graph distances used in GeneWanderer
Kernel
Diffusion kernel
Kondor and Lafferty, “Diffusion Kernels on Graphs and Other Discrete Input Spaces”, ICML 2002
Diffusion kernel
Diffusion kernel
Gärtner et al., “On Graph Kernels: Hardness Results and Efficient Alternatives”, 2003
Kondor and Lafferty, “Diffusion Kernels on Graphs and Other Discrete Input Spaces”, ICML 2002
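In the notation used here, the diffusion kernel can be written as:

```latex
K = e^{-\beta L}, \qquad L = D - A
```

where $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, and $\beta$ controls the extent of diffusion. Kondor and Lafferty equivalently write $K = e^{\beta H}$ with $H = A - D = -L$.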
Diffusion kernel example

L: Laplacian matrix (rows and columns ordered a, b, c, d, e)

      a  |  b  |  c  |  d  |  e
a  |  2  | -1  |  0  | -1  |  0
b  | -1  |  2  | -1  |  0  |  0
c  |  0  | -1  |  3  | -1  | -1
d  | -1  |  0  | -1  |  2  |  0
e  |  0  |  0  | -1  |  0  |  1

K = exp(-βL) at β = 0 is the identity matrix (no diffusion):

      a  |  b  |  c  |  d  |  e
a  |  1  |  0  |  0  |  0  |  0
b  |  0  |  1  |  0  |  0  |  0
c  |  0  |  0  |  1  |  0  |  0
d  |  0  |  0  |  0  |  1  |  0
e  |  0  |  0  |  0  |  0  |  1

(Figure: example network with nodes a–e)
Diffusion kernel example

K = exp(-βL) for a small β; the diagonal entries still dominate, so little diffusion has occurred:

       a   |  b   |  c   |  d   |  e
a  | 0.47 | 0.21 | 0.09 | 0.21 | 0.02
b  | 0.21 | 0.46 | 0.18 | 0.09 | 0.05
c  | 0.09 | 0.18 | 0.34 | 0.18 | 0.21
d  | 0.21 | 0.09 | 0.18 | 0.46 | 0.05
e  | 0.02 | 0.05 | 0.21 | 0.05 | 0.67

L: Laplacian matrix (same as on the previous slide)
Diffusion kernel example

K = exp(-βL) for a larger β; probability has diffused further from each node:

       a   |  b   |  c   |  d   |  e
a  | 0.32 | 0.24 | 0.15 | 0.24 | 0.06
b  | 0.24 | 0.3  | 0.19 | 0.17 | 0.11
c  | 0.15 | 0.19 | 0.23 | 0.19 | 0.24
d  | 0.24 | 0.17 | 0.19 | 0.3  | 0.11
e  | 0.06 | 0.11 | 0.24 | 0.11 | 0.49

L: Laplacian matrix (same as before)
Diffusion kernel example

K = exp(-βL) for a still larger β; the entries are flattening out:

       a   |  b   |  c   |  d   |  e
a  | 0.24 | 0.22 | 0.19 | 0.22 | 0.13
b  | 0.22 | 0.22 | 0.19 | 0.2  | 0.16
c  | 0.19 | 0.19 | 0.21 | 0.19 | 0.22
d  | 0.22 | 0.2  | 0.19 | 0.22 | 0.16
e  | 0.13 | 0.16 | 0.22 | 0.16 | 0.33

L: Laplacian matrix (same as before)
Diffusion kernel example

In the limit β → ∞, K converges to the uniform matrix, with every entry 1/5 = 0.2:

       a  |  b  |  c  |  d  |  e
a  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
b  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
c  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
d  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
e  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2

L: Laplacian matrix (same as before)
Diffusion kernel example (summary)

As β increases, K = exp(-βL) evolves from the identity matrix (β = 0) toward the uniform matrix (β → ∞):

β = 0 (identity, no diffusion):

      a  |  b  |  c  |  d  |  e
a  |  1  |  0  |  0  |  0  |  0
b  |  0  |  1  |  0  |  0  |  0
c  |  0  |  0  |  1  |  0  |  0
d  |  0  |  0  |  0  |  1  |  0
e  |  0  |  0  |  0  |  0  |  1

Small β:

       a   |  b   |  c   |  d   |  e
a  | 0.47 | 0.21 | 0.09 | 0.21 | 0.02
b  | 0.21 | 0.46 | 0.18 | 0.09 | 0.05
c  | 0.09 | 0.18 | 0.34 | 0.18 | 0.21
d  | 0.21 | 0.09 | 0.18 | 0.46 | 0.05
e  | 0.02 | 0.05 | 0.21 | 0.05 | 0.67

Larger β:

       a   |  b   |  c   |  d   |  e
a  | 0.32 | 0.24 | 0.15 | 0.24 | 0.06
b  | 0.24 | 0.3  | 0.19 | 0.17 | 0.11
c  | 0.15 | 0.19 | 0.23 | 0.19 | 0.24
d  | 0.24 | 0.17 | 0.19 | 0.3  | 0.11
e  | 0.06 | 0.11 | 0.24 | 0.11 | 0.49

Still larger β:

       a   |  b   |  c   |  d   |  e
a  | 0.24 | 0.22 | 0.19 | 0.22 | 0.13
b  | 0.22 | 0.22 | 0.19 | 0.2  | 0.16
c  | 0.19 | 0.19 | 0.21 | 0.19 | 0.22
d  | 0.22 | 0.2  | 0.19 | 0.22 | 0.16
e  | 0.13 | 0.16 | 0.22 | 0.16 | 0.33

β → ∞ (uniform):

       a  |  b  |  c  |  d  |  e
a  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
b  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
c  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
d  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2
e  | 0.2 | 0.2 | 0.2 | 0.2 | 0.2

(Figure: example network with nodes a–e)
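The β progression above can be reproduced numerically. A sketch via eigendecomposition of the symmetric Laplacian (the particular β values here are illustrative, not the ones used on the slides):

```python
import numpy as np

# Graph Laplacian L = D - A for the example graph (nodes ordered a, b, c, d, e)
L = np.array([
    [ 2, -1,  0, -1,  0],
    [-1,  2, -1,  0,  0],
    [ 0, -1,  3, -1, -1],
    [-1,  0, -1,  2,  0],
    [ 0,  0, -1,  0,  1],
], dtype=float)

def diffusion_kernel(L, beta):
    """K = exp(-beta * L), computed by eigendecomposition of the symmetric L."""
    evals, evecs = np.linalg.eigh(L)
    return evecs @ np.diag(np.exp(-beta * evals)) @ evecs.T

K0 = diffusion_kernel(L, 0.0)        # identity: no diffusion
K_large = diffusion_kernel(L, 100.0) # approaches the uniform matrix, 1/5 everywhere
```

Because L is symmetric with zero row sums, every kernel K is symmetric and its rows sum to 1.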
Competing methods
Dataset
OMIM
Evaluation strategy
Global graph-based measures have greater score enrichment than local measures
Mean enrichment scores of true genes
Diffusion kernel was not as useful for cancer diseases
A and B differ in how ties are handled
ROC analysis further supports enrichment analysis
RWR outperforms existing local network-based methods on seven disease genes that lack literature bias
Shortest path fails but RWR successfully identifies disease genes
False positives from SP (shortest path) and DI (direct interaction)
RWR uses a dense network with multiple alternate paths to correctly rank genes
For bare lymphocyte syndrome type 1
RWR can identify genes based on indirect paths
Summary of GeneWanderer results
Concluding remarks
Other applications of random walks
Using random walks for denoising Hi-C data
Using random walks to smooth scRNA-seq data
van Dijk et al. Cell 2018
Euclidean distance
Kernel is adaptive
Each cell is connected to at most k nearest neighbors, chosen so that most of the mass of the Gaussian kernel is covered.
Make the affinity matrix symmetric
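The steps above can be sketched in NumPy. This is a simplified, MAGIC-style affinity construction: the toy data, the neighborhood size `k`, and the number of diffusion steps `t` are illustrative assumptions, and the k-nearest-neighbor truncation used by MAGIC is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # toy data: 100 cells x 20 genes (assumption)
k = 5                           # neighborhood size (assumption)

# Pairwise Euclidean distances between cells
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Adaptive bandwidth: each cell's sigma is its distance to its k-th
# nearest neighbor (index 0 after sorting is the cell itself)
sigma = np.sort(D, axis=1)[:, k]

# Gaussian kernel with a per-cell bandwidth
K = np.exp(-((D / sigma[:, None]) ** 2))

# Symmetrize the affinity matrix, then row-normalize to a Markov matrix
A_sym = (K + K.T) / 2
M = A_sym / A_sym.sum(axis=1, keepdims=True)

# Smooth the data by diffusing t steps: X_smooth = M^t X
t = 3  # number of diffusion steps (assumption)
X_smooth = np.linalg.matrix_power(M, t) @ X
```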
Smoothing using MAGIC enhances scRNA-seq data signal