Graph clustering
Oct 21st 2025
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter
Plan for this section
Goals for today
RECAP: A modular network
Module 1
Module 2
Module 3
M. E. J. Newman, Modularity and community structure in networks, PNAS 2006
RECAP: Studying modularity in biological systems
Given a graph, what are the modules
Problem definition of (topological) module detection
Common graph clustering algorithms
Please review Clustering slides at the end of this presentation if needed.
Goals for today
Community structure in a graph
Detecting community structure in networks, M. E. J. Newman
Communities
Intercommunity edges
Girvan-Newman algorithm
M. E. J. Newman and M. Girvan. Finding and evaluating community structure
Betweenness of an edge
Girvan-Newman algorithm
M. E. J. Newman and M. Girvan. Finding and evaluating community structure
Girvan-Newman algorithm as a hierarchical clustering algorithm
Graph vertices
where clustering is good enough
Zachary’s karate club study
Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.
Girvan Newman on Zachary’s karate club dataset
Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.
Node grouping based on betweenness
Modularity
Top Hat
Goals for today
Notation
A few linear algebra concepts
Refer to end of lecture notes to look at some examples
Unnormalized Graph Laplacian
Unnormalized Graph Laplacian example
1
2
3
4
Adjacency matrix(W)
0 | 1 | 1 | 0 |
1 | 0 | 1 | 0 |
1 | 1 | 0 | 1 |
0 | 0 | 1 | 0 |
2 | 0 | 0 | 0 |
0 | 2 | 0 | 0 |
0 | 0 | 3 | 0 |
0 | 0 | 0 | 1 |
2 | -1 | -1 | 0 |
-1 | 2 | -1 | 0 |
-1 | -1 | 3 | -1 |
0 | 0 | -1 | 1 |
Example graph
1
2
3
4
1
2
3
4
Degree matrix (D)
1
2
3
4
1
2
3
4
Laplacian (L=D-W)
1
2
3
4
1
2
3
4
Properties of the Laplacian
Checking the first property
Connected components
Connected component: A subgraph spanning a vertex subset where every vertex can be “reached” from another vertex
C
B
A
D
G
H
E
F
This graph has two connected components
Number of connected components and the multiplicity of λ=0
Number of connected components and L’s smallest eigen value
Consider a graph with k components
Adjacency matrix
Laplacian matrix
Consider a graph with k components (continued)
An example with k=2 connected components
W1
W2
For this matrix, we expect to have two eigen vectors each with eigen value =0
The corresponding Laplacian
L1
L2
L=D-W
First 10 eigen values
The first two eigen values are 0
Number of eigen value (i)
First 10 eigen vectors
First two eigen vectors are piece-wise constant
Normalized graph Laplacians
Graph Laplacians have a lot of applications
Spectral clustering
An overview of spectral clustering
Adjacency matrix
1
n
1
n
nodes
nodes
A
Laplacian matrix
L=D-A
1
n
1
n
1
n
1
k
First k eigenvectors
k clusters
n1
D: Diagonal degree matrix
…
n2
nk
Spectral clustering can be applied on non-graph data
Toy example of spectral clustering
Luxburg tutorial on spectral clustering
Toy example of spectral clustering
10 nearest neighbors
unnorm: L, norm: Lrw
Toy example of spectral clustering
Fully connected
Recall the Zachary karate club (ZKC) study
MATLAB code for spectral clustering
%Load the data
d=importdata('zachmat.mat’);
%Show the adjacency matrix
imagesc(d.zachary);
A=d.zachary;
D=diag(sum(A));
%The Laplacian
L=D-A;
[e,v]=eig(L);
%Kmeans clustering with k=5
kid=kmeans(e(:,1:5),5)
Adjacency matrix of Zachary Karate club study
First 20 eigen values of the ZKC graph
Only 1 eigen value=0
K=5 clusters
K=9 clusters
Only one connected component
First two eigen vectors of the ZKC graph
Cluster 1
Cluster 2
Clusters depicted on the graph
Reordered matrix post clustering
ZKC graph clustered into k=5 clusters
Goals for today
Louvain algorithm
Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. “Fast Unfolding of Communities in Large Networks,” March. https://doi.org/10.1088/1742-5468/2008/10/P10008.
Modularity in Louvain
where
vertices
Louvain algorithm
Change in modularity
Illustration of the Louvain algorithm
Louvain clustering comparison
Louvain
Louvain is much faster than existing algorithms. Some algorithms don’t complete within 24 hrs on the larger datasets.
Modularity
Run time
Application of Louvain to a cell phone network
Application of Louvain to a cell phone network
261 top communities accounting for 75% of the network nodes (people)
Dutch
French
How well do the clusters correspond to social structure?
Communities of more than 10000 people have 85% people speaking the same language
Goals for today
How to assess the quality of clustering?
Silhouette index
d(i,j): Distance between i and j
Davies Bouldin index
A Cluster Separation Measure, IEEE 1979
K: # of clusters
ith cluster
jth cluster
n: denotes the dimension of the data.
DREAM community challenge for module identification
Choodbar et al., 2019 Nature Methods
Overview of the DREAM disease module identification challenge
Challenge organization
Description of the networks
Challenge description and evaluation
Methods used
Overview of results
Ranks of top 10 algorithms are shown
Top performing methods
Diffusion State Distance (DSD)
Probability of reaching nodes from vx
See Cao, M. et al. Going the Distance for Protein Function Prediction: A New Distance Metric for Protein Interaction Networks. PLoS ONE 8, e76339 (2013) for the proof of the above.
Are methods complementary?
Type of network is the biggest determinant of different performance
Top performing methods don’t agree on the modules as much
Other takeaways from disease module identification
Take away points of graph clustering
References
�
Eigen vector example
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
1 |
1 |
1 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
1 |
1 |
1 |
1 |
1 |
0 |
A
u
v
A*u
1 |
1 |
0 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
A*v
Eigen vector example
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
1 |
1 |
1 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
=
3 |
3 |
-3 |
1 |
1 |
1 |
1 |
1 |
0 |
A
u
v
A*u
1 |
1 |
0 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
=
3 |
3 |
0 |
A*v
Eigen vector example
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
1 |
1 |
1 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
=
3 |
3 |
-3 |
1 |
1 |
1 |
1 |
1 |
0 |
A
u
v
A*u
1 |
1 |
0 |
1 | 2 | 0 |
2 | 1 | 0 |
0 | 0 | -3 |
*
A*v
Background on clustering
Clustering
Task definition: clustering objects
Flat clustering
Hierarchical clustering
Slides adapted from Prof. Mark Craven; BMI 576
Types of hierarchical clustering strategies
Hierarchical clustering
leaves represent objects to be clustered (e.g. genes or samples)
height of bar indicates
degree of distance
within cluster
distance scale
0
Slides from Prof. Mark Craven
Hierarchical clustering to find modules on graphs
N1(i): the set of immediate neighbors of i
aij: corresponding entry in the adjacency matrix of the graph
Yip and Horvath, BMC Bioinformatics 2007; WGCNA: an R package for weighted correlation network analysis. Peter Langfelder and Steve Horvath
Flat clustering from a hierarchical clustering
cutting here results
in 2 clusters
cutting here results
in 4 clusters
Slides from Prof. Mark Craven