1 of 87

Graph clustering

Oct 21st 2025

BMI/CS 775 Computational Network Biology, Fall 2025

Sushmita Roy

https://bmi775.sites.wisc.edu

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter

2 of 87

Plan for this section

  • Topological properties of networks (Oct 16th)

  • Graph clustering (Oct 21st)

  • Representation learning on graphs (Oct 23rd, Oct 28th)

  • Graph alignment and applications (Oct 30th, Nov 4th)

3 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

4 of 87

RECAP: A modular network

Module 1

Module 2

Module 3

M. E. J. Newman, Modularity and community structure in networks, PNAS 2006

5 of 87

RECAP: Studying modularity in biological systems

  • Given a network is it modular?
    • Clustering coefficient
    • Modularity (Q): measures the relative density of edges within groups compared to between groups
  • Given a network what are the modules in the network?
    • The graph can be partitioned into modules using “Graph clustering” or “Graph partitioning” or “Community Detection”
    • Clusters, modules are also called “communities”

6 of 87

Given a graph, what are the modules?

  • Given a graph find the modules
    • Modules are represented by densely connected subgraphs
  • The graph can be partitioned into modules using “Graph clustering” or “Graph partitioning” or “Community Detection”
  • Clusters, modules are also called “communities”

7 of 87

Problem definition of (topological) module detection

  • Given:
    • A graph
    • A measure of cluster quality
  • Do
    • Partition the vertices into groups such that they form densely connected subgraphs (e.g. as measured by the cluster quality)

8 of 87

Common graph clustering algorithms

  • Hierarchical or flat clustering using a notion of similarity between nodes
  • Girvan-Newman algorithm
  • Hierarchical agglomerative clustering
  • Spectral clustering
  • Stochastic block models
  • Markov clustering algorithm
  • Affinity propagation
  • Louvain clustering

Please review Clustering slides at the end of this presentation if needed.

9 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

10 of 87

Community structure in a graph

  • A graph that has a grouping (community) structure will have few intercommunity edges
  • Community structure can be revealed by removing such intercommunity edges

Detecting community structure in networks, M. E. J. Newman

Communities

Intercommunity edges

11 of 87

Girvan-Newman algorithm

  • General idea: “If two communities are joined by only a few inter-community edges, then all paths through the network from vertices in one community to vertices in the other must pass along one of those few edges.”
  • Community structure can be revealed by removing edges with high betweenness
  • Algorithm is based on a divisive clustering idea

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks

12 of 87

Betweenness of an edge

  • Betweenness of an edge e is defined as the number of shortest paths, over all pairs of vertices, that pass through e
  • Edges that lie between communities tend to have high betweenness

13 of 87

Girvan-Newman algorithm

  • Initialize
    • Compute betweenness for all edges
  • Repeat until a convergence criterion is met
    1. Remove the edge with the highest betweenness
    2. Recompute betweenness of remaining edges

  • The convergence criterion can be
    • No more edges remain
    • A desired modularity is reached

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks
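As a concrete illustration (not part of the original algorithm description), here is a minimal Python sketch, assuming the networkx library; its girvan_newman generator removes the highest-betweenness edge and recomputes betweenness exactly as in the steps above:

import networkx as nx

# Zachary's karate club graph, bundled with networkx (discussed later)
G = nx.karate_club_graph()

# girvan_newman yields one partition per level of the divisive hierarchy
for partition in nx.community.girvan_newman(G):
    if len(partition) >= 2:   # stop once the desired number of communities is reached
        print([sorted(c) for c in partition])
        break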

14 of 87

Girvan-Newman algorithm as a hierarchical clustering algorithm

  • One can view this algorithm as a top-down (divisive) hierarchical clustering algorithm
  • The root of the dendrogram groups all nodes into one community
  • Each branch of the tree represents the order of splitting the network as edges are removed

[Dendrogram over the graph vertices; a horizontal cut marks where clustering is good enough.]

15 of 87

Zachary’s karate club study

  • A well-known example of a social network with community structure
  • Dataset collected by Wayne Zachary, who observed social interactions among members of a karate club over 2 years
  • Due to a dispute, the club split into two factions
  • Can a graph clustering/module detection algorithm predict the factions?

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

16 of 87

Girvan-Newman on Zachary’s karate club dataset

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

[Figures: node groupings based on betweenness, and the modularity of the resulting partitions.]

17 of 87

Top Hat

18 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

19 of 87

Notation

  • Graph G={V, E} where V is the vertex set and E is the edge set
  • D: Degree matrix of a graph
  • W: Adjacency matrix of a graph
  • L: Graph Laplacian
  • x': denotes the transpose of x, where x is a vector
  • A⁻¹: denotes the inverse of matrix A

20 of 87

A few linear algebra concepts

  • For an n×n matrix A
  • Eigenvector of A
    • v, an n-dimensional (n×1) vector, is an eigenvector of A with eigenvalue λ (a scalar) if Av = λv

  • Positive semi-definite
    • A is said to be positive semi-definite if, for any n-dimensional vector x, x'Ax ≥ 0

Refer to the end of the lecture notes for worked examples

21 of 87

Unnormalized Graph Laplacian

  • For a given graph G={V, E}
  • The unnormalized graph Laplacian is the |V|×|V| matrix L = D − W, where D is the diagonal degree matrix and W is the adjacency matrix

22 of 87

Unnormalized Graph Laplacian example

Example graph: vertices {1, 2, 3, 4} with edges 1–2, 1–3, 2–3, and 3–4.

Adjacency matrix (W), degree matrix (D), and Laplacian (L = D − W), with rows and columns indexed by vertices 1–4:

W = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
L = D - W = \begin{pmatrix} 2 & -1 & -1 & 0 \\ -1 & 2 & -1 & 0 \\ -1 & -1 & 3 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}

23 of 87

Properties of the Laplacian

  • For every vector f in Rn,

f' L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2

  • L is symmetric and positive semi-definite

  • The smallest eigenvalue of L is 0 and its corresponding eigenvector is the constant all-ones vector

  • L has n non-negative eigenvalues

24 of 87

Checking the first property
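To check the first property, expand f'Lf using L = D − W and d_i = Σ_j w_ij (a standard argument; see the von Luxburg tutorial cited in the references):

f' L f = f' D f - f' W f
       = \sum_i d_i f_i^2 - \sum_{i,j} w_{ij} f_i f_j
       = \frac{1}{2} \left( \sum_i d_i f_i^2 - 2 \sum_{i,j} w_{ij} f_i f_j + \sum_j d_j f_j^2 \right)
       = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2 \ge 0

Since this quadratic form is non-negative for every f, it also verifies that L is positive semi-definite.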

25 of 87

Connected components

  • A subgraph in which there is a path between every pair of nodes
  • The number of connected components is closely tied to the spectrum of the Laplacian matrix

Connected component: A subgraph spanning a vertex subset where every vertex can be “reached” from another vertex

[Figure: an example graph on vertices A–H.]

This graph has two connected components

26 of 87

Number of connected components and the multiplicity of λ=0

  • Let G be an undirected graph with non-negative edge weights.
  • Then the multiplicity, k, of the eigenvalue 0 of L equals the number of connected components A1, . . . , Ak of the graph

27 of 87

Number of connected components and L’s smallest eigenvalue

  • To see why this is true, we use the defining property of an eigenvector; consider first the case of one connected component
    • If f is an eigenvector of L with eigenvalue λ, then Lf = λf
    • For eigenvalue 0, Lf = 0 (the vector of all zeros)
  • In addition, we know

f' L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2

  • If f is an eigenvector corresponding to eigenvalue 0, f'Lf must be 0
  • The only way f'Lf can be 0 is if f_i = f_j whenever w_ij is non-zero
  • This holds for all vertices connected by a path
  • If all vertices are connected, then f is a vector of constants.

28 of 87

Consider a graph with k components

  • Assume the rows/columns of W are ordered by their connectivity
  • W is block diagonal with k blocks
  • L is also block diagonal

[Figure: the adjacency matrix and the Laplacian matrix are both block diagonal, with one block per component.]

29 of 87

Consider a graph with k components (continued)

  • Each Li is a proper graph Laplacian for the subgraph of the ith component.
  • Each Li has one eigenvalue equal to 0, with the constant one vector as the corresponding eigenvector
  • The eigenvectors of the full Laplacian are the eigenvectors of the individual blocks, with the remaining entries set to 0.
  • Thus L must have k eigenvalues equal to 0.
  • Each such eigenvector of L is constant and non-zero on the entries of one connected component and 0 elsewhere

30 of 87

An example with k=2 connected components

[Figure: adjacency matrix with two diagonal blocks W1 and W2.]

For this matrix, we expect to have two eigenvectors, each with eigenvalue 0

31 of 87

The corresponding Laplacian

[Figure: the Laplacian L = D − W is block diagonal with blocks L1 and L2.]

32 of 87

First 10 eigenvalues

[Plot of eigenvalue index i vs. eigenvalue: the first two eigenvalues are 0.]

33 of 87

First 10 eigenvectors

[Plot: the first two eigenvectors are piecewise constant.]
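A small numerical check of this property, as a hedged Python sketch assuming numpy: a graph built from two disjoint triangles should yield exactly two zero eigenvalues, with null-space eigenvectors that are constant within each component.

import numpy as np

tri = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])                      # adjacency of a triangle
W = np.block([[tri, np.zeros((3, 3))],
              [np.zeros((3, 3)), tri]])          # two disconnected components
L = np.diag(W.sum(axis=1)) - W                   # unnormalized Laplacian

vals, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
print(np.round(vals, 6))                         # first two eigenvalues are 0
print(np.round(vecs[:, :2], 3))                  # constant within each component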

34 of 87

Normalized graph Laplacians

  • There are two normalized graph Laplacian definitions

  • Lsym = D^{-1/2} L D^{-1/2} = I − D^{-1/2} W D^{-1/2}

  • Lrw = D^{-1} L = I − D^{-1} W

35 of 87

Graph Laplacians have a lot of applications

  • Graph clustering
  • Regularization in an objective function to find a solution that obeys the graph structure
  • Diffusion on a graph
  • Graph Transformers

36 of 87

Spectral clustering

  • Based on the graph Laplacian L=D-W
    • D is the diagonal degree matrix
    • W is the adjacency matrix
  • Obtain the k eigenvectors associated with the k smallest eigenvalues of L
  • Represent each node as a k-dimensional vector (its row in the n×k eigenvector matrix)
  • Cluster the nodes with k-means clustering (sketched below)
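A minimal sketch of these steps in Python, assuming numpy and scikit-learn (the function name spectral_clustering here is illustrative, not a library API):

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    # W: symmetric adjacency matrix (n x n); k: number of clusters
    L = np.diag(W.sum(axis=1)) - W       # unnormalized Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    embedding = vecs[:, :k]              # rows are k-dimensional node vectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)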

37 of 87

An overview of spectral clustering

[Figure: pipeline of spectral clustering. The n×n adjacency matrix A (rows and columns indexed by nodes) is converted to the Laplacian matrix L = D − A, where D is the diagonal degree matrix; the first k eigenvectors of L form an n×k matrix whose rows embed the nodes; k-means on these rows yields k clusters of sizes n1, ..., nk.]

38 of 87

Spectral clustering can be applied on non-graph data

  • Let {x1, ..., xn} be a set of data points
  • We can create a similarity graph or a distance graph using the pairwise similarity (sij) or distance (dij) with one of the following strategies (two are sketched below)
  • ε-neighborhood graph
    • Connect all vertices vi and vj such that dij ≤ ε
  • k-nearest neighbor graph
    • Connect vi to its k nearest neighbors
    • Make the graph symmetric
  • Fully connected graph
    • Use sij as similarity for all pairs
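Two of these constructions as a hedged numpy sketch (eps_graph and gaussian_graph are illustrative names, not library functions):

import numpy as np

def _sq_dists(X):
    # squared Euclidean distances between all rows of X
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def eps_graph(X, eps):
    W = (_sq_dists(X) <= eps ** 2).astype(float)  # connect pairs with dij <= eps
    np.fill_diagonal(W, 0)                        # no self-loops
    return W

def gaussian_graph(X, sigma):
    W = np.exp(-_sq_dists(X) / (2 * sigma ** 2))  # sij for all pairs
    np.fill_diagonal(W, 0)
    return W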

39 of 87

Toy example of spectral clustering

  • Let’s consider spectral clustering for a toy dataset:
    • {x1, ..., x200}: 200 points drawn from a mixture of four Gaussians

    • Use the Gaussian similarity s_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2)) to create a graph

    • We will consider two variants of the graph: 10-nearest-neighbor and fully connected weighted graph

Luxburg tutorial on spectral clustering

40 of 87

Toy example of spectral clustering

10 nearest neighbors

[Plots: clustering results on the 10-nearest-neighbor graph, using the unnormalized Laplacian L and the normalized Laplacian Lrw.]

41 of 87

Toy example of spectral clustering

Fully connected

[Plots: clustering results on the fully connected weighted graph.]

42 of 87

Recall the Zachary karate club (ZKC) study

43 of 87

MATLAB code for spectral clustering

% Load the data
d = importdata('zachmat.mat');

% Show the adjacency matrix
imagesc(d.zachary);
A = d.zachary;

% Diagonal degree matrix
D = diag(sum(A));

% The unnormalized Laplacian
L = D - A;

% Columns of e are the eigenvectors; for the symmetric L,
% eig returns the eigenvalues (in v) in ascending order
[e, v] = eig(L);

% k-means clustering with k=5 on the first 5 eigenvectors
kid = kmeans(e(:, 1:5), 5);

44 of 87

Adjacency matrix of the Zachary karate club study

45 of 87

First 20 eigenvalues of the ZKC graph

[Plot: only one eigenvalue equals 0, so the graph has only one connected component; annotations mark K=5 and K=9 clusters.]

46 of 87

First two eigenvectors of the ZKC graph

[Plot: the nodes separate into Cluster 1 and Cluster 2 in the space of the first two eigenvectors.]

47 of 87

Clusters depicted on the graph

48 of 87

Reordered adjacency matrix after clustering

49 of 87

ZKC graph clustered into k=5 clusters

50 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

51 of 87

Louvain algorithm

  • A greedy, multi-level modularity optimization algorithm with two phases: (1) move individual nodes to the neighboring community that yields the largest gain in modularity; (2) collapse each community into a single node and repeat on the aggregated graph

Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. “Fast Unfolding of Communities in Large Networks,” March. https://doi.org/10.1088/1742-5468/2008/10/P10008.
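A minimal usage sketch in Python, assuming a recent networkx (2.8+), which ships a Louvain implementation; this is not the authors' original code:

import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, seed=0)  # list of node sets
Q = nx.community.modularity(G, communities)                # modularity of the partition
print(len(communities), 'communities, Q =', round(Q, 3))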

52 of 87

Modularity in Louvain

Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)

where the sum runs over all pairs of vertices i and j; w_ij is the weight of the edge between i and j; k_i = \sum_j w_{ij} is the weighted degree of vertex i; m = \frac{1}{2} \sum_{i,j} w_{ij} is the total edge weight; c_i is the community of vertex i; and \delta(c_i, c_j) is 1 if c_i = c_j and 0 otherwise
53 of 87

Louvain algorithm

  • In the first phase, each node i is repeatedly moved into the neighboring community for which the move yields the largest positive change in modularity (ΔQ); if no move improves modularity, i stays in its community
  • Blondel et al. give a closed-form expression for this change in modularity, computed from the total edge weight m, the weighted degree k_i of node i, the sum of edge weights inside the candidate community, and the sum of edge weights incident to it, so each candidate move is evaluated in constant time

54 of 87

Illustration of the Louvain algorithm

55 of 87

Louvain clustering comparison

[Table: modularity and run time of Louvain vs. other algorithms on networks of increasing size.]

Louvain is much faster than existing algorithms; some algorithms don't complete within 24 hrs on the larger datasets.

56 of 87

Application of Louvain to a cell phone network

  • A network of 2.6 million nodes constructed from Belgian mobile phone records
  • Edge weight is the number of phone calls made in a six-month period

57 of 87

Application of Louvain to a cell phone network

  • Node colors denote the proportion of people speaking the main language.
  • Communities of at least 100 customers are shown.
  • Zooming into the middle node reveals further community structure based on language.

261 top communities accounting for 75% of the network nodes (people); node colors indicate the dominant language (Dutch or French)

58 of 87

How well do the clusters correspond to social structure?

In communities of more than 10,000 people, 85% of the members speak the same language

59 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

60 of 87

How to assess the quality of clustering?

  • Modularity

  • Silhouette Index
    • Measures how well separated the resulting clusters are

  • Davies-Bouldin Index

  • Other domain-specific properties

61 of 87

Silhouette index

  • A measure to assess the sharpness of clustering
  • Ranges between -1 and 1; higher values are better
  • For each data point i, which belongs to cluster Ci, we define s(i) as

s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}

  • where a(i) is the mean distance between i and the other points of its own cluster Ci, and b(i) is the smallest mean distance between i and the points of any other cluster

d(i,j): Distance between i and j, used to compute a(i) and b(i)

62 of 87

Davies Bouldin index

  • Another measure to assess the goodness of clustering; lower values are better

DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{s_i + s_j}{d(c_i, c_j)}

  • where K is the number of clusters, s_i is the average distance between the points of the ith cluster and its centroid c_i, and d(c_i, c_j) is the distance between the centroids of the ith and jth clusters

A Cluster Separation Measure, IEEE 1979
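Both indices are available in scikit-learn; a hedged sketch on synthetic points:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # two well-separated blobs
               rng.normal(5, 1, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

print('silhouette:', silhouette_score(X, labels))          # higher is better
print('Davies-Bouldin:', davies_bouldin_score(X, labels))  # lower is better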

63 of 87

DREAM community challenge for module identification

  • A community challenge to assess algorithms for module identification across diverse molecular networks
  • Evaluated 75 different methods
    • 42 single network methods and 33 multi-network methods.
  • Evaluation: how many modules are associated with Genome-wide association study (GWAS) traits.
    • A module is meaningful if it has a significant association with a GWAS trait.

Choobdar et al., 2019 Nature Methods

64 of 87

Overview of the DREAM disease module identification challenge

65 of 87

Challenge organization

  • Challenge was executed on Synapse
  • Submissions were accepted over a 2-month period during which submitters could use benchmark data to assess and improve their predictions
  • Final submissions were done on a separate GWAS dataset

66 of 87

Description of the networks

  • Six networks were anonymized and given to challenge participants
  • The networks had different topological properties, providing a heterogeneous benchmark

67 of 87

Challenge description and evaluation

  • Algorithms predicted non-overlapping modules of size 3-100 nodes
  • Assess modules based on GWAS trait and disease association

68 of 87

Methods used

  • 42 different methods from the following categories

69 of 87

Overview of results

Ranks of top 10 algorithms are shown

70 of 87

Top performing methods

  • They come from different categories
  • The best performer computed the diffusion state distance (DSD) for each pair of vertices and then applied spectral clustering
    • Convert DSD into a similarity by passing it through a Gaussian kernel
    • Apply spectral clustering
  • Runner-up methods used
    1. Modularity optimization
    2. Random walks based on Markov clustering

71 of 87

Diffusion State Distance (DSD)

  • Derived from a t-step random walk on the graph: He^t(v_x) is the vector whose entries are the expected number of times a random walk of t steps starting at v_x visits each node (the probability of reaching nodes from v_x, accumulated over the steps)

DSD(v_x, v_y) = \| He^t(v_x) - He^t(v_y) \|_1

  • When t→∞, this distance converges to a fixed value

See Cao, M. et al. Going the Distance for Protein Function Prediction: A New Distance Metric for Protein Interaction Networks. PLoS ONE 8, e76339 (2013) for the proof of the above.
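A hedged numpy sketch of a finite-t version of this idea, following the construction in Cao et al. (dsd_matrix is an illustrative name, and the graph is assumed to have no isolated nodes):

import numpy as np

def dsd_matrix(W, t=7):
    P = W / W.sum(axis=1, keepdims=True)  # random-walk transition matrix
    n = W.shape[0]
    He = np.eye(n)                        # step 0: the walk starts at each node
    Pk = np.eye(n)
    for _ in range(t):
        Pk = Pk @ P                       # k-step transition probabilities
        He = He + Pk                      # accumulate expected visits
    # DSD(u, v) = L1 distance between the visit profiles of u and v
    return np.abs(He[:, None, :] - He[None, :, :]).sum(axis=2)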

72 of 87

Are methods complementary?

The type of network is the biggest determinant of differences in performance

Top-performing methods do not agree strongly on the modules they find

73 of 87

Other takeaways from disease module identification

  • Co-expression and protein-protein interaction network-based modules were the most informative
  • Top methods covered different categories
  • Determining the right resolution can impact the results
  • Preprocessing, e.g. sparsification of networks, was beneficial
    • The top method, however, did not need this

74 of 87

Take away points of graph clustering

  • Modularity is an important organizational property of complex networks
  • Community detection, graph clustering, and graph partitioning are the same problem
  • We looked at different module-finding algorithms
    • Girvan-Newman: remove edges with high betweenness
    • Louvain: optimize modularity in a two-phase procedure
    • Spectral clustering: relaxed graph cut; flat clustering on the Laplacian eigenvectors
  • Some algorithms provide a flat clustering, others also relate the clusters
  • Key parameters: modularity, resolution, number of clusters
  • How to evaluate the clusters?

75 of 87

References

  • von Luxburg, Ulrike. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17(4): 395–416. http://dx.doi.org/10.1007/s11222-007-9033-z.
  • The DREAM Module Identification Challenge Consortium et al. Assessment of network module identification across complex diseases. Nat Methods 16, 843–852 (2019).
  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008. doi:10.1088/1742-5468/2008/10/P10008
  • M. E. J. Newman and M. Girvan. “Finding and evaluating community structure in networks.” Physical Review E 69, 026113 (2004). doi:10.1103/PhysRevE.69.026113
  • M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577-8582, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1073/pnas.0601602103

76 of 87

Eigenvector example

  • Consider the matrix and vectors

A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & -3 \end{pmatrix}, \quad
u = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
v = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}

  • Multiplying each vector by A gives

A u = \begin{pmatrix} 3 \\ 3 \\ -3 \end{pmatrix}, \qquad A v = \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix} = 3 v

  • Av = 3v, so v is an eigenvector of A with eigenvalue 3; Au is not a scalar multiple of u, so u is not an eigenvector of A


79 of 87

Background on clustering

80 of 87

Clustering

  • Types of clustering
    • Flat clustering
      • K-means
      • Gaussian mixture models
    • Hierarchical clustering
  • Clustering algorithms differ in the distance measure used to group objects together

81 of 87

Task definition: clustering objects

  • Given: attributes for a set of objects we wish to cluster
  • Do: organize objects into groups such that
    • Objects in the same cluster are highly similar to each other
    • Objects from different clusters have low similarity to each other

82 of 87

Flat clustering

  • Cluster objects into K clusters
  • K: the number of clusters, a user-defined argument
  • Two example algorithms
    • K-means
    • Gaussian mixture model-based clustering

83 of 87

Hierarchical clustering

  • Hierarchical clustering is a widely used clustering technique
  • Instead of the number of clusters, it requires us to specify how much dissimilarity we will tolerate between groups
  • The hierarchical clustering is represented by a tree structure called a dendrogram

Slides adapted from Prof. Mark Craven; BMI 576

84 of 87

Types of hierarchical clustering strategies

  • Agglomerative (bottom-up)
    • Start from the individual objects
    • Group objects or clusters of objects
  • Divisive (or top-down)
    • Start from all objects in a single cluster
    • Break down each cluster into smaller clusters

85 of 87

Hierarchical clustering

[Dendrogram: leaves represent objects to be clustered (e.g. genes or samples); the height of each bar indicates the degree of distance within the cluster, measured on a distance scale starting at 0.]

Slides from Prof. Mark Craven

86 of 87

Hierarchical clustering to find modules on graphs

  • What is a good measure of similarity for clustering nodes on a graph?
  • One approach is to use the local topological overlap

t_{ij} = \frac{|N_1(i) \cap N_1(j)| + a_{ij}}{\min\{|N_1(i)|, |N_1(j)|\} + 1 - a_{ij}}

where N1(i) is the set of immediate neighbors of i and aij is the corresponding entry in the adjacency matrix of the graph

Yip and Horvath, BMC Bioinformatics 2007; WGCNA: an R package for weighted correlation network analysis. Peter Langfelder and Steve Horvath
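A hedged numpy sketch of this overlap for a binary adjacency matrix (topological_overlap is an illustrative name, not from the cited packages):

import numpy as np

def topological_overlap(A):
    # A: binary symmetric adjacency matrix with zero diagonal
    shared = A @ A                            # shared[i, j] = |N1(i) ∩ N1(j)|
    k = A.sum(axis=1)                         # degrees |N1(i)|
    min_k = np.minimum(k[:, None], k[None, :])
    T = (shared + A) / (min_k + 1 - A)        # local topological overlap tij
    np.fill_diagonal(T, 1.0)
    return T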

87 of 87

Flat clustering from a hierarchical clustering

  • We can always generate a flat clustering from a hierarchical clustering by “cutting” the tree at some distance threshold

[Dendrogram: cutting at a higher distance threshold results in 2 clusters; cutting at a lower threshold results in 4 clusters.]

Slides from Prof. Mark Craven
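In Python, this cut can be done with scipy (a minimal sketch; the threshold value is illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),
               rng.normal(6, 1, (20, 2))])

Z = linkage(X, method='average')                     # agglomerative clustering
labels = fcluster(Z, t=4.0, criterion='distance')    # cut the tree at distance 4
print(len(set(labels)), 'clusters')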