1 of 87

Graph clustering

Oct 21st 2025

BMI/CS 775 Computational Network Biology, Fall 2025

Sushmita Roy

https://bmi775.sites.wisc.edu

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter

2 of 87

Plan for this section

  • Topological properties of networks (Oct 16th)

  • Graph clustering (Oct 21st)

  • Representation learning on graphs (Oct 23rd, Oct 28th)

  • Graph alignment and applications (Oct 30th, Nov 4th)

3 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

4 of 87

RECAP: A modular network

Module 1

Module 2

Module 3

M. E. J. Newman, Modularity and community structure in networks, PNAS 2006

5 of 87

RECAP: Studying modularity in biological systems

  • Given a network is it modular?
    • Clustering coefficient
    • Modularity (Q): measures the relative density of edges within groups compared to between groups
  • Given a network what are the modules in the network?
    • The graph can be partitioned into modules using “Graph clustering” or “Graph partitioning” or “Community Detection”
    • Clusters, modules are also called “communities”

6 of 87

Given a graph, what are the modules?

  • Given a graph find the modules
    • Modules are represented by densely connected subgraphs
  • The graph can be partitioned into modules using “Graph clustering” or “Graph partitioning” or “Community Detection”
  • Clusters, modules are also called “communities”

7 of 87

Problem definition of (topological) module detection

  • Given:
    • A graph
    • A measure of cluster quality
  • Do
    • Partition the vertices into groups such that they form densely connected subgraphs (e.g. as measured by the cluster quality)

8 of 87

Common graph clustering algorithms

  • Hierarchical or flat clustering using a notion of similarity between nodes
  • Girvan-Newman algorithm
  • Hierarchical agglomerative clustering
  • Spectral clustering
  • Stochastic block models
  • Markov clustering algorithm
  • Affinity propagation
  • Louvain clustering

Please review Clustering slides at the end of this presentation if needed.

9 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

10 of 87

Community structure in a graph

  • A graph that has a grouping (community) structure will have few intercommunity edges
  • Community structure can be revealed by removing such intercommunity edges

Detecting community structure in networks, M. E. J. Newman

Communities

Intercommunity edges

11 of 87

Girvan-Newman algorithm

  • General idea: “If two communities are joined by only a few inter-community edges, then all paths through the network from vertices in one community to vertices in the other must pass along one of those few edges.”
  • Community structure can be revealed by removing edges with high betweenness
  • Algorithm is based on a divisive clustering idea

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks

12 of 87

Betweenness of an edge

  • Betweenness of an edge e is defined as the number of shortest paths, over all pairs of vertices, that pass through e
  • Edges that lie between communities tend to have high betweenness

13 of 87

Girvan-Newman algorithm

  • Initialize
    • Compute betweenness for all edges
  • Repeat until a convergence criterion is met
    1. Remove the edge with the highest betweenness
    2. Recompute betweenness of remaining edges

  • The convergence criterion can be
    • No more edges remain
    • A desired modularity is reached

M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks
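As a concrete illustration (not part of the original algorithm description), here is a minimal Python sketch, assuming the networkx library; its girvan_newman generator removes the highest-betweenness edge and recomputes betweenness exactly as in the steps above:

import networkx as nx

# Zachary's karate club graph, bundled with networkx (discussed later)
G = nx.karate_club_graph()

# girvan_newman yields one partition per level of the divisive hierarchy
for partition in nx.community.girvan_newman(G):
    if len(partition) >= 2:   # stop once the desired number of communities is reached
        print([sorted(c) for c in partition])
        break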

14 of 87

Girvan-Newman algorithm as a hierarchical clustering algorithm

  • One can view this algorithm as a top-down (divisive) hierarchical clustering algorithm
  • The root of the dendrogram groups all nodes into one community
  • Each branch of the tree represents the order of splitting the network as edges are removed

[Dendrogram over the graph vertices; a horizontal cut marks where clustering is good enough.]

15 of 87

Zachary’s karate club study

  • A well-known example of a social network with community structure
  • Dataset collected by Wayne Zachary, who observed social interactions among members of a karate club over 2 years
  • Due to a dispute, the club split into two factions
  • Can a graph clustering/module detection algorithm predict the factions?

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

16 of 87

Girvan-Newman on Zachary’s karate club dataset

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

[Figures: node groupings based on betweenness, and the modularity of the resulting partitions.]

17 of 87

Top Hat

18 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

19 of 87

Notation

  • Graph G={V, E} where V is the vertex set and E is the edge set
  • D: Degree matrix of a graph
  • W: Adjacency matrix of a graph
  • L: Graph Laplacian
  • x': denotes the transpose of x, where x is a vector
  • A⁻¹: denotes the inverse of matrix A

20 of 87

A few linear algebra concepts

  • For an n×n matrix A
  • Eigenvector of A
    • v, an n-dimensional (n×1) vector, is an eigenvector of A with eigenvalue λ (a scalar) if Av = λv

  • Positive semi-definite
    • A is said to be positive semi-definite if, for any n-dimensional vector x, x'Ax ≥ 0

Refer to the end of the lecture notes for worked examples

21 of 87

Unnormalized Graph Laplacian

  • For a given graph G={V, E}
  • The unnormalized graph Laplacian is the |V|×|V| matrix L = D − W, where D is the diagonal degree matrix and W is the adjacency matrix

22 of 87

Unnormalized Graph Laplacian example

Example graph: vertices {1, 2, 3, 4} with edges 1–2, 1–3, 2–3, and 3–4.

Adjacency matrix (W), degree matrix (D), and Laplacian (L = D − W), with rows and columns indexed by vertices 1–4:

W = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
D = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad
L = D - W = \begin{pmatrix} 2 & -1 & -1 & 0 \\ -1 & 2 & -1 & 0 \\ -1 & -1 & 3 & -1 \\ 0 & 0 & -1 & 1 \end{pmatrix}

23 of 87

Properties of the Laplacian

  • For every vector f in Rn,

f' L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2

  • L is symmetric and positive semi-definite

  • The smallest eigenvalue of L is 0 and its corresponding eigenvector is the constant all-ones vector

  • L has n non-negative eigenvalues

24 of 87

Checking the first property
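To check the first property, expand f'Lf using L = D − W and d_i = Σ_j w_ij (a standard argument; see the von Luxburg tutorial cited in the references):

f' L f = f' D f - f' W f
       = \sum_i d_i f_i^2 - \sum_{i,j} w_{ij} f_i f_j
       = \frac{1}{2} \left( \sum_i d_i f_i^2 - 2 \sum_{i,j} w_{ij} f_i f_j + \sum_j d_j f_j^2 \right)
       = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2 \ge 0

Since this quadratic form is non-negative for every f, it also verifies that L is positive semi-definite.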

25 of 87

Connected components

  • A subgraph in which there is a path between every pair of nodes
  • The number of connected components is closely tied to the spectrum of the Laplacian matrix

Connected component: A subgraph spanning a vertex subset where every vertex can be “reached” from another vertex

[Figure: an example graph on vertices A–H.]

This graph has two connected components

26 of 87

Number of connected components and the multiplicity of λ=0

  • Let G be an undirected graph with non-negative edge weights.
  • Then the multiplicity, k, of the eigenvalue 0 of L equals the number of connected components A1, . . . , Ak of the graph

27 of 87

Number of connected components and L’s smallest eigenvalue

  • To see why this is true, we use the defining property of an eigenvector; consider first the case of one connected component
    • If f is an eigenvector of L with eigenvalue λ, then Lf = λf
    • For eigenvalue 0, Lf = 0 (the vector of all zeros)
  • In addition, we know

f' L f = \frac{1}{2} \sum_{i,j} w_{ij} (f_i - f_j)^2

  • If f is an eigenvector corresponding to eigenvalue 0, f'Lf must be 0
  • The only way f'Lf can be 0 is if f_i = f_j whenever w_ij is non-zero
  • This holds for all vertices connected by a path
  • If all vertices are connected, then f is a vector of constants.

28 of 87

Consider a graph with k components

  • Assume the rows/columns of W are ordered by their connectivity
  • W is block diagonal with k blocks
  • L is also block diagonal

[Figure: the adjacency matrix and the Laplacian matrix are both block diagonal, with one block per component.]

29 of 87

Consider a graph with k components (continued)

  • Each Li is a proper graph Laplacian for the subgraph of the ith component.
  • Each Li has one eigenvalue equal to 0, with the constant one vector as the corresponding eigenvector
  • The eigenvectors of the full Laplacian are the eigenvectors of the individual blocks, with the remaining entries set to 0.
  • Thus L must have k eigenvalues equal to 0.
  • Each such eigenvector of L is constant and non-zero on the entries of one connected component and 0 elsewhere

30 of 87

An example with k=2 connected components

[Figure: adjacency matrix with two diagonal blocks W1 and W2.]

For this matrix, we expect to have two eigenvectors, each with eigenvalue 0

31 of 87

The corresponding Laplacian

[Figure: the Laplacian L = D − W is block diagonal with blocks L1 and L2.]

32 of 87

First 10 eigenvalues

[Plot of eigenvalue index i vs. eigenvalue: the first two eigenvalues are 0.]

33 of 87

First 10 eigenvectors

[Plot: the first two eigenvectors are piecewise constant.]
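A small numerical check of this property, as a hedged Python sketch assuming numpy: a graph built from two disjoint triangles should yield exactly two zero eigenvalues, with null-space eigenvectors that are constant within each component.

import numpy as np

tri = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]])                      # adjacency of a triangle
W = np.block([[tri, np.zeros((3, 3))],
              [np.zeros((3, 3)), tri]])          # two disconnected components
L = np.diag(W.sum(axis=1)) - W                   # unnormalized Laplacian

vals, vecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
print(np.round(vals, 6))                         # first two eigenvalues are 0
print(np.round(vecs[:, :2], 3))                  # constant within each component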

34 of 87

Normalized graph Laplacians

  • There are two normalized graph Laplacian definitions

  • Lsym = D^{-1/2} L D^{-1/2} = I − D^{-1/2} W D^{-1/2}

  • Lrw = D^{-1} L = I − D^{-1} W

35 of 87

Graph Laplacians have a lot of applications

  • Graph clustering
  • Regularization in an objective function to find a solution that obeys the graph structure
  • Diffusion on a graph
  • Graph Transformers

36 of 87

Spectral clustering

  • Based on the graph Laplacian L=D-W
    • D is the diagonal degree matrix
    • W is the adjacency matrix
  • Obtain the k eigenvectors associated with the k smallest eigenvalues of L
  • Represent each node as a k-dimensional vector (its row in the n×k eigenvector matrix)
  • Cluster the nodes with k-means clustering (sketched below)
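A minimal sketch of these steps in Python, assuming numpy and scikit-learn (the function name spectral_clustering here is illustrative, not a library API):

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    # W: symmetric adjacency matrix (n x n); k: number of clusters
    L = np.diag(W.sum(axis=1)) - W       # unnormalized Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    embedding = vecs[:, :k]              # rows are k-dimensional node vectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)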

37 of 87

An overview of spectral clustering

[Figure: pipeline of spectral clustering. The n×n adjacency matrix A (rows and columns indexed by nodes) is converted to the Laplacian matrix L = D − A, where D is the diagonal degree matrix; the first k eigenvectors of L form an n×k matrix whose rows embed the nodes; k-means on these rows yields k clusters of sizes n1, ..., nk.]

38 of 87

Spectral clustering can be applied on non-graph data

  • Let {x1, ..., xn} be a set of data points
  • We can create a similarity graph or a distance graph using the pairwise similarity (sij) or distance (dij) with one of the following strategies (two are sketched below)
  • ε-neighborhood graph
    • Connect all vertices vi and vj such that dij ≤ ε
  • k-nearest neighbor graph
    • Connect vi to its k nearest neighbors
    • Make the graph symmetric
  • Fully connected graph
    • Use sij as similarity for all pairs
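Two of these constructions as a hedged numpy sketch (eps_graph and gaussian_graph are illustrative names, not library functions):

import numpy as np

def _sq_dists(X):
    # squared Euclidean distances between all rows of X
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def eps_graph(X, eps):
    W = (_sq_dists(X) <= eps ** 2).astype(float)  # connect pairs with dij <= eps
    np.fill_diagonal(W, 0)                        # no self-loops
    return W

def gaussian_graph(X, sigma):
    W = np.exp(-_sq_dists(X) / (2 * sigma ** 2))  # sij for all pairs
    np.fill_diagonal(W, 0)
    return W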

39 of 87

Toy example of spectral clustering

  • Let’s consider spectral clustering for a toy dataset:
    • {x1, ..., x200}: 200 points drawn from a mixture of four Gaussians

    • Use the Gaussian similarity s_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2)) to create a graph

    • We will consider two variants of the graph: 10-nearest-neighbor and fully connected weighted graph

Luxburg tutorial on spectral clustering

40 of 87

Toy example of spectral clustering

10 nearest neighbors

[Plots: clustering results on the 10-nearest-neighbor graph, using the unnormalized Laplacian L and the normalized Laplacian Lrw.]

41 of 87

Toy example of spectral clustering

Fully connected

[Plots: clustering results on the fully connected weighted graph.]

42 of 87

Recall the Zachary karate club (ZKC) study

43 of 87

MATLAB code for spectral clustering

% Load the data
d = importdata('zachmat.mat');

% Show the adjacency matrix
imagesc(d.zachary);
A = d.zachary;

% Diagonal degree matrix
D = diag(sum(A));

% The unnormalized Laplacian
L = D - A;

% Columns of e are the eigenvectors; for the symmetric L,
% eig returns the eigenvalues (in v) in ascending order
[e, v] = eig(L);

% k-means clustering with k=5 on the first 5 eigenvectors
kid = kmeans(e(:, 1:5), 5);

44 of 87

Adjacency matrix of the Zachary karate club study

45 of 87

First 20 eigenvalues of the ZKC graph

[Plot: only one eigenvalue equals 0, so the graph has only one connected component; annotations mark K=5 and K=9 clusters.]

46 of 87

First two eigenvectors of the ZKC graph

[Plot: the nodes separate into Cluster 1 and Cluster 2 in the space of the first two eigenvectors.]

47 of 87

Clusters depicted on the graph

48 of 87

Reordered adjacency matrix after clustering

49 of 87

ZKC graph clustered into k=5 clusters

50 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

51 of 87

Louvain algorithm

  • A greedy, multi-level modularity optimization algorithm with two phases: (1) move individual nodes to the neighboring community that yields the largest gain in modularity; (2) collapse each community into a single node and repeat on the aggregated graph

Blondel, Vincent D., Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. “Fast Unfolding of Communities in Large Networks,” March. https://doi.org/10.1088/1742-5468/2008/10/P10008.
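A minimal usage sketch in Python, assuming a recent networkx (2.8+), which ships a Louvain implementation; this is not the authors' original code:

import networkx as nx

G = nx.karate_club_graph()
communities = nx.community.louvain_communities(G, seed=0)  # list of node sets
Q = nx.community.modularity(G, communities)                # modularity of the partition
print(len(communities), 'communities, Q =', round(Q, 3))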

52 of 87

Modularity in Louvain

Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)

where the sum runs over all pairs of vertices i and j; w_ij is the weight of the edge between i and j; k_i = \sum_j w_{ij} is the weighted degree of vertex i; m = \frac{1}{2} \sum_{i,j} w_{ij} is the total edge weight; c_i is the community of vertex i; and \delta(c_i, c_j) is 1 if c_i = c_j and 0 otherwise
53 of 87

Louvain algorithm

  • In the first phase, each node i is repeatedly moved into the neighboring community for which the move yields the largest positive change in modularity (ΔQ); if no move improves modularity, i stays in its community
  • Blondel et al. give a closed-form expression for this change in modularity, computed from the total edge weight m, the weighted degree k_i of node i, the sum of edge weights inside the candidate community, and the sum of edge weights incident to it, so each candidate move is evaluated in constant time

54 of 87

Illustration of the Louvain algorithm

55 of 87

Louvain clustering comparison

[Table: modularity and run time of Louvain vs. other algorithms on networks of increasing size.]

Louvain is much faster than existing algorithms; some algorithms don't complete within 24 hrs on the larger datasets.

56 of 87

Application of Louvain to a cell phone network

  • A network of 2.6 million nodes constructed from Belgian mobile phone records
  • Edge weight is the number of phone calls made in a six-month period

57 of 87

Application of Louvain to a cell phone network

  • Node colors denote the proportion of people speaking the main language.
  • Communities of at least 100 customers are shown.
  • Zooming into the middle node reveals further community structure based on language.

261 top communities accounting for 75% of the network nodes (people); node colors indicate the dominant language (Dutch or French)

58 of 87

How well do the clusters correspond to social structure?

In communities of more than 10,000 people, 85% of the members speak the same language

59 of 87

Goals for today

  • Modularity and graph clustering
  • Algorithms for clustering on graphs
    • Girvan-Newman algorithm
    • Spectral clustering
    • Louvain clustering
  • Evaluation of graph clustering algorithms

60 of 87

How to assess the quality of clustering?

  • Modularity

  • Silhouette Index
    • Measures how well separated the resulting clusters are

  • Davies-Bouldin Index

  • Other domain-specific properties

61 of 87

Silhouette index

  • A measure to assess the sharpness of clustering
  • Ranges between -1 and 1; higher values are better
  • For each data point i, which belongs to cluster Ci, we define s(i) as

s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}

  • where a(i) is the mean distance between i and the other points of its own cluster Ci, and b(i) is the smallest mean distance between i and the points of any other cluster

d(i,j): Distance between i and j, used to compute a(i) and b(i)

62 of 87

Davies Bouldin index

  • Another measure to assess the goodness of clustering; lower values are better

DB = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{s_i + s_j}{d(c_i, c_j)}

  • where K is the number of clusters, s_i is the average distance between the points of the ith cluster and its centroid c_i, and d(c_i, c_j) is the distance between the centroids of the ith and jth clusters

A Cluster Separation Measure, IEEE 1979
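Both indices are available in scikit-learn; a hedged sketch on synthetic points:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # two well-separated blobs
               rng.normal(5, 1, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

print('silhouette:', silhouette_score(X, labels))          # higher is better
print('Davies-Bouldin:', davies_bouldin_score(X, labels))  # lower is better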

63 of 87

DREAM community challenge for module identification

  • A community challenge to assess algorithms for module identification across diverse molecular networks
  • Evaluated 75 different methods
    • 42 single network methods and 33 multi-network methods.
  • Evaluation: how many modules are associated with Genome-wide association study (GWAS) traits.
    • A module is meaningful if it has a significant association with a GWAS trait.

Choobdar et al., 2019 Nature Methods

64 of 87

Overview of the DREAM disease module identification challenge

65 of 87

Challenge organization

  • Challenge was executed on Synapse
  • Submissions were accepted over a 2-month period during which submitters could use benchmark data to assess and improve their predictions
  • Final submissions were done on a separate GWAS dataset

66 of 87

Description of the networks

  • Six networks were anonymized and given to challenge participants
  • The networks had different topological properties, providing a heterogeneous benchmark

67 of 87

Challenge description and evaluation

  • Algorithms predicted non-overlapping modules of size 3-100 nodes
  • Assess modules based on GWAS trait and disease association

68 of 87

Methods used

  • 42 different methods from the following categories

69 of 87

Overview of results

Ranks of top 10 algorithms are shown

70 of 87

Top performing methods

  • They come from different categories
  • The best performer computed the diffusion state distance (DSD) for each pair of vertices and then applied spectral clustering
    • Convert DSD into a similarity by passing it through a Gaussian kernel
    • Apply spectral clustering
  • Runner-up methods used
    1. Modularity optimization
    2. Random walks based on Markov clustering

71 of 87

Diffusion State Distance (DSD)

  • Derived from a t-step random walk on the graph: He^t(v_x) is the vector whose entries are the expected number of times a random walk of t steps starting at v_x visits each node (the probability of reaching nodes from v_x, accumulated over the steps)

DSD(v_x, v_y) = \| He^t(v_x) - He^t(v_y) \|_1

  • When t→∞, this distance converges to a fixed value

See Cao, M. et al. Going the Distance for Protein Function Prediction: A New Distance Metric for Protein Interaction Networks. PLoS ONE 8, e76339 (2013) for the proof of the above.
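A hedged numpy sketch of a finite-t version of this idea, following the construction in Cao et al. (dsd_matrix is an illustrative name, and the graph is assumed to have no isolated nodes):

import numpy as np

def dsd_matrix(W, t=7):
    P = W / W.sum(axis=1, keepdims=True)  # random-walk transition matrix
    n = W.shape[0]
    He = np.eye(n)                        # step 0: the walk starts at each node
    Pk = np.eye(n)
    for _ in range(t):
        Pk = Pk @ P                       # k-step transition probabilities
        He = He + Pk                      # accumulate expected visits
    # DSD(u, v) = L1 distance between the visit profiles of u and v
    return np.abs(He[:, None, :] - He[None, :, :]).sum(axis=2)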

72 of 87

Are methods complementary?

The type of network is the biggest determinant of differences in performance

Top-performing methods do not agree strongly on the modules they find

73 of 87

Other takeaways from disease module identification

  • Co-expression and protein-protein interaction network-based modules were the most informative
  • Top methods covered different categories
  • Determining the right resolution can impact the results
  • Preprocessing, e.g. sparsification of networks, was beneficial
    • The top method, however, did not need this

74 of 87

Take away points of graph clustering

  • Modularity is an important organizational property of complex networks
  • Community detection, graph clustering, and graph partitioning are the same problem
  • We looked at different module-finding algorithms
    • Girvan-Newman: remove edges with high betweenness
    • Louvain: optimize modularity in a two-phase procedure
    • Spectral clustering: relaxed graph cut; flat clustering on the Laplacian eigenvectors
  • Some algorithms provide a flat clustering, others also relate the clusters
  • Key parameters: modularity, resolution, number of clusters
  • How to evaluate the clusters?

75 of 87

References

  • von Luxburg, Ulrike. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17(4): 395–416. http://dx.doi.org/10.1007/s11222-007-9033-z.
  • The DREAM Module Identification Challenge Consortium et al. Assessment of network module identification across complex diseases. Nat Methods 16, 843–852 (2019).
  • Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008. doi:10.1088/1742-5468/2008/10/P10008
  • M. E. J. Newman and M. Girvan. “Finding and evaluating community structure in networks.” Physical Review E 69, 026113 (2004). doi:10.1103/PhysRevE.69.026113
  • M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577-8582, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1073/pnas.0601602103

76 of 87

Eigenvector example

  • Consider the matrix and vectors

A = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & -3 \end{pmatrix}, \quad
u = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad
v = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}

  • Multiplying each vector by A gives

A u = \begin{pmatrix} 3 \\ 3 \\ -3 \end{pmatrix}, \qquad A v = \begin{pmatrix} 3 \\ 3 \\ 0 \end{pmatrix} = 3 v

  • Av = 3v, so v is an eigenvector of A with eigenvalue 3; Au is not a scalar multiple of u, so u is not an eigenvector of A


79 of 87

Background on clustering

80 of 87

Clustering

  • Types of clustering
    • Flat clustering
      • K-means
      • Gaussian mixture models
    • Hierarchical clustering
  • Clustering algorithms differ in the distance measure used to group objects together

81 of 87

Task definition: clustering objects

  • Given: attributes for a set of objects we wish to cluster
  • Do: organize objects into groups such that
    • Objects in the same cluster are highly similar to each other
    • Objects from different clusters have low similarity to each other

82 of 87

Flat clustering

  • Cluster objects into K clusters
  • K: the number of clusters, a user-defined argument
  • Two example algorithms
    • K-means
    • Gaussian mixture model-based clustering

83 of 87

Hierarchical clustering

  • Hierarchical clustering is a widely used clustering technique
  • Instead of the number of clusters, it requires us to specify how much dissimilarity we will tolerate between groups
  • The hierarchical clustering is represented by a tree structure called a dendrogram

Slides adapted from Prof. Mark Craven; BMI 576

84 of 87

Types of hierarchical clustering strategies

  • Agglomerative (bottom-up)
    • Start from the individual objects
    • Group objects or clusters of objects
  • Divisive (or top-down)
    • Start from all objects in a single cluster
    • Break down each cluster into smaller clusters

85 of 87

Hierarchical clustering

[Dendrogram: leaves represent objects to be clustered (e.g. genes or samples); the height of each bar indicates the degree of distance within the cluster, measured on a distance scale starting at 0.]

Slides from Prof. Mark Craven

86 of 87

Hierarchical clustering to find modules on graphs

  • What is a good measure of similarity for clustering nodes on a graph?
  • One approach is to use the local topological overlap

t_{ij} = \frac{|N_1(i) \cap N_1(j)| + a_{ij}}{\min\{|N_1(i)|, |N_1(j)|\} + 1 - a_{ij}}

where N1(i) is the set of immediate neighbors of i and aij is the corresponding entry in the adjacency matrix of the graph

Yip and Horvath, BMC Bioinformatics 2007; WGCNA: an R package for weighted correlation network analysis. Peter Langfelder and Steve Horvath
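A hedged numpy sketch of this overlap for a binary adjacency matrix (topological_overlap is an illustrative name, not from the cited packages):

import numpy as np

def topological_overlap(A):
    # A: binary symmetric adjacency matrix with zero diagonal
    shared = A @ A                            # shared[i, j] = |N1(i) ∩ N1(j)|
    k = A.sum(axis=1)                         # degrees |N1(i)|
    min_k = np.minimum(k[:, None], k[None, :])
    T = (shared + A) / (min_k + 1 - A)        # local topological overlap tij
    np.fill_diagonal(T, 1.0)
    return T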

87 of 87

Flat clustering from a hierarchical clustering

  • We can always generate a flat clustering from a hierarchical clustering by “cutting” the tree at some distance threshold

[Dendrogram: cutting at a higher distance threshold results in 2 clusters; cutting at a lower threshold results in 4 clusters.]

Slides from Prof. Mark Craven
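In Python, this cut can be done with scipy (a minimal sketch; the threshold value is illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),
               rng.normal(6, 1, (20, 2))])

Z = linkage(X, method='average')                     # agglomerative clustering
labels = fcluster(Z, t=4.0, criterion='distance')    # cut the tree at distance 4
print(len(set(labels)), 'clusters')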