Community Detection in Networks
January 15th, 2026
CS60078
An Example: Communities in Belgium
Communities in Social Networks
Zachary’s Karate Club
Defining Communities
Communities are locally dense connected subgraphs in a network. This expectation relies on two distinct hypotheses:
Connectedness Hypothesis
Each community corresponds to a connected subgraph, like the subgraphs formed by the orange, green or the purple nodes.
Density Hypothesis
Nodes in a community are more likely to connect to other members of the same community than to nodes in other communities.
Defining Communities
Maximal Cliques
Any drawbacks of this notion?
Formally
Strong vs Weak Community
Strong Community
C is a strong community if each node within C has more links within the community than with the rest of the graph, thus for each node i in C
Weak Community
C is a weak community if the total internal degree of a subgraph exceeds its total external degree
A simple version: division of network into two groups
Do you see an issue?
Ratio Cut
Ratio Cut Partitioning: Instead of minimizing the standard cut size R, we instead minimize the ratio R / (n1n2), where n1 and n2 are the sizes of the two groups. However, it biases for groups with equal sizes.
Pawan
Modularity: Random Hypothesis
Random Hypothesis: Randomly wired networks lack an inherent community structure.
How to use this?
By comparing the link density of a community with the link density obtained for the same group of nodes for a randomly rewired network, we could decide if the original community corresponds to a dense subgraph, or its connectivity pattern emerged by chance.
Formal Definition
Consider a network with N nodes and L links and a partition into nc communities, each community c having Nc nodes connected to each other by Lc links, where c =1,...,nc
We measure the difference between the network’s real wiring diagram (Aij) and the expected number of links between i and j if the network is randomly wired (pij)
What will be the expected number of links?
pij can be determined by randomizing the original network, while keeping the expected degree of each node unchanged
Modularity for the network
where Ci is the label of the community to which node i belongs to
This can be simplified further to
Modularity for the network
Maximum Modularity Hypothesis
For a given network the partition with maximum modularity corresponds to the optimal community structure.
Greedy Algorithm
Step 1. Assign each node to a community of its own, starting with N communities of single nodes.
Step 2. Inspect each community pair connected by at least one link and compute the modularity difference ∆M obtained if we merge them.
Identify the community pair for which ∆M is the largest and merge them. Note that modularity is always calculated for the full network.
Step 3. Repeat Step 2 until all nodes merge into a single community, recording M for each step.
Step 4. Select the partition for which M is maximal.
Issue with the Greedy Algorithm
Louvain Algorithm
Slide Courtesy: Jure Leskovec, Stanford CS224W
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain Algorithm: At High Level
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain: Phase I (Partitioning)
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain: Modularity Gain
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain Summary: Modularity Gain
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Lovain Phase I Summary
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain 2nd Phase (Restructuring)
The communities obtained in the first phase are contracted into super-nodes, and the network is created accordingly:
Lecture 4: Complex Networks, IIT KGP, Somak Aditya
Louvain: Step 1
Louvain: Step II
Is the Louvain algorithm fast?
Modularity change computation for Louvain
The modularity change ∆M obtained by moving an isolated node i into a community C can be calculated using
∑in is the sum of the weights of the links inside C (which is LC for an unweighted network);
∑tot is the sum of the link weights of all nodes in C;
ki is the sum of the weights of the links incident to node i;
ki,in is the sum of the weights of the links from i to nodes in C and
W is the sum of the weights of all links in the network.
Modularity Change
Consider communities A and B and denote with kA and kB the total degree in these communities
What will be the change in modularity after we merge these two communities?
What does that imply?
Limits of Modularity
if we merge communities A and B into a single community, the network’s modularity changes with
Consider the case with kAkB / 2L < 1
Comes from Resolution limit of modularity: multiple small communities being grouped together into a larger community. This causes the smaller communities to be hidden;
Betweenness based methods
Betweenness of an edge
Calculating Edge Betweenness Efficiently
Evaluation of Community Structure
Standard Evaluation measures
Normalized Mutual Information (NMI)
CA denotes the number of real communities and CB denotes the number of found communities
N is a confusion matrix with rows denoting the real communities and columns “found” communities
Nij: Number of nodes in real community i that are also in found community j
Adjusted Rand Index (ARI)
Given n object set S = {o1,o2,...,on}, suppose X and Y are two partitions
Adjusted Rand Index (ARI)
Rand Index (RI) looks at four different types of pairs among nC2 pairs
Type (i) and (ii) are consider agreements while others are disagreement.
RI = (a+b)/(a+b+c+d)
a+b+c+d = nC2
Adjusted Rand Index
ARI: Example
ARI: Example
Video Based Question-Answering
1. Data Collection and annotations
2. Model training, finetuning
Summarization and Quiz Generation
1. Query-based and persona-specific video summarization
2. Personalized Question-Answer (Quiz) generation
Multilingual Answer generation, TTS, and Deployment
1. Multilingual Question-answering
2. Prototype website launch
3. Deployment
2025-26 (6M+6M)
2026-27 (6M+6M)
2027-28 (4M+4M+4M)