1 of 51

Community Detection in Networks

January 15th, 2026

CS60078

2 of 51

An Example: Communities in Belgium

3 of 51

Communities in Social Networks

  • The employees of a company are more likely to interact with their coworkers than with employees of other companies. Consequently work places appear as densely interconnected communities within the social network.
  • Communities could also represent circles of friends, or a group of individuals who pursue the same hobby together, or individuals living in the same neighborhood.

Zachary’s Karate Club

4 of 51

Defining Communities

Communities are locally dense connected subgraphs in a network. This expectation relies on two distinct hypotheses:

Connectedness Hypothesis

Each community corresponds to a connected subgraph, like the subgraphs formed by the orange, green or the purple nodes.

Density Hypothesis

Nodes in a community are more likely to connect to other members of the same community than to nodes in other communities.

5 of 51

Defining Communities

  • Network communities are group of vertices such that vertices inside of the group connected with many more edges than between groups.
  • What makes a community?
    • Mutuality of ties: Almost everyone in the group has ties to each other
    • Compactness: closeness or reachability of group in small number of steps
    • Density of edges
    • Separation: Higher frequency of ties among group members than non-group members.

6 of 51

Maximal Cliques

  • One of the first papers on community structure, published in 1949, defined a community as group of individuals whose members all know each other.
  • In graph theoretic terms this means that a community is a complete subgraph, or a clique.

Any drawbacks of this notion?

  • While triangles are frequent in networks, larger cliques are rare.
  • Requiring a community to be a complete subgraph may be too restrictive, missing many other legitimate communities.

7 of 51

Formally

8 of 51

9 of 51

Strong vs Weak Community

Strong Community

C is a strong community if each node within C has more links within the community than with the rest of the graph, thus for each node i in C

Weak Community

C is a weak community if the total internal degree of a subgraph exceeds its total external degree

10 of 51

A simple version: division of network into two groups

Do you see an issue?

11 of 51

Ratio Cut

Ratio Cut Partitioning: Instead of minimizing the standard cut size R, we instead minimize the ratio R / (n1n2), where n1 and n2 are the sizes of the two groups. However, it biases for groups with equal sizes.

Pawan

12 of 51

Modularity: Random Hypothesis

Random Hypothesis: Randomly wired networks lack an inherent community structure.

  • In a randomly wired network the connection pattern between the nodes is expected to be uniform, independent of the network's degree distribution.
  • Consequently these networks are not expected to display systematic local density fluctuations that we could interpret as communities.

How to use this?

By comparing the link density of a community with the link density obtained for the same group of nodes for a randomly rewired network, we could decide if the original community corresponds to a dense subgraph, or its connectivity pattern emerged by chance.

13 of 51

Formal Definition

Consider a network with N nodes and L links and a partition into nc communities, each community c having Nc nodes connected to each other by Lc links, where c =1,...,nc

We measure the difference between the network’s real wiring diagram (Aij) and the expected number of links between i and j if the network is randomly wired (pij)

14 of 51

What will be the expected number of links?

pij can be determined by randomizing the original network, while keeping the expected degree of each node unchanged

15 of 51

16 of 51

Modularity for the network

where Ci is the label of the community to which node i belongs to

This can be simplified further to

 

17 of 51

Modularity for the network

18 of 51

Maximum Modularity Hypothesis

For a given network the partition with maximum modularity corresponds to the optimal community structure.

  • The maximum modularity hypothesis is the starting point of several community detection algorithms, each seeking the partition with the largest modularity.
  • In principle we could identify the best partition by checking M for all possible partitions, selecting the one for which M is largest.
  • Given, however, the exceptionally large number of partitions, this brute-force approach is computationally not feasible.

19 of 51

Greedy Algorithm

Step 1. Assign each node to a community of its own, starting with N communities of single nodes.

Step 2. Inspect each community pair connected by at least one link and compute the modularity difference ∆M obtained if we merge them.

Identify the community pair for which ∆M is the largest and merge them. Note that modularity is always calculated for the full network.

Step 3. Repeat Step 2 until all nodes merge into a single community, recording M for each step.

Step 4. Select the partition for which M is maximal.

20 of 51

Issue with the Greedy Algorithm

  • The O(N2) computational complexity of the greedy algorithm can be prohibitive for very large networks.
  • A modularity optimization algorithm with better scalability was proposed by Blondel and collaborators

21 of 51

Louvain Algorithm

Slide Courtesy: Jure Leskovec, Stanford CS224W

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

22 of 51

Louvain Algorithm: At High Level

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

23 of 51

24 of 51

Louvain: Phase I (Partitioning)

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

25 of 51

Louvain: Modularity Gain

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

26 of 51

 

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

27 of 51

 

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

28 of 51

 

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

29 of 51

 

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

30 of 51

Louvain Summary: Modularity Gain

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

31 of 51

Lovain Phase I Summary

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

32 of 51

Louvain 2nd Phase (Restructuring)

The communities obtained in the first phase are contracted into super-nodes, and the network is created accordingly:

  • Super-nodes are connected if there is at least one edge between the nodes of the corresponding communities
  • The weight of the edge between the two supernodes is the sum of the weights from all edges between their corresponding communities
  • Phase 1 is then run on the super-node network

Lecture 4: Complex Networks, IIT KGP, Somak Aditya

33 of 51

34 of 51

Louvain: Step 1

  • Start with a weighted network of N nodes, initially assigning each node to a different community.
  • For each node i we evaluate the gain in modularity if we place node i in the community of one of its neighbors j.
  • We then move node i in the community for which the modularity gain is the largest, but only if this gain is positive. If no positive gain is found, i stays in its original community.
  • This process is applied to all nodes until no further improvement can be achieved, completing Step I.

35 of 51

Louvain: Step II

  • We construct a new network whose nodes are the communities identified during Step I.
  • The weight of the link between two nodes is the sum of the weight of the links between the nodes in the corresponding communities. Links between nodes of the same community lead to weighted self-loops.
  • Once Step II is completed, we repeat Steps I - II, calling their combination a pass. The number of communities decreases with each pass. The passes are repeated until there are no more changes and maximum modularity is attained.

36 of 51

Is the Louvain algorithm fast?

  • Complexity is linear for sparse data
  • Gains in modularity can be computed by a simple formula
  • Number of communities decreases drastically after just a few passes, so most of the running time is concentrated on the first iterations

37 of 51

Modularity change computation for Louvain

The modularity change ∆M obtained by moving an isolated node i into a community C can be calculated using

in is the sum of the weights of the links inside C (which is LC for an unweighted network);

tot is the sum of the link weights of all nodes in C;

ki is the sum of the weights of the links incident to node i;

ki,in is the sum of the weights of the links from i to nodes in C and

W is the sum of the weights of all links in the network.

38 of 51

Modularity Change

Consider communities A and B and denote with kA and kB the total degree in these communities

What will be the change in modularity after we merge these two communities?

What does that imply?

39 of 51

Limits of Modularity

if we merge communities A and B into a single community, the network’s modularity changes with

Consider the case with kAkB / 2L < 1

Comes from Resolution limit of modularity: multiple small communities being grouped together into a larger community. This causes the smaller communities to be hidden;

40 of 51

Betweenness based methods

41 of 51

Betweenness of an edge

42 of 51

43 of 51

Calculating Edge Betweenness Efficiently

44 of 51

Evaluation of Community Structure

Standard Evaluation measures

  • Normalized Mutual Information (NMI)
  • Adjusted Rand Index (ARI)
  • Purity (PU)

45 of 51

Normalized Mutual Information (NMI)

CA denotes the number of real communities and CB denotes the number of found communities

N is a confusion matrix with rows denoting the real communities and columns “found” communities

Nij: Number of nodes in real community i that are also in found community j

 

46 of 51

Adjusted Rand Index (ARI)

Given n object set S = {o1,o2,...,on}, suppose X and Y are two partitions

47 of 51

Adjusted Rand Index (ARI)

Rand Index (RI) looks at four different types of pairs among nC2 pairs

Type (i) and (ii) are consider agreements while others are disagreement.

RI = (a+b)/(a+b+c+d)

a+b+c+d = nC2

48 of 51

Adjusted Rand Index

49 of 51

ARI: Example

50 of 51

ARI: Example

51 of 51

Video Based Question-Answering

1. Data Collection and annotations

2. Model training, finetuning

Summarization and Quiz Generation

1. Query-based and persona-specific video summarization

2. Personalized Question-Answer (Quiz) generation

Multilingual Answer generation, TTS, and Deployment

1. Multilingual Question-answering

2. Prototype website launch

3. Deployment

2025-26 (6M+6M)

2026-27 (6M+6M)

2027-28 (4M+4M+4M)