Graphs (Part 02)
Introduction to Graph Theory
Fardina Fathmiul Alam
CMSC 320 - Introduction to Data Science
2025
The Girvan-Newman algorithm
TODAY WE WILL TALK ABOUT
GIRVAN-NEWMAN ALGORITHM
The Girvan-Newman algorithm for the detection and analysis of community structure relies on the iterative elimination of edges that have the highest number of shortest paths between nodes passing through them. By removing edges from the graph one-by-one, the network breaks down into smaller pieces, so-called communities. The algorithm was introduced by Michelle Girvan and Mark Newman.
TODAY WE WILL KNOW FIRST
Before discussing about The Girvan-Newman algorithm.
COMMUNITY
Real-life graphs are not random
We expect that the nodes in a graph will be organized in communities.
How do we find them?
PROBLEM: CAN WE IDENTIFY GROUPS OF DENSELY CONNECTED NODES?
(Adapted from: Mining of Massive Datasets, http://www.mmds.org
COMMUNITIES: FOOTBALL CONFERENCES
Nodes: Football Teams,
Edges: Matches/ Game played, Communities: Conferences
COMMUNITY DETECTION
9
A co-authorship network of physicists and mathematicians
(Courtesy: Easley & Kleinberg)
How can the tightly clustered groups be identified?
WHAT IS A COMMUNITY?
Informally: “tightly-knit region” of the network.
It depends on the definition of tightly knit.
10
RECAP: CENTRALITY
EDGE IMPORTANCE
Betweenness of edge (𝑎,𝑏) (𝐵(𝑎,𝑏)): For each pair of nodes 𝑥,𝑦 compute the number of shortest paths that include (𝑎,𝑏)
The edge betweenness centrality (EBC) can be defined as the number of shortest paths that pass through an edge (more specifically passing through the endpoints of the edge) in a network.
WHAT IS A COMMUNITY?
13
(Courtesy: Easley & Kleinberg)
An example of a nested structure of the communities
bridges
Removal of a bridge separates the graph into disjoint components
Example: Practical Applications of Community Detection
Social Networks: Identifies individuals with similar interests; Helps understand social dynamics and group behavior.
Epidemiology: Recognizes clusters to prevent disease spread. Vital for public health strategies.
Internet Communities: Groups pages with similar content. Organizes online information effectively. Enables insights into diverse networks.
COMMUNITY DETECTION
Girvan-Newman Method [Newman and Girvan, 2004]
As the graph breaks down into pieces, the tightly knit community structure is exposed.
Results in a hierarchical partitioning of the graph (Gives a hierarchical decomposition of the graph into communities).
15
Michelle Girvan
GIRVAN-NEWMAN METHOD
GIRVAN-NEWMAN METHOD
GIRVAN-NEWMAN METHOD
20
Betweenness(7-8)= 7*7 = 49
How?
First, Calculate Edge Betweenness
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Edge Betweenness Example
Calculate total flow over edge 7-8
Edge 7-8 in the graph here carries info from each of the 7 nodes on the left (incl. node 7) and the 7 nodes on the right (incl. node 8) # shortest paths through Edge 7-8 is 7*7 = 49
7
8
3
2
1
11
9
10
6
4
5
13
12
14
One unit flows over 7-8 to get from 1 to 8
7
8
3
2
1
11
9
10
6
4
5
13
12
14
One unit flows over 7-8 to get from 1 to 9
7
8
3
2
1
11
9
10
6
4
5
13
12
14
One unit flows over 7-8 to get from 1 to 10
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7 total units flow over 7-8 to get from 1 to nodes 8-14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7 total units flow over 7-8 to get from 2 to nodes 8-14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7 total units flow over 7-8 to get from 3 to nodes 8-14 and so on..
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7 x 7 = 49 total units flow over 7-8 from nodes 1-7 to 8-14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Edge betweenness = 49
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Back to: GIRVAN-NEWMAN METHOD
30
Betweenness(7-8)= 7*7 = 49
GIRVAN-NEWMAN METHOD
31
Betweenness(7-8)= 7*7 = 49
Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = ?
Betweenness(1-3) = ?
Which edge is having highest betweenness?
3*11= 33
1*12 = 12
GIRVAN-NEWMAN METHOD
33
Betweenness(1, 3) = 1X5=5
Betweenness(3,7) = Betweenness(6,7) = Betweenness(8,9) = Betweenness(8,12) = 3X4=12
Removed
Next, Remove the edge having highest Edge Betweenness
GIRVAN-NEWMAN METHOD
34
Why is Betweenness now??
Betweenness of every edge = 1
Repeat the Process
GIRVAN-NEWMAN METHOD
35
G=nx.Graph()
# Returns an iterator over partitions at
# different hierarchy levels
nx.girvan_newman(G)
ALL TOGETHER: GIRVAN-NEWMAN METHOD
36
A SAMPLE OF HOW HIERARCHICAL RESULTS OF GIRMAN-NEWMAN
Results in a hierarchical partitioning of the graph.
GIRVAN-NEWMAN METHOD
39
The method gives us only a succession of splits of the network into smaller and smaller communities, but it gives no indication of which splits are best.
One way to find the best split is via the the modularity concept.
An Improvement: Introducing Modularity
“Modularity” - a helpful metric to assess the strength of the community structure found.
Modularity can be represented and calculated in different ways depending on the specific community detection algorithm. We will focus on “Modularity Score Q”.
Modularity
Modularity quantifies the difference between the number of edges within modules and the expected number of edges if the edges were distributed randomly. A higher modularity value indicates a better community structure, where there are more edges within modules and fewer between them.
An Improvement: Introducing Modularity
After removing an edge, the Girvan-Newman algorithm can be used to calculate the modularity (Q) of the graph.
A higher value suggests a more significant community structure.
A modularity of 0 suggests that this particular division into two communities does not reflect a strong community structure
NETWORKX: VIZ
Can render via Matplotlib or GraphViz
Many different layout engines, aesthetic options, etc
45
import matplotlib.pyplot as plt
G=nx.Graph()
nx.draw(G, with_labels=True)
# Save to a PDF
plt.savefig(“my_filename.pdf”)
NETWORKX: VIZ
46
# Cycle with 24 vertices
G=nx.cycle_graph(24)
# Compute force-based layout
pos=nx.spring_layout(G,
iterations=200)
# Draw the graph
nx.draw(G,pos,
node_color=range(24),
node_size=800,
cmap=plt.cm.Blues)
# Save as PNG, then display
plt.savefig(”graph.png")
plt.show()
NETWORKX: VIZ
47
# Branch factor 3, depth 5
G = nx.balanced_tree(3, 5)
# Circular layout
pos = graphviz_layout(G,
prog='twopi', args='')
# Draw 8x8 figure
plt.figure(figsize=(8, 8))
nx.draw(G, pos,
node_size=20,
alpha=0.5,
node_color="blue",
with_labels=False)
plt.axis('equal')
plt.show()
Class Notes:
WILL added later
Additional Reading Slides
The Girvan-Newman method for graph partition [Newman and Girvan, 2004]
Step 1: Identify and remove the edge or edges with the highest betweenness centrality from the graph. This might cause the graph to split into separate parts, marking the initial level of regions in the graph's partitioning.
Step 2: Recalculate betweenness centrality for all edges and remove the edge or edges with the highest betweenness. This might further break down existing components into smaller ones, revealing nested regions within larger ones.
Step 3: Repeat this process iteratively until no edges remain in the graph. At each step, recalculate betweenness centrality for all edges and remove the edge or edges with the highest betweenness. This progressively reveals the hierarchical structure of regions within the graph.
7 x 7 = 49 total units flow over 7-8 from nodes 1-7 to 8-14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Edge betweenness = 49
7
8
3
2
1
11
9
10
6
4
5
13
12
14
GIRVAN-NEWMAN METHOD
Calculate betweenness for edge 3-7
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
3 units flow from �1-3 to each 4-14 node,
so total = �3 x 11 = 33
Betweenness = 33
for each �symmetric edge
33
33
33
33
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Calculate betweenness for edge 1-3
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Carries all flow to node 1 except from node 2,
so betweenness = 12
7
8
3
2
1
11
9
10
6
4
5
13
12
14
betweenness = 12
for each �symmetric edge
12
12
12
12
12
12
12
12
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Calculate betweenness for edge 1-2
7
8
3
2
1
11
9
10
6
4
5
13
12
14
3
2
1
Only carries flow �from 1 to 2, so �betweenness = 1
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
betweenness = 1�for each symmetric edge
1
1
1
1
7
8
3
2
1
11
9
10
6
4
5
13
12
14
7
8
3
2
1
11
9
10
6
4
5
13
12
14
Edge with highest betweenness
1
1
1
1
12
12
12
12
12
12
12
12
33
33
33
33
49
7
8
3
2
1
11
9
10
6
4
5
13
12
14