1 of 63

Graphs (Part 02)

Introduction to Graph Theory

Fardina Fathmiul Alam

CMSC 320 - Introduction to Data Science

2025

The Girvan-Newman algorithm

2 of 63

TODAY WE WILL TALK ABOUT

GIRVAN-NEWMAN ALGORITHM

The Girvan-Newman algorithm for the detection and analysis of community structure relies on the iterative elimination of edges that have the highest number of shortest paths between nodes passing through them. By removing edges from the graph one-by-one, the network breaks down into smaller pieces, so-called communities. The algorithm was introduced by Michelle Girvan and Mark Newman.

3 of 63

TODAY WE WILL KNOW FIRST

  • What is “Community”
  • How to remove/ eliminate edges that have the highest number of shortest paths - “Edge Betweenness”

Before discussing about The Girvan-Newman algorithm.

4 of 63

COMMUNITY

Real-life graphs are not random

  • E.g., in a social network people pick their friends based on their common interests and activities.

We expect that the nodes in a graph will be organized in communities.

  • Groups of vertices which probably share common properties and/or play similar roles within the graph

How do we find them?

  • A subset of nodes in communities will be densely connected to each other, and sparsely (loosely) connected to the nodes in other communities in the same graph.
  • Sounds familiar?

5 of 63

PROBLEM: CAN WE IDENTIFY GROUPS OF DENSELY CONNECTED NODES?

(Adapted from: Mining of Massive Datasets, http://www.mmds.org

6 of 63

COMMUNITIES: FOOTBALL CONFERENCES

Nodes: Football Teams,

Edges: Matches/ Game played, Communities: Conferences

7 of 63

8 of 63

9 of 63

COMMUNITY DETECTION

9

A co-authorship network of physicists and mathematicians

(Courtesy: Easley & Kleinberg)

How can the tightly clustered groups be identified?

10 of 63

WHAT IS A COMMUNITY?

Informally: “tightly-knit region” of the network.

  • How do we identify this region?
  • How do we separate tightly-knit regions from each other?

It depends on the definition of tightly knit.

  • Regions can be nested
  • Examples ?????????
  • How do bridges fit into this ?????????

10

11 of 63

RECAP: CENTRALITY

  • Betweenness: Number of shortest paths through a node or a edge
  • Closeness: Average distance to other nodes
  • Degree: Number of connections to other nodes

12 of 63

EDGE IMPORTANCE

Betweenness of edge (𝑎,𝑏) (𝐵(𝑎,𝑏)): For each pair of nodes 𝑥,𝑦 compute the number of shortest paths that include (𝑎,𝑏)

The edge betweenness centrality (EBC) can be defined as the number of shortest paths that pass through an edge (more specifically passing through the endpoints of the edge) in a network.

13 of 63

WHAT IS A COMMUNITY?

13

(Courtesy: Easley & Kleinberg)

An example of a nested structure of the communities

bridges

Removal of a bridge separates the graph into disjoint components

14 of 63

Example: Practical Applications of Community Detection

Social Networks: Identifies individuals with similar interests; Helps understand social dynamics and group behavior.

Epidemiology: Recognizes clusters to prevent disease spread. Vital for public health strategies.

Internet Communities: Groups pages with similar content. Organizes online information effectively. Enables insights into diverse networks.

15 of 63

COMMUNITY DETECTION

Girvan-Newman Method [Newman and Girvan, 2004]

  • Remove the edges of highest betweenness first.
  • Repeat the same step with the remainder graph.
  • Continue this until the graph breaks down into predefined or data-driven numbers of communities.

As the graph breaks down into pieces, the tightly knit community structure is exposed.

Results in a hierarchical partitioning of the graph (Gives a hierarchical decomposition of the graph into communities).

  • So if we gradually remove the edge with the highest edge betweenness score we will get a hierarchical map, a rooted tree, called a dendrogram of the graph.
  • The leafs of the tree are the individual vertices The root of the tree represents the whole graph

15

Michelle Girvan

16 of 63

GIRVAN-NEWMAN METHOD

17 of 63

GIRVAN-NEWMAN METHOD

18 of 63

19 of 63

20 of 63

GIRVAN-NEWMAN METHOD

20

Betweenness(7-8)= 7*7 = 49

How?

First, Calculate Edge Betweenness

21 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

Edge Betweenness Example

Calculate total flow over edge 7-8

Edge 7-8 in the graph here carries info from each of the 7 nodes on the left (incl. node 7) and the 7 nodes on the right (incl. node 8) # shortest paths through Edge 7-8 is 7*7 = 49

22 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

One unit flows over 7-8 to get from 1 to 8

23 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

One unit flows over 7-8 to get from 1 to 9

24 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

One unit flows over 7-8 to get from 1 to 10

7

8

3

2

1

11

9

10

6

4

5

13

12

14

25 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

7 total units flow over 7-8 to get from 1 to nodes 8-14

7

8

3

2

1

11

9

10

6

4

5

13

12

14

26 of 63

7 total units flow over 7-8 to get from 2 to nodes 8-14

7

8

3

2

1

11

9

10

6

4

5

13

12

14

27 of 63

7 total units flow over 7-8 to get from 3 to nodes 8-14 and so on..

7

8

3

2

1

11

9

10

6

4

5

13

12

14

28 of 63

7 x 7 = 49 total units flow over 7-8 from nodes 1-7 to 8-14

7

8

3

2

1

11

9

10

6

4

5

13

12

14

29 of 63

Edge betweenness = 49

7

8

3

2

1

11

9

10

6

4

5

13

12

14

30 of 63

Back to: GIRVAN-NEWMAN METHOD

30

Betweenness(7-8)= 7*7 = 49

31 of 63

GIRVAN-NEWMAN METHOD

31

Betweenness(7-8)= 7*7 = 49

Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = ?

Betweenness(1-3) = ?

Which edge is having highest betweenness?

3*11= 33

1*12 = 12

32 of 63

33 of 63

GIRVAN-NEWMAN METHOD

33

Betweenness(1, 3) = 1X5=5

Betweenness(3,7) = Betweenness(6,7) = Betweenness(8,9) = Betweenness(8,12) = 3X4=12

Removed

Next, Remove the edge having highest Edge Betweenness

34 of 63

GIRVAN-NEWMAN METHOD

34

Why is Betweenness now??

Betweenness of every edge = 1

Repeat the Process

35 of 63

GIRVAN-NEWMAN METHOD

35

G=nx.Graph()

# Returns an iterator over partitions at

# different hierarchy levels

nx.girvan_newman(G)

36 of 63

ALL TOGETHER: GIRVAN-NEWMAN METHOD

36

37 of 63

A SAMPLE OF HOW HIERARCHICAL RESULTS OF GIRMAN-NEWMAN

Results in a hierarchical partitioning of the graph.

  • So if we gradually remove the edge with the highest edge betweenness score we will get a hierarchical map, a rooted tree, called a dendrogram of the graph.
  • The leafs of the tree are the individual vertices The root of the tree represents the whole graph

38 of 63

39 of 63

GIRVAN-NEWMAN METHOD

39

The method gives us only a succession of splits of the network into smaller and smaller communities, but it gives no indication of which splits are best.

  • There can be a problem of “Over splitting”

One way to find the best split is via the the modularity concept.

  • Modularity is a measure of the strength of division of a network into modules or communities.

40 of 63

An Improvement: Introducing Modularity

“Modularity” - a helpful metric to assess the strength of the community structure found.

  • We can use Modularity to evaluate community structures.

Modularity can be represented and calculated in different ways depending on the specific community detection algorithm. We will focus on “Modularity Score Q”.

41 of 63

Modularity

Modularity quantifies the difference between the number of edges within modules and the expected number of edges if the edges were distributed randomly. A higher modularity value indicates a better community structure, where there are more edges within modules and fewer between them.

42 of 63

An Improvement: Introducing Modularity

After removing an edge, the Girvan-Newman algorithm can be used to calculate the modularity (Q) of the graph.

A higher value suggests a more significant community structure.

  • modularity values typically range from -1 to 1, with values closer to 1 indicating a strong division. In practice, the Girvan-Newman algorithm would iteratively remove edges and recalculate modularity to find a division that maximizes Q.

43 of 63

44 of 63

A modularity of 0 suggests that this particular division into two communities does not reflect a strong community structure

45 of 63

NETWORKX: VIZ

Can render via Matplotlib or GraphViz

Many different layout engines, aesthetic options, etc

45

import matplotlib.pyplot as plt

G=nx.Graph()

nx.draw(G, with_labels=True)

# Save to a PDF

plt.savefig(“my_filename.pdf”)

46 of 63

NETWORKX: VIZ

46

# Cycle with 24 vertices

G=nx.cycle_graph(24)

# Compute force-based layout

pos=nx.spring_layout(G,

iterations=200)

# Draw the graph

nx.draw(G,pos,

node_color=range(24),

node_size=800,

cmap=plt.cm.Blues)

# Save as PNG, then display

plt.savefig(”graph.png")

plt.show()

47 of 63

NETWORKX: VIZ

47

# Branch factor 3, depth 5

G = nx.balanced_tree(3, 5)

# Circular layout

pos = graphviz_layout(G,

prog='twopi', args='')

# Draw 8x8 figure

plt.figure(figsize=(8, 8))

nx.draw(G, pos,

node_size=20,

alpha=0.5,

node_color="blue",

with_labels=False)

plt.axis('equal')

plt.show()

48 of 63

Class Notes:

WILL added later

49 of 63

Additional Reading Slides

50 of 63

The Girvan-Newman method for graph partition [Newman and Girvan, 2004]

Step 1: Identify and remove the edge or edges with the highest betweenness centrality from the graph. This might cause the graph to split into separate parts, marking the initial level of regions in the graph's partitioning.

Step 2: Recalculate betweenness centrality for all edges and remove the edge or edges with the highest betweenness. This might further break down existing components into smaller ones, revealing nested regions within larger ones.

Step 3: Repeat this process iteratively until no edges remain in the graph. At each step, recalculate betweenness centrality for all edges and remove the edge or edges with the highest betweenness. This progressively reveals the hierarchical structure of regions within the graph.

51 of 63

52 of 63

7 x 7 = 49 total units flow over 7-8 from nodes 1-7 to 8-14

7

8

3

2

1

11

9

10

6

4

5

13

12

14

53 of 63

Edge betweenness = 49

7

8

3

2

1

11

9

10

6

4

5

13

12

14

GIRVAN-NEWMAN METHOD

54 of 63

Calculate betweenness for edge 3-7

7

8

3

2

1

11

9

10

6

4

5

13

12

14

55 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

3 units flow from �1-3 to each 4-14 node,

so total = �3 x 11 = 33

56 of 63

Betweenness = 33

for each �symmetric edge

33

33

33

33

7

8

3

2

1

11

9

10

6

4

5

13

12

14

57 of 63

Calculate betweenness for edge 1-3

7

8

3

2

1

11

9

10

6

4

5

13

12

14

58 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

Carries all flow to node 1 except from node 2,

so betweenness = 12

59 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

betweenness = 12

for each �symmetric edge

12

12

12

12

12

12

12

12

7

8

3

2

1

11

9

10

6

4

5

13

12

14

60 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

Calculate betweenness for edge 1-2

7

8

3

2

1

11

9

10

6

4

5

13

12

14

61 of 63

3

2

1

Only carries flow �from 1 to 2, so �betweenness = 1

7

8

3

2

1

11

9

10

6

4

5

13

12

14

62 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

betweenness = 1�for each symmetric edge

1

1

1

1

7

8

3

2

1

11

9

10

6

4

5

13

12

14

63 of 63

7

8

3

2

1

11

9

10

6

4

5

13

12

14

Edge with highest betweenness

1

1

1

1

12

12

12

12

12

12

12

12

33

33

33

33

49

7

8

3

2

1

11

9

10

6

4

5

13

12

14