1 of 10

COMMUNITY STRUCTURE

Another area of recent progress has been in the identification of community structure—an intermediate scale of analysis between local (e.g., clustering, network motifs) and global (e.g., connectivity, path lengths) structure. Standard approaches to identifying community structure have tended to rely on some version of hierarchical clustering (a method for partitioning the network into increasingly similar subsets of nodes).

2 of 10

Basics

What is a community?

    • Intuitively: densely connected subnetworks

Why is it interesting?

    • Nodes that participate in same function/nodes with similar attributes form groups and these groups are represented in network structure.

Challenges:

    • No single clear definition. Many competing options.
    • Large networks, different features. Even more algorithms.
    • Which one to choose?
    • How to evaluate performance of method?

Communities are of interest in many cases

    • World Wide Web, Citation networks, Social networks, Metabolic networks, genetic networks
    • Properties of communities may be quite different from average properties of a network

3 of 10

  1. Each node in its own community.
  2. Calculate a distance between pairs of communities.
  3. Join closest pair.
  4. Go to step 2.

Hierarchical Clustering

4 of 10

Distance

Node distance should be low if nodes are in a community. A distance metric is a function that defines a distance between two observations. 

MATLAB

pdist supports various distance metrics : Euclidean distance, standardized Euclidean distance, Mahalanobis distance, city block distance, ……..

Algorithm Description

  1. Find the similarity or dissimilarity between every pair of objects in the data set. In this step, you calculate the distance between objects using the pdist function. The pdist function supports many different ways to compute this measurement.
  2. Group the objects into a binary, hierarchical cluster tree. In this step, you link pairs of objects that are in close proximity using the linkage function. The linkage function uses the distance information generated in step 1 to determine the proximity of objects to each other. As objects are paired into binary clusters, the newly formed clusters are grouped into larger clusters until a hierarchical tree is formed.
  3. Determine where to cut the hierarchical tree into clusters. In this step, you use the cluster function to prune branches off the bottom of the hierarchical tree, and assign all the objects below each cut to a single cluster. This creates a partition of the data. The cluster function can create these clusters by detecting natural groupings in the hierarchical tree or by cutting off the hierarchical tree at an arbitrary point.

Hierarchical Clustering

5 of 10

>> X = [1 2;2.5 4.5;2 2;4 1.5;...

4 2.5];

>> Y = pdist(X);

Y =

2.9155 1.0000 3.0414 3.0414 2.5495 3.3541 2.5000 2.0616 2.0616 1.0000

>> squareform(Y)

0 2.9155 1.0000 3.0414 3.0414

2.9155 0 2.5495 3.3541 2.5000

1.0000 2.5495 0 2.0616 2.0616

3.0414 3.3541 2.0616 0 1.0000

3.0414 2.5000 2.0616 1.0000 0

>> Z = linkage(Y)

Z =

4.0000 5.0000 1.0000

1.0000 3.0000 1.0000

6.0000 7.0000 2.0616

2.0000 8.0000 2.5000

Hierarchical Clustering

6 of 10

Dendrogram

1, 3, 4, and 5, belong to one cluster, 2belongs to the other cluster.

 4 and 5 in one cluster, 1 and 3 in a second cluster, and 2 in a third cluster. 

Graphically illustrates the way linkage groups the objects into a hierarchy of clusters

Hierarchical Clustering

7 of 10

Advantages:

    • Easy to understand
    • Easy to implement

Disadvantages:

    • Slow, number of steps to evaluate: depending on linkage
    • Tends to group together those nodes with the strongest connections
    • but leave out those with weaker connections. Divisions consist of a
    • few dense cores surrounded by a periphery of unattached nodes
    • Results depend on distance and linkage

Open question: where to cut the dendrogram?

Hierarchical Clustering

8 of 10

Modularity

 

9 of 10

Modularity

 

10 of 10

Summary: Modularity

  • Modularity is widely used as a measure for how good a clustering is.
  • Particularly popular in social network analysis, but used in other contexts as well (e.g. Brain networks).