COMMUNITY STRUCTURE
Another area of recent progress has been in the identification of community structure—an intermediate scale of analysis between local (e.g., clustering, network motifs) and global (e.g., connectivity, path lengths) structure. Standard approaches to identifying community structure have tended to rely on some version of hierarchical clustering (a method for partitioning the network into increasingly similar subsets of nodes).
Basics
What is a community?
Why is it interesting?
Challenges:
Communities are of interest in many cases
Hierarchical Clustering
Distance
Node distance should be low if nodes are in a community. A distance metric is a function that defines a distance between two observations.
MATLAB
pdist supports various distance metrics : Euclidean distance, standardized Euclidean distance, Mahalanobis distance, city block distance, ……..
Algorithm Description
Hierarchical Clustering
>> X = [1 2;2.5 4.5;2 2;4 1.5;...
4 2.5];
>> Y = pdist(X);
Y =
2.9155 1.0000 3.0414 3.0414 2.5495 3.3541 2.5000 2.0616 2.0616 1.0000
>> squareform(Y)
0 2.9155 1.0000 3.0414 3.0414
2.9155 0 2.5495 3.3541 2.5000
1.0000 2.5495 0 2.0616 2.0616
3.0414 3.3541 2.0616 0 1.0000
3.0414 2.5000 2.0616 1.0000 0
>> Z = linkage(Y)
Z =
4.0000 5.0000 1.0000
1.0000 3.0000 1.0000
6.0000 7.0000 2.0616
2.0000 8.0000 2.5000
Hierarchical Clustering
Dendrogram
1, 3, 4, and 5, belong to one cluster, 2belongs to the other cluster.
4 and 5 in one cluster, 1 and 3 in a second cluster, and 2 in a third cluster.
Graphically illustrates the way linkage groups the objects into a hierarchy of clusters
Hierarchical Clustering
Advantages:
Disadvantages:
Open question: where to cut the dendrogram?
Hierarchical Clustering
Modularity
Modularity
Summary: Modularity