1 of 29

SCOREH+: A High-Order Node Proximity Spectral Clustering on Ratios-of-Eigenvectors

Yanhui Zhu, Fang Hu, Lei Hsin Kuo and Jia Liu

Department of Mathematics and Statistics

University of West Florida

Department of Computer Science

Iowa State University

yanhui@iastate.edu

2 of 29

Overview

  • Complex Networks and Community Detection
  • Spectral Clustering
  • SCOREH+
  • Experiments and Results
  • Conclusion

Yanhui Zhu

2

SCOREH+: A new spectral clustering for community detection

3 of 29

Complex Networks & Community Detection

4 of 29

Complex Network

Yanhui Zhu

4

a complex network is a graph (network) with non-trivial topological features—features that do not occur in simple networks such as lattices or random graphs but often occur in graphs modelling of real systems.

Simple/ random graph

Internet traffic routing

SCOREH+: A new spectral clustering for community detection

5 of 29

Examples: Political blogs

  • Red: conservative
  • Blue: liberal
  • Yellow: links from liberal to conservative
  • Purple: links from conservative to liberal

Yanhui Zhu

5

L. Adamic, & N. Glance, 2005

SCOREH+: A new spectral clustering for community detection

6 of 29

Examples: Biology

  • Yeast protein-protein interaction network
    • 3,438 proteins and 37,325 interactions for clarity.

Yanhui Zhu

6

V. Pancaldi, et. al., 2012

SCOREH+: A new spectral clustering for community detection

7 of 29

Community Detection

Yanhui Zhu

7

Community detection refers to the procedure of identifying groups of interacting vertices (i.e., nodes) in a network depending upon their structural properties (Yang et al., 2013; Kelley et al., 2012)

SCOREH+: A new spectral clustering for community detection

8 of 29

Spectral Clustering

9 of 29

Spectral Clustering (SC)

adjacency Matrix A, # clusters k

  1. Acquire the degree matrix D from A
  2. Compute the Laplacian matrix L=D-A
  3. Normalized Laplacian L
  4. Eigen-pair (eigenvalue, eigenvector) of L
  5. Keep first r=k corresponding eigenvectors w.r.t. the sorted eigenvectors, and form the eigenvectors as a new feature matrix
  6. Apply k-means to the new feature matrix.

Yanhui Zhu

9

SCOREH+: A new spectral clustering for community detection

10 of 29

Spectral Clustering (SC)

  • Use the adjacency matrix as the graph similarity matrix
  • Degree heterogeneity (degree power law property of complex networks)
  • Keep r=k leading eigenvectors

Yanhui Zhu

10

SCOREH+: A new spectral clustering for community detection

11 of 29

A New Algorithm SCOREH+

12 of 29

SCOREH+

  • Use high-order node proximity to preserve more information
  • Eliminate Degree heterogeneity by normalizing eigenvectors
  • Keep one more leading eigenvectors for “weak-signal” networks

Yanhui Zhu

12

SCOREH+: A new spectral clustering for community detection

13 of 29

SCOREH+

adjacency Matrix A, # clusters k

  1. Compute high-order node proximity with RBF and Katz index
  2. Acquire the degree matrix D from A
  3. Compute the Laplacian matrix L=D-A
  4. Normalized Laplacian L
  5. Eigen-pair (eigenvalue, eigenvector) of L
  6. Normalized them with the largest eigenvector.

Yanhui Zhu

13

SCOREH+: A new spectral clustering for community detection

14 of 29

SCOREH+

adjacency Matrix A, # clusters k

  1. Identify the network “weakness”(weak signal/ strong signal)
  2. If weak signal, r=k+1
  3. Keep first r=k corresponding eigenvectors w.r.t. the sorted eigenvectors, and form the eigenvectors as a new feature matrix
  4. Apply k-means to the new feature matrix.

Yanhui Zhu

14

SCOREH+: A new spectral clustering for community detection

15 of 29

Step 1: Radial Basis Functions (RBFs)

Common choices of RBFs

Weight matrix

Yanhui Zhu

15

SCOREH+: A new spectral clustering for community detection

16 of 29

Step 1: High-Order Proximity

  • Katz index

Yanhui Zhu

16

SCOREH+: A new spectral clustering for community detection

17 of 29

Step 7-8: Weak and strong signal networks

  • Example:

Yanhui Zhu

17

X-axis: No. eigenvalues

Y-axis: absolute value of eigenvalues

Left panel: Strong signal network

Right panel: weak signal network

SCOREH+: A new spectral clustering for community detection

18 of 29

Experiments

19 of 29

Evaluation Metrics

  •  

Yanhui Zhu

19

SCOREH+: A new spectral clustering for community detection

20 of 29

Evaluation Metrics

  • Modularity

    • k, the number of clusters
    • l , the total edges in the network
    • ls the sum of all edges in a cluster
    • ds degree of node s.

Yanhui Zhu

20

Newman, M. E. J. ,2006

SCOREH+: A new spectral clustering for community detection

21 of 29

Numerical Experiments

  • Real-world and Synthetic networks

Yanhui Zhu

21

We generate networks by setting the number of nodes ranging from 150 to 10, 000, and the key parameter µ (mixing parameter) from 0.15 to 0.85.

Real-world networks

*LFR synthetic networks

*A. Lancichinetti, S. Fortunato, and F. Radicchi, 2008.

SCOREH+: A new spectral clustering for community detection

22 of 29

Numerical Experiments

  • Real-world

Yanhui Zhu

22

(a): ground-truth network, (b)(c)(d) are results from SCORE, SCORE+, and our SCOREH+, respectively

SCOREH+: A new spectral clustering for community detection

23 of 29

Numerical Experiments

  • Real-world

Yanhui Zhu

23

Corresponding optimal NMI

Using SC, SCORE, and SCORE+

SCOREH+: A new spectral clustering for community detection

24 of 29

Optimal RBFs

Yanhui Zhu

24

Optimal shaping parameter c of iMQ: [0, 0.1]

SCOREH+: A new spectral clustering for community detection

25 of 29

Numerical Experiments

  • Synthetic data – Modularity

Yanhui Zhu

25

SCOREH+: A new spectral clustering for community detection

26 of 29

Numerical Experiments

  • Synthetic data – NMI

Yanhui Zhu

26

SCOREH+: A new spectral clustering for community detection

27 of 29

Conclusion

  • Our new algorithm solved three proposed weaknesses of Spectral Clustering
  • Better performance compared to SC, SCORE and SCORE+
  • Numerical results show that each RBF has a shaping parameter domain where we can achieve optimal results. This finding can help tune RBFs parameters in future wide applications.

Yanhui Zhu

27

SCOREH+: A new spectral clustering for community detection

28 of 29

Conclusion

  • Future work would reduce the time complexity and apply our algorithm to large-scale networks.
  • Moreover, finding a correlation between some matrix properties and the optimal RBF shaping parameter may facilitate optimizing the algorithm iteratively.

Yanhui Zhu

28

SCOREH+: A new spectral clustering for community detection

29 of 29

Questions