1 of 46

Hierarchical Core Decomposition in Parallel:�From Construction to Subgraph Search

  • Deming Chu, Fan Zhang, Wenjie Zhang, Xuemin Lin, Ying Zhang

1

2 of 46

Core Decomposition

  • Powerful Tool in Network Analysis
  • Decompose a graph into layers

2

3 of 46

Limitations of Core Decomposition

3

  1. Connectivity of k-cores is lost
  2. Containment of k-cores is lost

 

 

2.

 

 

 

 

4 of 46

1. Hierarchical Core Decomposition (HCD)

  •  

4

5 of 46

Applications of HCD

  •  

5

6 of 46

2. Subgraph Search on HCD

  • Find high-quality subgraphs on HCD
  • Community scoring metric to evaluate subgraph quality
    • E.g. average degree, clustering coefficient

Work-Efficient�Parallel Algorithms

6

Score=0.8

Score=0.6

Work-Efficient: #steps matches the best serial time complexity

Score=0.5

Score=0.4

7 of 46

Applications of Subgraph Search on HCD

  •  

7

[1] D. Chu, F. Zhang, X. Lin, W. Zhang, Y. Zhang, Y. Xia, and C. Zhang, “Finding the best k in core decomposition: A time and space optimal solution,” in ICDE. IEEE, 2020, pp. 685–696.

8 of 46

Roadmap

  • Problem Definition
    • HCD Construction
    • Subgraph Search on HCD
  • Parallel Construction of HCD
  • Parallel Subgraph Search on HCD
  • Experimental Results
  • Conclusion

8

9 of 46

K-Core and Core Decomposition

  •  

9

Nodes with coreness 3

Nodes with coreness 2

Nodes with coreness 1

10 of 46

1. Hierarchical Core Decomposition

  •  

10

 

 

 

11 of 46

Definition: Community Scoring Metric

Primary Value (for a subgraph 𝑺)

Most metrics are based on Primary Values

  • 𝒏(𝑺): # of nodes
  • 𝒎(𝑺): # of edges
  • 𝒃(𝑺): # of boundary edges
  • 𝜟(𝑺): # of triangles
  • 𝒕(𝑺): # of triplets

Community Scoring Metric (for a subgraph 𝑺)

To measure community quality

  •  

11

 

Type-A

Type-B

Higher-Order

12 of 46

2. Subgraph Search on HCD

  •  

12

 

 

 

 

13 of 46

Roadmap

  • Problem Definition
  • Parallel Construction of HCD
  • Parallel Subgraph Search on HCD
  • Experimental Results
  • Conclusion

13

14 of 46

Existing Works of HCD Construction

  •  

14

[1] D. W. Matula and L. L. Beck, “Smallest-last ordering and clustering and graph coloring algorithms,” J. ACM, vol. 30, no. 3, pp. 417–427, 1983.

15 of 46

Intuition of Our Solution

  • Add vertices to union-find in decreasing coreness
  • Meanwhile, build HCD from bottom to up
  • Use pivot to identify tree node and its parent

15

16 of 46

Vertex Rank and Pivot

  • Vertex Rank: sort nodes by coreness, and break tie by id
  • Pivot: the vertex with the lowest vertex rank in a subgraph

16

 

pivot

Nodes with coreness 3

Nodes with coreness 2

Nodes with coreness 1

17 of 46

Vertex Rank and Pivot

1. Group Vertices

2. Find Parent

17

pivot

old pivot

pivot

18 of 46

Our Solution

  •  

18

19 of 46

Our Solution

  •  

19

pivot

old pivot

add to the graph

20 of 46

Our Solution

  •  

20

 

pivot

old pivot

 

21 of 46

Our Solution

  •  

21

2. Union with neighbors in union-find and pivot is changed

pivot

old pivot

 

22 of 46

Our Solution

  •  

22

3. Group by pivot into tree node

pivot

old pivot

 

23 of 46

Our Solution

  •  

23

4. In the same CC, the tree node containing and have parent-child relation

pivot

old pivot

 

24 of 46

Union-find with Pivot

  •  

24

 

25 of 46

Parallel Vertex Rank Computation

  •  

25

2. Group vertices by coreness

 

1. Distribute vertices

is parallel

……

……

 

 

……

……

3. Place back into array, obtain vertex rank

26 of 46

PHCD Analysis

  •  

26

 

27 of 46

Roadmap

  • Problem Definition
  • Parallel Construction of HCD
  • Parallel Subgraph Search on HCD
  • Experimental Results
  • Conclusion

27

28 of 46

Existing Works of Subgraph Search

Limitations of BKS [1] (time- and space-optimal serial solution)

    • Compute in decreasing coreness, rely on the results of larger coreness
    • Preprocessing in BKS is inefficient to parallelize

  • Our Solution
  • Vertex-centric solution (no dependency)
  • A novel preprocessing
  • Efficient to parallelize

28

[1] D. Chu, F. Zhang, X. Lin, W. Zhang, Y. Zhang, Y. Xia, and C. Zhang, “Finding the best k in core decomposition: A time and space optimal solution,” in ICDE. IEEE, 2020, pp. 685–696.

29 of 46

Our Solution

  1. Compute HCD
  2. Preprocessing
  3. Score Computation on HCD
  4. Output the Best K-Core

29

30 of 46

Our Solution

  1. Compute HCD
  2. Preprocessing
  3. Score Computation on HCD
  4. Output the Best K-Core

30

 

31 of 46

3. Score Computation on HCD (Intuition)

 

 

32 of 46

3. Type-A Score Computation on HCD

  • E.g. Average Degree, #vertices, #edges

32

5v, 8e

4v, 6e

9v, 14e

9v, 9e

5v, 8e

4v, 6e

18v, 28e

27v, 37e

 

 

i. Vertex-centric

contribution

……

……

ii. Tree node contribution

Aggregate

iii. Primary value of k-cores

Bottom-Up

Summation

Compute

is parallel

3.2

3

3.11

2.74

iv. Score of k-cores

Score

33 of 46

3. Type-B Score Computation on HCD

  • Similar but more complex compared with type-A
  • Triangle: use lower-degree endpoint to check triangle existence
  • Triplet: count triplets based on the coreness of neighbors

33

34 of 46

Our Solution

  1. Compute HCD
  2. Preprocessing
  3. Score Computation on HCD
  4. Output the Best K-Core

34

The highest score

35 of 46

PBKS Time Analysis

  •  

35

 

36 of 46

Roadmap

  • Problem Definition
  • Parallel Construction of HCD
  • Parallel Subgraph Search on HCD
  • Experimental Results
    • Experimental Setting
    • Runtime Performance
    • Application Performance
  • Conclusion

36

37 of 46

Dataset Statistics & Experimental Setting

  • 10 Public Networks, Up to 100M nodes & 4B edges
  • A Quad-Core (up to 40 threads) Linux server with 128G memory

37

38 of 46

Parallel HCD Construction Time

  • Serial PHCD is 1.24-2.33x faster than LCPS
  • 40-cores
    • Small graphs: 10x
    • Large graphs: 15-20x

Significantly faster than LCPS

38

Increasing network size

39 of 46

Parallel HCD Construction Time

  • LB: the lower-bound cost of the UF-based method
    • LB is ~0.5 the cost of our method

Our PHCD is close to lower bound

39

40 of 46

Parallel HCD Construction Time

  • LB: the lower-bound cost of the UF-based method
    • LB is ~0.5 the cost of our method

Our PHCD is close to lower bound

  • RC is required in Divide and Conquer method

D&C paradigm is inefficient in building HCD

40

41 of 46

Parallel Subgraph Search Time

  • Score Computation (40-cores)
  • Type-A: Small 30-45x� Large 50x
  • Type-B: ~20x

Significantly faster than BKS

41

Increasing network size

42 of 46

Application: Densest Subgraph

42

[1] Fang, Yixiang, et al. Efficient algorithms for densest subgraph discovery. PVLDB 2019.

[2] D. Chu, F. Zhang, X. Lin, W. Zhang, Y. Zhang, Y. Xia, and C. Zhang, “Finding the best k in core decomposition: A time and space optimal solution,” in ICDE. IEEE, 2020, pp. 685–696.

 

 

Better is bold

 

43 of 46

Application: Maximum Clique

43

 

 

 

 

44 of 46

Roadmap

  • Problem Definition
  • Parallel Construction of HCD
  • Parallel Subgraph Search on HCD
  • Experimental Results
  • Conclusion

44

45 of 46

Conclusion

  •  

45

46 of 46

Q & A

46