1 of 22

In-depth Analysis of Continuous Subgraph Matching in a Common Delta Query Compilation Framework

June, 2024

Yukyoung Lee, Kyoungmin Kim, Wonseok Lee, Wook-Shin Han

POSTECH

Data Systems Lab@POSTECH

2 of 22

Subgraph Matching (SM)

2

A

B

C

query graph q

 

 

B

C

B

A

 

 

 

 

Data Systems Lab@POSTECH

3 of 22

Continuous Subgraph Matching (CSM)

3

A

B

C

query graph q

 

 

 

 

positive match

for edge insertion

negative match

for edge deletion

Updates for the data graph

B

C

B

A

 

 

 

 

B

C

A

B

 

 

 

 

B

B

A

C

 

 

 

 

Data Systems Lab@POSTECH

4 of 22

Continuous Subgraph Matching (CSM)

4

A

B

C

query graph q

 

 

 

 

positive match

for edge insertion

negative match

for edge deletion

Updates for the data graph

B

C

B

A

 

 

 

 

B

C

A

B

 

 

 

 

B

B

A

C

 

 

 

 

Find all positive/negative matches (i.e., delta of matches) for each update operation

Data Systems Lab@POSTECH

5 of 22

Applications of CSM

5

Fraud Detection

Cyber Security

Emergency Response

Social Recommendation

Data Systems Lab@POSTECH

6 of 22

Existing CSM Methods

6

Join-based

(relational aspect)

Exploration-based

(graph aspect)

Maintain

intermediate results

SJ-Tree

TurboFlux

SymBi

RapidFlow

CaLiG

Not maintain any intermediate results

Graphflow

-

Extend partial matches by executing join

Extend partial matches by traversing adjacency list

Partial matches

Compressed partial matches

(e.g., graph, candidate vertex set)

Data Systems Lab@POSTECH

7 of 22

Limitations of Existing Work

  • Unfair comparisons
    • The comparisons in the original papers of existing CSM methods are unfair.
    • Lately, a unified framework for CSM has been proposed, but even in this framework, several claims in their experiments are misleading.

  • All of them are hard to conduct in-depth analysis because they do not support inserting and deleting specific techniques.

7

Data Systems Lab@POSTECH

8 of 22

Our Approach

  • Goal: Fair and in-depth comparisons of CSM methods

  • The core of our approach: Use a common lens of incremental view maintenance (IVM)

8

Technique in the relational database to maintain a view V incrementally

(i.e., calculate ∆V using delta query, and then apply it to V)

Data Systems Lab@POSTECH

9 of 22

SM Query as a Join Query

9

A

B

C

query graph q

 

 

B

C

B

A

 

 

 

 

 

 

 

 

 

Database D

Query Q

Data Systems Lab@POSTECH

10 of 22

CSM Query as Delta Query in IVM

10

B

C

B

A

B

C

A

B

query graph q

 

 

 

B

B

A

C

Updates for the data graph

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A

B

C

 

positive match

for edge insertion

negative match

for edge deletion

Data Systems Lab@POSTECH

11 of 22

CSM Query as Delta Query in IVM

11

B

C

B

A

B

C

A

B

query graph q

 

 

 

B

B

A

C

Updates for the data graph

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A

B

C

 

positive match

for edge insertion

negative match

for edge deletion

 

 

 

 

 

 

 

Case 1. Positive match

Data Systems Lab@POSTECH

12 of 22

CSM Query as Delta Query in IVM

12

B

C

B

A

B

C

A

B

query graph q

 

 

 

B

B

A

C

Updates for the data graph

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A

B

C

 

positive match

for edge insertion

negative match

for edge deletion

 

 

 

 

 

 

 

Case 2. Negative match

Data Systems Lab@POSTECH

13 of 22

CSM Query as Delta Query in IVM

13

B

C

B

A

B

C

A

B

query graph q

 

 

 

B

B

A

C

Updates for the data graph

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A

B

C

 

positive match

for edge insertion

negative match

for edge deletion

Note that ∆Q just denotes what results to retrieve.

How ∆Q executes is determined by its logical/physical plan!

We also express different CSM methods (i.e., how) into different logical/physical plans of ∆Q.

Data Systems Lab@POSTECH

14 of 22

Advantages of Using Delta Query Plan

  • It enables fair comparisons of CSM methods by
    • revealing clear differences at both logical and physical levels and,
    • using a common physical implementation for the same physical operator.

  • It enables in-depth analysis because we can integrate or remove specific techniques from CSM methods by simply changing subplans.

  • Another advantage stems from leveraging techniques of relational databases such as query plan compilation, which can further improve CSM performance.

14

Data Systems Lab@POSTECH

15 of 22

Challenges

15

Join-based

(rDB aspect)

Exploration-based

(graph aspect)

Maintain

intermediate results

SJ-Tree

TurboFlux

SymBi

RapidFlow

CaLiG

Not maintain any intermediate results

Graphflow

-

Extend partial matches by executing join

Hard to represent their compressed partial matches as algebraic expressions [1]

[1] Mhedhbi, A., et al., “Optimizing one-time and continuous subgraph queries using worst-case optimal joins,” TODS, 2021.

Extend partial matches by traversing adjacency list

Data Systems Lab@POSTECH

16 of 22

Intermediate Results as Materialized View

  • We first interpret compressed partial matches using dynamic programming as a stacked view concept in a relational database.
  • We interpret compressed partial matches as a materialized view for a projection of partial matches on an edge (two attributes) or a vertex (one attribute).

16

Check our paper!

Data Systems Lab@POSTECH

17 of 22

Overview of Our Framework

17

A

B

C

Dimension values

(specification)

Query Graph

 

 

Logical Plan Generation

Physical Plan Generation

Query Plan Compilation

Execution

Offline

Online

 

 

 

 

 

 

 

 

 

 

 

 

Hash table with

key = (B, C),

value = A

 

 

Get all CSM results!

A

B

Inserted edge

 

 

 

Deleted edge

 

 

Specify which materialized views we use, …

A

B

Data Systems Lab@POSTECH

18 of 22

Experiments

  • Algorithms compared
    • SJ-Tree, Graphflow, TurboFlux, and SymBi using their original implementations (Original), previous framework (Sun’22), and our framework (Ours)

  • Benchmarks
    • LSBench (LS) and Netflow (NF) datasets with tree queries (or acyclic) and graph queries (or cyclic) varying query size (or # edges)

  • Comparisons
    • Overall performance
    • Graphflow + SymBi techniques

18

Fair comparison

In-depth analysis

Data Systems Lab@POSTECH

19 of 22

Overall Performance (LS)

19

Tree queries (acyclic)

Graph queries (cyclic)

SymBi

Graphflow

TurboFlux

Data Systems Lab@POSTECH

20 of 22

GraphFlow + SymBi Techniques

20

SymBi

Graphflow

Graphflow + SymBi view

Graphflow +

SymBi view & optimization techniques

966x

Check our paper

for more information!

Data Systems Lab@POSTECH

21 of 22

Conclusion

  • We propose a delta query framework for CSM for fair and in-depth comparisons.
    • Our interpretation using logical and physical plans reveals clear differences in both their logical expressions and physical implementations.
    • Our framework enables the efficient execution of CSM methods by applying modern query plan compilation techniques.
  • Our framework generates CSM code that shows the same trend as the original CSM methods but with better performance due to modern compilation techniques.
  • We make cause-and-effect arguments for the observed performance differences from the previous study and the performance differences between different CSM methods.

21

Data Systems Lab@POSTECH

22 of 22

Thank you!

22

Data Systems Lab@POSTECH