1 of 66

Advancements in Knowledge Graph Reasoning

Innovative Approaches to Complex Logical Query Answering and Logical Hypothesis Generation

Jiaxin Bai

KnowComp, Department of CSE, The Hong Kong University of Science and Technology


4/8/2025

2 of 66

Roadmap

  • Background
  • Complex Query Answering
  • Complex Hypothesis Generation
  • Complex Session Intention Understanding
  • Future Work


3 of 66

Limitations of Current AI Systems

  • LLMs struggle with hallucinations in complex reasoning


4 of 66

Limitations of Current AI Systems

  • Vector DBs focus on document similarity, not logical queries
  • Need for deeper inference and structured knowledge


Need Grounded Reasoning on Structured Data!

For example, a knowledge graph.

5 of 66

Structured Knowledge: Knowledge Graphs

A knowledge graph G consists of:

  • Entities (nodes) E: objects, concepts, or ideas
  • Relation types (edge labels) R: associations between entities
  • Triples (e_h, r, e_t) ∈ G: a head and a tail entity connected by a relation


6 of 66

How to use KG for Reasoning?

  • How can we answer complex queries over different types of knowledge graphs, given that the graphs are always incomplete?

Complex Query Answering

  • How can we obtain logical hypotheses from observations by using structured knowledge?

Logical Hypothesis Generation

  • How can we apply KG reasoning to real e-commerce applications like search and recommendation?

Complex Session Intention Understanding


7 of 66

Roadmap

  • Background
  • Complex Query Answering
  • Complex Hypothesis Generation
  • Complex Session Intention Understanding
  • Future Work


8 of 66

Complex Query Answering

How do we deal with the incompleteness of KG?

How do we scale up to large knowledge graphs and long queries?


Complex Queries and Their Interpretations

  • Find where the Canadian Turing Award laureates graduated from.
  • Find the substances that interact with the proteins associated with diseases T1, T2, or T3.
  • Find the entities who are German, won the Nobel Prize, and eventually moved to the United States.

9 of 66

Query Encoding

How do we deal with the incompleteness of KG?

Use embeddings to represent queries and subqueries

How do we scale up to large knowledge graphs and long queries?

Use computation in the embedding space to replace graph search / subgraph matching

Only a single approximate nearest-neighbor search is needed at inference time

[Figure: computation graph for the query "Canadian Turing Award laureates": anchors Turing Award and Canada, relations HasWinner and HasCitizen, combined by an Intersection]
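Both points follow from doing the reasoning in embedding space. As a minimal sketch (my own illustration, not the exact implementation of any of the papers below), answering an encoded query reduces to a single similarity search over pre-computed entity embeddings; the array names are illustrative.

```python
import numpy as np

def retrieve_answers(query_embedding, entity_embeddings, k=10):
    """Score every entity against the encoded query and return the top-k candidates.

    query_embedding: (d,) vector produced by the query encoder.
    entity_embeddings: (num_entities, d) matrix of pre-computed entity embeddings.
    """
    scores = entity_embeddings @ query_embedding      # dot-product similarity
    top_k = np.argsort(-scores)[:k]                   # indices of the k highest-scoring entities
    return top_k, scores[top_k]

# In practice the exact argsort is replaced by an approximate nearest-neighbor index
# (e.g., FAISS) so that a single lookup stays fast on large knowledge graphs.
```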

10 of 66

Neural Query Encoders


Models | Encoding Structures
GQE [1] | Vector Embedding
Query2Box [2] | Box Embedding
Query2Particles [3] | Multiple Vectors Embedding
FuzzQE [4] | Fuzzy Logic Embedding
Neural MLP [5] | Vector Embedding
NewLook [6] | Vector Embedding


11 of 66

Query2Particles: Knowledge Graph Reasoning with Particle Embeddings

Jiaxin Bai, Zihao Wang, Hongming Zhang, Yangqiu Song

NAACL-2022 (Findings)


12 of 66

Embedding Space and Set Representations


 

[Figure: the computation graph (anchors Turing Award and Canada; operations HasWinner, HasCitizen, Complement, Intersection, Graduate) and the corresponding answer regions in the embedding space]

The multi-hop logical operations make the query answers diverse.

The answer embeddings are sets scattered across the embedding space, which can be represented with vector embeddings, box embeddings, or particle embeddings.

Example from: Jiaxin Bai, Zihao Wang, Hongming Zhang, Yangqiu Song: Query2Particles: Knowledge Graph Reasoning with Particle Embeddings. NAACL-HLT (Findings) 2022: 2703-2714

13 of 66

Relational Projection


[Figure: the relational projection operation over particle embeddings]
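The projection operation shown on this slide did not survive the export, so here is a hedged sketch of the general idea (my simplification, not the exact Query2Particles architecture): the intermediate answer set is kept as K particle vectors, and a relational projection moves each particle with a small neural network conditioned on the relation embedding.

```python
import torch
import torch.nn as nn

class ParticleProjection(nn.Module):
    """Illustrative relational projection over a set of K particles."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, particles: torch.Tensor, rel_id: torch.Tensor) -> torch.Tensor:
        # particles: (batch, K, dim) -- K vectors jointly represent one answer set
        rel = self.rel_emb(rel_id).unsqueeze(1).expand_as(particles)
        return self.mlp(torch.cat([particles, rel], dim=-1))   # every particle is moved

# An anchor entity is expanded into K identical particles, then projected once.
K, dim = 4, 64
proj = ParticleProjection(dim, num_relations=10)
anchor = torch.randn(1, dim).unsqueeze(1).expand(1, K, dim)
projected = proj(anchor, torch.tensor([3]))            # (1, K, dim)
```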

14 of 66

Intersection, Union, and Negation


 

 

 

 

 

[Figures: the Intersection, Complement, and Union operations over particle embeddings]
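The operator figures are likewise missing, so the block below is only a hedged illustration of how set operations can act directly on particles (the paper's exact operators differ): intersection lets one operand's particles attend over both operand sets, union keeps all particles, and complement transforms every particle.

```python
import torch
import torch.nn as nn

class ParticleLogic(nn.Module):
    """Toy particle-level logical operators (illustrative only)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.neg = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def intersection(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Particles of set A attend over the particles of both operand sets.
        context = torch.cat([a, b], dim=1)             # (batch, K_a + K_b, dim)
        out, _ = self.attn(a, context, context)
        return out                                     # (batch, K_a, dim)

    def union(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Keep all particles: more points cover the larger answer set.
        return torch.cat([a, b], dim=1)

    def complement(self, a: torch.Tensor) -> torch.Tensor:
        # Push every particle towards the region of the complement set.
        return self.neg(a)
```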

15 of 66

Training Query2Particles


16 of 66

Dataset


The basic information about the three knowledge graphs used for the experiments.

The detailed information for the queries used for training, validating, and testing all query embedding methods.

17 of 66

Training Query2Particles


In-distribution query types (used for training and evaluation): 1p, 2p, 3p, 2i, 3i, 2in, 3in, inp, pni, pin

Out-of-distribution query types (no training, evaluation only): ip, pi, 2u, up

Legend: p = projection, i = intersection, n = negation, u = union
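The slide lists which query types are used for training but not the objective itself. For orientation, a common objective in this family of query-embedding models is a negative-sampling margin loss in the style of Query2Box; it is shown here purely as a hedged illustration, not as the exact Query2Particles loss:

```latex
\mathcal{L} = -\log \sigma\big(\gamma - d(\mathbf{q}, \mathbf{e}^{+})\big)
              - \frac{1}{n}\sum_{i=1}^{n} \log \sigma\big(d(\mathbf{q}, \mathbf{e}^{-}_{i}) - \gamma\big)
```

Here d(q, e) is the distance between the query representation and an entity embedding, e^+ is an answer entity, e^-_i are sampled negative entities, and gamma is a margin.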

18 of 66

Comparison with baselines

  • There is no need to pre-process the queries into disjunctive normal form (DNF), which has exponential complexity
  • Fast inference compared to query-decomposition methods


19 of 66

Queries with Diverse Answers


Models | 1P | 2I | 2U | 2IN | Average
Q2P-1P | 44.8 | 28.8 | 11.3 | 15.0 | 25.0
Q2P-2P | 49.4 | 35.5 | 13.3 | 20.7 | 29.7
Q2P-3P | 53.0 | 37.7 | 18.6 | 21.6 | 32.8

MRR on the top ten percent most diverse queries.

Diversity is measured by the number of answers.

Significantly better on the queries with diverse answers

20 of 66

Sequential Query Encoding For Complex Query Answering on Knowledge Graphs

Jiaxin Bai*, Tianshi Zheng*, Yangqiu Song

Transactions of Machine Learning Research


21 of 66

Neural Networks as Operators


[Figure: computation graph for the query "substances that Interact with proteins Assoc(iated) with Mad Cow Disease or Alzheimer's Disease": two Assoc projections, a Union, and an Interact projection leading to the Answers]

From a query to a computation graph

Do we have to parameterize and then execute such a computation graph?

22 of 66

From Computing to Encoding

  • Previous approaches: neural networks as operators, which then execute the computation graph

  • SQE: tokenize and encode the whole query graph with a single sequence model


[(] [P] [Interact]

[(] [U]

[(] [P] [Assoc] [MadCow] [)]

[(] [P] [Assoc] [Alzheimer] [)]

[)]

[)]

[Figure: the same computation graph (Assoc, Union, Interact) shown next to its token sequence]

Tokenization

23 of 66

Sequential Query Encoding


Sequence Encoder

[(] [P] [Interact] [(] [U] [(] [P] [Assoc] [MadCow] [)] [(] [P] [Assoc] [Alzheimer] [)] [)] [)]

[Figure: each query token is mapped to an embedding E[token] and the sequence is fed to the encoder; the encoder output is matched against entity embeddings, predicting the answer Melanin]

Sequence Encoder: Transformers, LSTMs, Temporal CNN…
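A minimal sketch of the sequential-encoding idea (illustrative, not the released SQE code): linearize the computation graph into prefix tokens, embed them, and run any off-the-shelf sequence encoder whose final state serves as the query embedding.

```python
import torch
import torch.nn as nn

class SequentialQueryEncoder(nn.Module):
    """Encode a tokenized logical query with an LSTM (any sequence model works)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) ids of tokens such as "(", "P", "Assoc", "MadCow", ")"
        hidden, _ = self.encoder(self.tok_emb(token_ids))
        return hidden[:, -1]                  # last hidden state = query embedding

# Toy vocabulary and the query from the slide.
vocab = {t: i for i, t in enumerate(
    ["(", ")", "P", "U", "Interact", "Assoc", "MadCow", "Alzheimer"])}
tokens = ["(", "P", "Interact", "(", "U", "(", "P", "Assoc", "MadCow", ")",
          "(", "P", "Assoc", "Alzheimer", ")", ")", ")"]
ids = torch.tensor([[vocab[t] for t in tokens]])
query_emb = SequentialQueryEncoder(len(vocab))(ids)   # scored against entity embeddings
```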

24 of 66

Experiments - Benchmarks


Dataset | In-distribution Types | Out-of-distribution Types | Total
Query2Box | 5 | 4 | 9
BetaE | 10 | 4 | 14
SMORE | 10 | 4 | 14
This Paper (SQE) | 29 | 29 | 58

We construct a larger benchmark with diverse query types.

25 of 66

Results


Dataset | Model | In-distribution (Entailment) | In-distribution (Inference) | Out-of-distribution (Entailment) | Out-of-distribution (Inference)
FB15K-237 | ConE | 36.69 | 9.75 | 28.13 | 8.82
FB15K-237 | BetaE | 32.48 | 8.30 | 22.96 | 7.29
FB15K-237 | Q2P | 52.33 | 10.17 | 32.70 | 8.62
FB15K-237 | Neural MLP | 51.09 | 10.03 | 36.85 | 8.75
FB15K-237 | + MLP Mixer | 45.19 | 10.07 | 33.03 | 8.66
FB15K-237 | SQE + CNN | 52.09 | 10.14 | 28.21 | 7.65
FB15K-237 | SQE + GRU | 55.46 | 10.59 | 32.25 | 8.34
FB15K-237 | SQE + LSTM | 56.02 | 10.62 | 33.41 | 8.62
FB15K-237 | SQE + Transformer | 59.15 | 11.30 | 15.06 | 4.98

Operator-level parameterization is good for compositional generalization to unseen query types.

Sequential query encoding is good on queries whose types were seen during training.

26 of 66

Knowledge Graph Reasoning over Entities and Numerical Values

Jiaxin Bai, Chen Luo, Zheng Li, Qingyu Yin, Bing Yin, Yangqiu Song

KDD-2023


27 of 66

Numerical Complex Query Answering


Complex Queries and Their Interpretations (Category: Numerical CQA)

  • Find the Turing Award winners that were born before the year 1927.
  • Find the states in the US that have a higher latitude than Beijing.
  • Find the states in the US that have half the population of California.

28 of 66

Number Reasoning Network


 

 

 

Find the cities that have a higher latitude than Japanese cities.

 

 

 

 

 

 

 

The query is answered with four types of projections: (1) Relational Projection, (2) Attribute Projection, (3) Numerical Projection, and (4) Reverse Attribute Projection.


 

Jiaxin Bai, Chen Luo, Zheng Li, Qingyu Yin, Bing Yin, Yangqiu Song: Knowledge Graph Reasoning over Entities and Numerical Values. KDD 2023: 57-68
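NRN attaches dedicated number encoders to the query encoder (the next slide compares DICE and sinusoidal variants). As a hedged sketch, a sinusoidal number encoding in the style of positional encodings looks as follows; the paper's exact parameterization may differ.

```python
import numpy as np

def sinusoidal_number_encoding(value: float, dim: int = 16) -> np.ndarray:
    """Map a scalar attribute value (e.g., a year or a latitude) to a dim-dimensional vector."""
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))   # geometrically spaced frequencies
    angles = value * freqs
    enc = np.zeros(dim)
    enc[0::2] = np.sin(angles)
    enc[1::2] = np.cos(angles)
    return enc

# Nearby latitudes receive similar encodings, which the projection networks can exploit.
print(sinusoidal_number_encoding(30.0)[:4])
print(sinusoidal_number_encoding(40.0)[:4])
```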

29 of 66

Main Results on Three KGs


Query Encoding | Attribute Encoding | Hit@1 | Hit@3 | Hit@10 | MRR
GQE | Baseline | 10.33 | 18.19 | 27.91 | 16.29
GQE | NRN + DICE | 11.03 | 19.18 | 29.01 | 17.15
GQE | NRN + Sinusoidal | 11.14 | 19.39 | 29.23 | 17.31
Q2P | Baseline | 10.22 | 17.35 | 26.61 | 15.81
Q2P | NRN + DICE | 11.86 | 19.70 | 29.46 | 17.84
Q2P | NRN + Sinusoidal | 12.25 | 20.16 | 29.96 | 18.28
Q2B | Baseline | 11.81 | 20.93 | 31.19 | 18.41
Q2B | NRN + DICE | 12.52 | 22.09 | 32.34 | 19.34
Q2B | NRN + Sinusoidal | 12.75 | 22.22 | 32.46 | 19.51

30 of 66

Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints

Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song

NeurIPS-2023


31 of 66


ASER (Activities, States, Events, and their Relations)

https://github.com/HKUST-KnowComp/ASER

Hongming Zhang, Xin Liu, Haojie Pan, Yangqiu Song, Cane Wing-Ki Leung: ASER: A Large-scale Eventuality Knowledge Graph. WWW 2020: 201-211

Katz, J. J., & Fodor, J. A. (1963). The structure of a semantic theory. Language, 39(2), 170–210.

Yorick Wilks. 1975. An intelligent analyzer and understander of English. Communications of the ACM, 18(5):264–274.

Principle 1: Comparing semantic meanings by fixing grammar (Katz and Fodor, 1963)

Principle 2: The need for language inference based on 'partial information' (Wilks, 1975)

32 of 66

CQA on Eventuality Knowledge Graph


Complex queries on eventuality graphs are different from those on entity-relation graphs.

Whether and when the eventualities occur are important.

Interpretations and query types:

  • Find the substances that interact with the proteins associated with Alzheimer's and Mad Cow disease. (Entity query)
  • Instead of buying an umbrella, PersonX go home. What happened before PersonX go home? (Eventuality query)
  • Food is bad before PersonX add soy sauce. What is the reason for food being bad? (Eventuality query)

Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song: Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints. NeurIPS, 2023

33 of 66

Query Encoding with Constraint Memory


[Figure: a computation graph over eventualities (V?, "PersonX complains", "PersonX leaves restaurant", "PersonY adds ketchup/vinegar/soy sauce", "Food is bad") with Succession, Reason, and Intersection operations, alongside a key-value Constraint Memory holding implicit constraints such as Precedence and ChosenAlternative]

Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song: Complex Query Answering on Eventuality Knowledge Graph with Implicit Logical Constraints. NeurIPS, 2023
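A hedged sketch of the constraint-memory idea (my simplification, not the exact MEQE architecture): the implicit constraints are stored as key-value pairs, and each intermediate query embedding attends over this memory before the next operation is applied.

```python
import torch
import torch.nn as nn

class ConstraintMemory(nn.Module):
    """Attend over a memory of (key, value) constraint embeddings."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_state: torch.Tensor, keys: torch.Tensor,
                values: torch.Tensor) -> torch.Tensor:
        # query_state: (batch, 1, dim) embedding of the current sub-query
        # keys/values: (batch, M, dim) embeddings of M implicit constraints,
        #              e.g. (Precedence, "Food is bad"), (ChosenAlternative, "adds vinegar")
        read, _ = self.attn(query_state, keys, values)
        return query_state + read             # constraint-aware query embedding
```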

34 of 66

The MEQE Combined with Various QE methods


Models | Occurrence Constraints (Hit@1 / Hit@3 / MRR) | Temporal Constraints (Hit@1 / Hit@3 / MRR) | Average (Hit@1 / Hit@3 / MRR)
GQE | 8.92 / 14.21 / 13.09 | 9.09 / 14.03 / 12.94 | 9.12 / 14.12 / 13.02
+ MEQE | 10.20 / 15.54 / 14.31 | 10.70 / 15.67 / 14.50 | 10.45 / 15.60 / 14.41
Q2P | 14.14 / 19.97 / 18.84 | 14.48 / 19.69 / 18.68 | 14.31 / 19.83 / 18.76
+ MEQE | 15.15 / 20.67 / 19.38 | 16.06 / 20.82 / 19.74 | 15.61 / 20.74 / 19.56
Neural MLP | 13.03 / 19.21 / 17.75 | 13.45 / 19.06 / 17.68 | 13.24 / 19.14 / 17.71
+ MEQE | 15.26 / 20.69 / 19.32 | 15.91 / 20.63 / 19.47 | 15.58 / 20.66 / 19.40
FuzzQE | 11.68 / 18.64 / 17.07 | 11.68 / 17.97 / 16.53 | 11.68 / 18.31 / 16.80
+ MEQE | 14.76 / 21.12 / 19.45 | 15.31 / 21.01 / 19.49 | 15.03 / 21.06 / 19.47

35 of 66

Roadmap

  • Background
  • Complex Query Answering
  • Complex Hypothesis Generation
  • Complex Session Intention Understanding
  • Future Work


36 of 66

Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation

Jiaxin Bai*, Yicheng Wang*, Tianshi Zheng, Yue Guo, Xin Liu, Yangqiu Song

ACL-2024


37 of 66

Abductive Reasoning

  • Abductive reasoning is a form of reasoning that seeks to find the best explanation for an observation, distinct from the other two major types of inference: deduction and induction.

  • Abductive Reasoning on KG:

To use structured knowledge in KG to explain observations.


38 of 66

Abductive Reasoning

  • Abductive Reasoning on KG:

To use structured knowledge in KG to explain observations.


[Table: example observations (O) and hypotheses (H); the hypothesis interpretations are:]

  • The actors and screenwriters born in Los Angeles
  • The Apple products released in 2010 that are not phones
  • The disease whose symptoms can be relieved by Panadol

39 of 66

Tokenization of Hypothesis


[Figure: the hypothesis graph for "Apple products released in 2010 that are not phones": anchor nodes Apple, 2010, and Phone, relations Brand, Release, and Type, intersection [I] and negation [N] operators, with node identifiers 1-9]

Tokens: [I] [I] [Brand] [Apple] [Release] [2010] [N] [Type] [Phone]
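A small sketch of how such a hypothesis can be serialized (the paper's tokenizer details may differ): the hypothesis is a tree of intersections and negations over (relation, entity) atoms, and a depth-first traversal reproduces the token sequence above.

```python
def serialize(node) -> list:
    """node is ('I', left, right), ('N', child), or ('atom', relation, entity)."""
    kind = node[0]
    if kind == "atom":
        _, relation, entity = node
        return [f"[{relation}]", f"[{entity}]"]
    if kind == "N":
        return ["[N]"] + serialize(node[1])
    if kind == "I":
        return ["[I]"] + serialize(node[1]) + serialize(node[2])
    raise ValueError(kind)

hypothesis = ("I",
              ("I", ("atom", "Brand", "Apple"), ("atom", "Release", "2010")),
              ("N", ("atom", "Type", "Phone")))
print(serialize(hypothesis))
# ['[I]', '[I]', '[Brand]', '[Apple]', '[Release]', '[2010]', '[N]', '[Type]', '[Phone]']
```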

40 of 66

Complex Logical Hypothesis Generation


Step 1: Sample observation-hypothesis pairs from the KG.

Step 2: Train the hypothesis generation model with teacher forcing, mapping observations to generated hypotheses.

 

41 of 66

Complex Logical Hypothesis Generation


[Figure: RLF-KG training loop. The model generates a hypothesis from an observation; the hypothesis conclusion on the KG is compared with the observation via Jaccard similarity as the reward, and the model is optimized with PPO (policy gradient) while a KL-divergence term between the model's and a reference model's log-probabilities keeps the policy close to the supervised model.]

Step 3:

Optimize hypothesis generation model with Reinforcement Learning From Knowledge Graph feedback (RLF-KG).
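The reward compares the observation with the conclusion of the generated hypothesis, i.e. the entities the hypothesis entails on the knowledge graph. A minimal sketch of the Jaccard reward (the PPO and KL-penalty machinery is omitted):

```python
def jaccard_reward(observation: set, hypothesis_conclusion: set) -> float:
    """Jaccard similarity between the observed entities and the entities
    entailed by the generated hypothesis on the knowledge graph."""
    if not observation and not hypothesis_conclusion:
        return 1.0
    overlap = observation & hypothesis_conclusion
    return len(overlap) / len(observation | hypothesis_conclusion)

# Hypothetical example: one shared entity out of three distinct ones -> reward 1/3.
print(jaccard_reward({"iPad", "iPhone 4"}, {"iPad", "MacBook Air"}))
```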

 

 

42 of 66

Dataset


This figure provides basic information about the three knowledge graphs utilized in our experiments. The graphs are divided into standard sets of training, validation, and testing edges to facilitate the evaluation process.

The detailed information about the queries used for training, validation, and testing.

43 of 66

Performance


Dataset | Model | 1p | 2p | 2i | 3i | ip | pi | 2u | up | 2in | 3in | pni | pin | inp | Ave.
FB15k-237 | Enc.-Dec. | 0.626 | 0.617 | 0.551 | 0.513 | 0.576 | 0.493 | 0.818 | 0.613 | 0.532 | 0.451 | 0.499 | 0.529 | 0.533 | 0.565
FB15k-237 | + RLF-KG | 0.855 | 0.711 | 0.661 | 0.595 | 0.715 | 0.608 | 0.776 | 0.698 | 0.670 | 0.530 | 0.617 | 0.590 | 0.637 | 0.666
FB15k-237 | Dec.-Only | 0.666 | 0.643 | 0.593 | 0.554 | 0.612 | 0.533 | 0.807 | 0.638 | 0.588 | 0.503 | 0.549 | 0.559 | 0.564 | 0.601
FB15k-237 | + RLF-KG | 0.789 | 0.681 | 0.656 | 0.605 | 0.683 | 0.600 | 0.817 | 0.672 | 0.672 | 0.560 | 0.627 | 0.596 | 0.626 | 0.660

44 of 66

Real Examples:


45 of 66


Output from Supervised Training:

Output after RLF-KG Training:

46 of 66

Roadmap

  • Background
  • Complex Query Answering
  • Complex Hypothesis Generation
  • Complex Session Intention Understanding
  • Future Work


47 of 66

Understanding Inter-Session Intentions via Complex Logical Reasoning

Jiaxin Bai, Chen Luo, Zheng Li, Qingyu Yin, Yangqiu Song

KDD-2024


48 of 66

Search with Logic is Hard

  • Searching for products with complex requirements is hard.
  • It is particularly hard for negative requirements such as NOT.


49 of 66

Search with hidden intentions from sessions

  • Intentions can be hidden and implicit; they are not always expressed in keywords.
    • We need to use behaviour data, such as session information, to model intentions.

  • People often make a single decision across multiple sessions.
    • People may search and spend multiple sessions before making a big decision, like buying a smartphone or a fridge.
    • If a person bought a mattress in a previous session, they are unlikely to buy another one, but may buy a bed frame.

  • We need complex logical reasoning across sessions.
    • Find the desired item of a session with the brand Nike or Adidas.
    • Find the wooden item that is desired by session1 but not desired by session2.
  • We need to deal with sessions, attributes, and logic in one framework.


50 of 66

Integrating sessions, attributes, and logic

For Product Recommendation


[Figure: composing queries for product recommendation: "Find the next item of a given session" AND ("Find an item with the brand Nike" OR "Find an item with the brand Adidas") = "Find the next item of a session with the brand Nike or Adidas"]

51 of 66

Logical Session-CQA on Hypergraph

  • Sessions are treated as hyperedges.
  • Other attributes and relations are treated as binary edges.


[Figure: (A) a hypergraph; (B) a hyper-relational KG, e.g. Photoelectric Effect DiscoveredBy Albert Einstein, and Albert Einstein EducatedAt ETH Zurich with qualifier Degree: BSc; (C) a hyper session graph, where sessions (Session1, Session2) are hyperedges over items (Item1-Item4) and items carry binary edges for attributes such as Brand (Nike, Adidas) and Colour (Red, Blue)]

52 of 66

CQA Methods for Inter-Session Logic Reasoning


N-ary QE methods

StarQE [1] and NQE [2] are designed for hyper-relational KGs and are difficult to adapt to session graphs.

SQE [3] can be extended to N-ary queries, but it cannot capture some important properties of the query graph, such as the permutation invariance of AND and OR.

Session Encoders + Logic Encoders

During logical reasoning, the logic encoder can only access the session embedding, not the individual items within the session.

[Figure: query graph with two Next projections from two sessions, an Intersection, and a Brand projection, together with its token sequence:]
[(] [P] [Brand] [(] [I] [(] [P] [Next] [(] [S] [Item1,1] … [Item1,m] [)] [)] [(] [P] [Next] [(] [S] [Item2,1] … [Item2,n-1] [Item2,n] [)] [)] [)] [)]
We need a new query encoding method for session hypergraphs.

[1] Dimitrios Alivanistos, Max Berrendorf, Michael Cochez, Mikhail Galkin: Query Embedding on Hyper-Relational Knowledge Graphs. ICLR 2022

[2] Haoran Luo, Haihong E, Yuhao Yang, Gengxian Zhou, Yikai Guo, Tianyu Yao, Zichen Tang, Xueyuan Lin, Kaiyang Wan: NQE: N-ary Query Embedding for Complex Query Answering over Hyper-Relational Knowledge Graphs. AAAI 2023

[3] Jiaxin Bai, Tianshi Zheng, Yangqiu Song: Sequential Query Encoding for Complex Query Answering on Knowledge Graphs. Trans. Mach. Learn. Res. 2023 (2023)

53 of 66

Logical-Session Graph Transformer

  • v_i are product items
  • S_i are sessions
  • P_i are projection operators
  • I_i are intersection operators
  • U_i are union operators
  • N_i are negation operators


[Figure: example query graph with sessions S1 = (v1, v2, v3) and S2 = (v2, v3, v4), Next projections P1 and P2, intersection I1, and a Brand projection P3]

Find the item that is desired by session1 and also desired by session2.

54 of 66

Logical-Session Graph Transformer



Node identifiers (0 to 9) are assigned to every node in the query graph, including items, sessions, and operators.

Node identifiers: v1 → 1, v2 → 2, v3 → 3, v4 → 4, S1 → 5, S2 → 6, P1 → 7, P2 → 8, P3 → 9, I1 → 0

55 of 66

Logical-Session Graph Transformer


[Figure: LSGT input construction. All nodes (items v1-v4, sessions S1-S2, operators P1-P3 and I1) receive node identifiers (0-9) and type embeddings ([v], [S], [P], [I]); session-structure tokens record the position of each item inside its session hyperedge; logical-structure tokens record the labeled edges ([Next], [Brand], [I]) between node identifiers. The resulting token sequence, together with a [g] prediction token, is fed to a Transformer encoder that outputs the predictions.]
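A hedged sketch of this input construction (my simplification of LSGT, not its exact implementation): every node gets a node-identifier embedding and a type embedding, item nodes additionally get the item embedding, and edges of the session structure and logical structure become extra tokens that reference the two node identifiers they connect. The full token sequence, plus a prediction token, is then fed to a standard Transformer encoder.

```python
import torch
import torch.nn as nn

class LSGTInput(nn.Module):
    """Build token embeddings for nodes and edges of a logical-session query graph."""

    def __init__(self, num_items: int, num_types: int, dim: int = 128, max_nodes: int = 32):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)   # product items
        self.type_emb = nn.Embedding(num_types, dim)   # [v], [S], [P], [I], [N], relation labels...
        self.id_emb = nn.Embedding(max_nodes, dim)     # node identifiers 0..max_nodes-1

    def node_token(self, type_id, node_id, item_id=None):
        tok = self.type_emb(type_id) + self.id_emb(node_id)
        if item_id is not None:                        # only item nodes carry content
            tok = tok + self.item_emb(item_id)
        return tok

    def edge_token(self, type_id, src_id, dst_id):
        # e.g. a "Next" edge from session S1 (identifier 5) to projection P1 (identifier 7)
        return self.type_emb(type_id) + self.id_emb(src_id) + self.id_emb(dst_id)

# The node and edge tokens are stacked with a [g] prediction token and passed to an
# nn.TransformerEncoder; the output at [g] is matched against candidate item embeddings.
enc = LSGTInput(num_items=1000, num_types=16)
v1 = enc.node_token(torch.tensor(0), torch.tensor(1), torch.tensor(42))   # item node v1 (id 1)
```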

56 of 66

Logical-Session Graph Transformer

  • Permutation invariant to logical operators.
  • Sensitive to item order within sessions.
  • A specialised tokenized graph Transformer [1] for encoding the query graph.



[1] Jinwoo Kim, Dat Nguyen, Seonwoo Min, Sungjun Cho, Moontae Lee, Honglak Lee, Seunghoon Hong: Pure Transformers are Powerful Graph Learners. NeurIPS 2022

57 of 66

Experiment Dataset

  • Diginetica Dataset
    • The Diginetica dataset is commonly used in session-based recommendation research. It originates from the CIKM Cup 2016, focusing on e-commerce applications. The dataset contains user sessions, item interactions, and transaction information.
  • Dressipi Dataset
    • The Dressipi dataset is a fashion-oriented dataset designed for personalized recommendation systems. It comes from the clothing and fashion domain, making it relevant for session-based recommendations in retail.
  • Amazon M2 Dataset
    • The Amazon M2 dataset (KDD Cup 2023) focuses specifically on user sessions and interactions with products, making it suitable for session-based and sequential recommendation tasks. The dataset is often used in research on e-commerce platforms to enhance recommendations by modeling user behavior over time.


58 of 66

Experiment Dataset


Dataset | Train Graph Vertices | Train Graph Edges | Validation Graph Vertices | Validation Graph Edges | Test Graph Vertices | Test Graph Edges | #Sessions | #Items | #Values | #Relations
Amazon | 2,258,179 | 7,234,680 | 2,345,475 | 7,620,527 | 2,431,747 | 8,004,984 | 720,816 | 431,036 | 1,279,895 | 10
Diginetica | 257,018 | 1,286,384 | 261,996 | 1,337,628 | 266,897 | 1,387,861 | 12,047 | 134,904 | 125,204 | 3
Dressipi | 611,520 | 2,435,932 | 643,140 | 2,567,128 | 674,853 | 2,698,692 | 668,650 | 23,618 | 903 | 74

Dataset | Train Queries (Item-Attributes) | Train Queries (Others) | Validation Queries (All Types) | Test Queries (All Types)
Amazon | 2,535,506 | 720,816 | 36,041 | 36,041
Diginetica | 249,562 | 60,235 | 3,012 | 3,012
Dressipi | 414,083 | 668,650 | 33,433 | 33,433

59 of 66


Supervised training types: 1p, 2p, 3i, 2iA, 2iS, ip, pi, 2uS, up, 2inS, 2inA, inp, pin, 3in

Zero-shot types: 3iA, 3ip, 3inA, 3inp

60 of 66

Experiment Results


Dataset | Query Encoder | Session Encoder | Average-EPFO | Average-Negation
Amazon | FuzzQE | GRURec | 30.94 | 21.03
Amazon | FuzzQE | SRGNN | 31.75 | 22.26
Amazon | FuzzQE | Attn-Mixer | 31.68 | 25.00
Amazon | Q2P | GRURec | 23.09 | 14.61
Amazon | Q2P | SRGNN | 24.59 | 16.62
Amazon | Q2P | Attn-Mixer | 28.16 | 26.95
Amazon | NQE | - | 23.19 | 18.12
Amazon | SQE-Transformer | - | 30.07 | 27.16
Amazon | SQE-LSTM | - | 32.53 | 27.13
Amazon | LSGT (Ours) | - | 33.26 | 29.69

Existential Positive First-Order (EPFO) queries: query types that involve conjunction and disjunction, with existentially quantified variables and no negation.
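For concreteness, an EPFO session query from the earlier slides ("find the next item of a session with the brand Nike or Adidas") can be written in illustrative first-order notation (the predicate directions are my assumption) as

```latex
q[V_?] = \mathrm{Next}(S_1, V_?) \wedge \big(\mathrm{Brand}(V_?, \mathrm{Nike}) \vee \mathrm{Brand}(V_?, \mathrm{Adidas})\big)
```

Longer types such as 2p or up additionally introduce existentially quantified intermediate variables.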

61 of 66

Out-of-Distribution Queries Results


Dataset | Query Encoder | 3iA | 3ip | 3inA | 3inp | Average OOD
Amazon | FuzzQE + Attn-Mixer | 66.72 | 29.67 | 54.33 | 48.76 | 49.87
Amazon | Q2P + Attn-Mixer | 33.51 | 11.42 | 51.47 | 41.46 | 34.47
Amazon | NQE | 61.72 | 1.98 | 46.47 | 34.04 | 36.72
Amazon | SQE + Transformers | 66.03 | 28.41 | 55.61 | 51.28 | 50.33
Amazon | LSGT (Ours) | 68.44 | 34.22 | 58.50 | 51.49 | 53.16
Diginetica | FuzzQE + Attn-Mixer | 88.30 | 32.88 | 82.75 | 34.50 | 59.61
Diginetica | Q2P + Attn-Mixer | 40.28 | 43.93 | 54.31 | 48.20 | 46.68
Diginetica | NQE | 86.25 | 20.79 | 64.74 | 20.93 | 48.18
Diginetica | SQE + Transformers | 88.05 | 31.33 | 81.77 | 35.83 | 59.25
Diginetica | LSGT (Ours) | 91.71 | 35.24 | 83.30 | 41.05 | 62.83
Dressipi | FuzzQE + Attn-Mixer | 65.43 | 95.64 | 53.36 | 97.75 | 78.05
Dressipi | Q2P + Attn-Mixer | 60.64 | 96.78 | 52.22 | 97.28 | 76.73
Dressipi | NQE | 31.96 | 96.18 | 9.89 | 97.80 | 58.96
Dressipi | SQE + Transformers | 72.61 | 97.12 | 55.20 | 98.14 | 80.77
Dressipi | LSGT (Ours) | 74.34 | 97.30 | 58.30 | 98.23 | 82.04

OOD query types: query types that are not seen during the training phase.

Since their sub-queries are seen during training, OOD performance serves as a measure of compositional generalization.

62 of 66

Roadmap

  • Background
  • Complex Query Answering
  • Complex Hypothesis Generation
  • Session Intention Modeling by Logic
  • Future Work


63 of 66

Future Work: Neural Graph Database


[Figure: text data and Large Language Models alongside database data and a Neural Graph Database → Database Foundation Model [2]]

[1] https://youtu.be/1yvBqasHLZs

[2] Wang, Y., Wang, X., Gan, Q., Wang, M., Yang, Q., Wipf, D., & Zhang, M. (2025). Griffin: Towards a Graph-Centric Relational Database Foundation Model. arXiv preprint arXiv:2505.05568.


Ilya Sutskever [1]: LLM pre-training scaling will stop because we only have one internet!

  • Data within databases continues to grow and won't be easily depleted;
  • The area is at a very early stage where universities can do research [2];
  • It combines general applicability with real-world applications, positioning it to make industry impact.

Neural Graph Databases (NGDBs) show great potential to scale into Database Foundation Models!

64 of 66

Future Work – Agentic Database

  • Ask the "correct" question on behalf of the user based on the context
    • The future direction for abductive reasoning and beyond
  • Conduct approximation and inference over data of various types
    • The future direction for complex logical query answering
  • Integrate results on behalf of the user and present them
    • The future direction for KG reasoning together with LLMs


65 of 66

[Figure: a neural graph database over an example KG of Turing Award winners (Hinton, Bengio, LeCun, Knuth), their birth years (1938, 1947, 1964), affiliations (UofT, Stanford), and cities (Toronto, Montreal, New York); edges are marked as private or public, and the NGDB consists of an embedding storage and a query engine]

Language modeling via next-token prediction:
"We would like to invite esteemed senior professors who have made significant contributions to computer science to give a talk on-site. To accommodate them, we have decided to hold the seminar in ________"

Neural graph database reasoning:
"Where do the Turing Award winners born before 1940 live?"
"…… we have decided to hold the seminar in __Toronto__."

Task 1: Generating good NGDB queries.
Task 2: Improving NGDB reasoning.
Task 3: Incorporating NGDB results.
Task 4: Applications of agentic NGDB.

66 of 66

Thank you for your attention! ☺ More related work on my website: bjx.fun