
Sequence-aware Reinforcement Learning over Knowledge Graphs

Ashish Gupta, Sr Statistical Analyst, Walmart Labs, Bangalore, India

Rishabh Mehrotra, Sr Research Scientist, Spotify Research, London, UK


Let’s first contextualize!


Last day

Last session

Last talk


Knowledge Graphs are useful

Knowledge represented as entities, edges and attributes

  • Nodes are entities
  • Nodes are labeled with attributes (e.g., types)
  • Edges between nodes capture a relationship between entities

KGs are flexible: easy to integrate heterogeneous information
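To make this concrete, here is a minimal Python sketch of a KG as typed nodes plus labeled edges. All names (User, Product, "purchased", ...) are illustrative, not any production schema:

    from collections import defaultdict

    # A toy knowledge graph: nodes carry type attributes, edges carry relations.
    nodes = {
        "u1": {"type": "User"},
        "p1": {"type": "Product", "category": "Headphones"},
        "b1": {"type": "Brand"},
    }
    edges = defaultdict(list)  # head -> [(relation, tail), ...]

    def add_edge(head, relation, tail):
        edges[head].append((relation, tail))

    add_edge("u1", "purchased", "p1")
    add_edge("p1", "produced_by", "b1")

    # Heterogeneous information integrates easily: just add new node/edge types.
    print(nodes["p1"]["category"], edges["u1"])  # Headphones [('purchased', 'p1')]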


Large-scale KGs in industry!


Recommendations over a KG

  • Embedding models
    • Learn user and item representations from the KG
    • Recommend based on the similarity between user and item entities:
      • Translational KG Embedding for Rec and Explanation [Ai et al. Algorithms'2018]
      • Propagating User Preferences on the Knowledge Graph [Wang et al. CIKM'2018]
  • Path-based recommendations
    • Learning Path Embedding for Recommendation [Wang et al. AAAI'2019]
    • Jointly Learning Explainable Rules for Recommendation [Ma et al. WWW'2019]
    • Path reasoning beginning from the user entity:
      • Reinforcement KG Reasoning for Explainable Recommendation [Xian et al. SIGIR'2019]

WWW, SIGIR tutorial: Explainable Recommendation and Search (EARS): https://sites.google.com/view/ears-tutorial/

SeqReLG: Sequence-aware RL over Graphs


Outline

  • Introduction to Knowledge Graphs
  • Product knowledge graphs
  • Background research: Graph Reasoning PGPR (SIGIR 2019)
  • SeqReLG: Sequence-aware RL over KG
  • Results
  • Ongoing Work

Product Knowledge Graphs

Paths in a Knowledge Graph

Reasoning path = how the agent reached the item from the user

Xian, Yikun, et al. "Reinforcement Knowledge Graph Reasoning for Explainable Recommendation." SIGIR 2019


Knowledge Graph Reasoning (PGPR)

Step 1: Build Knowledge Graph

Step 2: Learn embeddings

  • Train embeddings for each entity and relation
  • For entity pairs (e, e′) with relation r_{k,j}, maximize the conditional probability P(e′ | e, r_{k,j})
  • Huge size of the entity set → adopt negative sampling to approximate log P(e′ | e, r_{k,j}) (sketched below)
  • Goal: maximize the overall objective, i.e., the sum of these log-likelihoods over all edges in the KG
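As a hedged illustration of this step (a sketch, not the authors' code): a translational score with negative sampling approximating log P(e′ | e, r), in PyTorch. The entity counts, dimensions, and number of negatives are made up:

    import torch
    import torch.nn.functional as F

    n_entities, n_relations, dim = 1000, 10, 64
    ent = torch.nn.Embedding(n_entities, dim)
    rel = torch.nn.Embedding(n_relations, dim)

    def neg_sampling_loss(head, relation, tail, num_neg=5):
        """Negative-sampling approximation of -log P(tail | head, relation)."""
        h, r, t = ent(head), rel(relation), ent(tail)
        pos_score = ((h + r) * t).sum(-1)                 # translational score
        neg_ids = torch.randint(0, n_entities, (head.size(0), num_neg))
        neg_score = ((h + r).unsqueeze(1) * ent(neg_ids)).sum(-1)
        return -F.logsigmoid(pos_score).mean() - F.logsigmoid(-neg_score).mean()

    # One optimization step on a toy batch of (head, relation, tail) triples.
    opt = torch.optim.Adam(list(ent.parameters()) + list(rel.parameters()), lr=1e-3)
    loss = neg_sampling_loss(torch.tensor([0, 1]), torch.tensor([0, 1]), torch.tensor([2, 3]))
    loss.backward()
    opt.step()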

Step 3: Train RL model: policy/value network

  • Train an agent:
    • starts from a user and walks over the graph
    • reaches a “good” item node with high probability
  • High reward: reach a positive item
  • Low reward: reach a negative item (a toy reward sketch follows below)
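A minimal sketch of the terminal-reward idea (illustrative only; the actual PGPR reward is a soft matching score rather than a binary one):

    def terminal_reward(user, reached_node, positive_items):
        """High reward if the walk ends at an item this user interacted with."""
        return 1.0 if reached_node in positive_items.get(user, set()) else 0.0

    positive_items = {"u1": {"p1", "p7"}}
    print(terminal_reward("u1", "p1", positive_items))  # 1.0
    print(terminal_reward("u1", "p9", positive_items))  # 0.0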

Step 4: Serve recommendations using trained policy network

  • For each user:
    • employ beam search guided by the action probabilities and rewards (sketched below)
    • explore the candidate paths as well as the recommended items
  • Multiple paths between u & i → select the path with the highest generative probability
  • Rank paths based on path rewards
    • recommend corresponding items

Xian, Yikun, et al. "Reinforcement Knowledge Graph Reasoning for Explainable Recommendation." SIGIR 2019
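A hedged sketch of the serving step: beam search over a toy graph with a stand-in policy (here uniform over outgoing edges; in PGPR the trained policy network supplies these probabilities):

    import heapq

    graph = {"u1": [("purchased", "p1"), ("viewed", "p2")],
             "p1": [("also_bought", "p3")], "p2": [("also_viewed", "p3")],
             "p3": []}

    def policy_probs(path):
        """Stand-in for the trained policy: uniform over outgoing edges."""
        acts = graph[path[-1]]
        return [(1.0 / len(acts), a) for a in acts]

    def beam_search(start, depth=2, beam=2):
        beams = [(1.0, [start])]                  # (path probability, path)
        for _ in range(depth):
            cand = []
            for p, path in beams:
                acts = policy_probs(path)
                if not acts:                      # dead end: keep the path
                    cand.append((p, path))
                for q, (rel, nxt) in acts:
                    cand.append((p * q, path + [nxt]))
            beams = heapq.nlargest(beam, cand, key=lambda x: x[0])
        return beams

    for prob, path in beam_search("u1"):
        print(prob, " -> ".join(path))            # ranked candidate paths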


SeqReLG: Sequence-aware RL over Graphs


SeqReLG: Advancements

RQ1: Can we train better embeddings?

RQ2: How important is considering item sequences in the policy/value network?

RQ3: Can we incorporate look-ahead items while evaluating actions?


SeqReLG: Advancements

RQ1: Train better embeddings

  • Obtaining good positive and negative examples is important for learning good embeddings
  • Can we get better negatives?
    • Large number of entities → large vocabulary size
  • Hierarchical softmax:
    • Instead of randomly sampling out-of-context negatives → tree-like traversal
    • Eliminates the preference towards frequent categories (sketched below)
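A minimal sketch of the hierarchical-softmax idea: the probability of an entity is a product of binary left/right decisions along its root-to-leaf path, so each update costs O(log V) instead of O(V). The tree, codes, and dimensions below are toy assumptions, not the production category hierarchy:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hsoftmax_logprob(h, inner_vecs, code):
        """log P(entity | context h): product of sigmoid decisions along
        the entity's root-to-leaf path (+1 = go left, -1 = go right)."""
        return sum(np.log(sigmoid(d * np.dot(h, v)))
                   for v, d in zip(inner_vecs, code))

    rng = np.random.default_rng(0)
    h = rng.normal(size=8)               # context embedding (e.g., e + r)
    path_vecs = rng.normal(size=(3, 8))  # inner nodes from root to leaf
    code = [+1, -1, +1]                  # toy left/right decisions
    print(hsoftmax_logprob(h, path_vecs, code))

Building the tree over the category hierarchy is what removes the preference towards frequent categories: every category gets a path, regardless of its frequency.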


SeqReLG: Advancements

RQ2: Sequence-aware policy/value network

Reinforcement Learning setup:

  • State:
    • <u, i, t> user, item, time tuple
  • Action:
    • outgoing edges of entity e_t
    • user-conditional action pruning strategy (sketched below)
      • keeps the promising edges, conditioned on the starting user, based on a scoring function
  • Reward:
    • a “good” path → one that leads, with high probability, to an item the user will interact with
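A hedged sketch of the user-conditional pruning step: score each outgoing edge against the starting user's embedding and keep the top k (the dot-product scoring function here is an assumption):

    import numpy as np

    def prune_actions(user_vec, edge_vecs, k=2):
        """Keep the k outgoing edges most promising for this user."""
        scores = edge_vecs @ user_vec           # simple dot-product score
        keep = np.argsort(scores)[::-1][:k]     # indices of the top-k edges
        return keep, scores[keep]

    rng = np.random.default_rng(1)
    user_vec = rng.normal(size=16)
    edge_vecs = rng.normal(size=(5, 16))        # 5 candidate outgoing edges
    print(prune_actions(user_vec, edge_vecs))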

SeqReLG: Advancements

Policy / value network

Sequence of states

Sequence information is important for multi-objective modeling
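A hedged sketch of what a sequence-aware policy/value head could look like: a GRU encodes the sequence of visited state embeddings, and actor/critic heads read the final recurrent state (all architecture details are assumptions, not the exact SeqReLG network):

    import torch
    import torch.nn as nn

    class SeqPolicy(nn.Module):
        """GRU over visited states -> action logits and a value estimate."""
        def __init__(self, state_dim=32, hidden=64, max_actions=10):
            super().__init__()
            self.gru = nn.GRU(state_dim, hidden, batch_first=True)
            self.actor = nn.Linear(hidden, max_actions)  # action logits
            self.critic = nn.Linear(hidden, 1)           # state value

        def forward(self, state_seq):
            _, h = self.gru(state_seq)    # h: (1, batch, hidden), last step
            h = h.squeeze(0)
            return self.actor(h), self.critic(h)

    policy = SeqPolicy()
    seq = torch.randn(4, 3, 32)           # 4 walks, 3 visited states each
    logits, value = policy(seq)
    print(logits.shape, value.shape)      # [4, 10] and [4, 1]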

SeqReLG: Advancements

RQ3: Incorporating look-ahead items
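The slides do not spell out the mechanism, but one loudly-hedged reading of "look-ahead while evaluating actions" is to fold an aggregate embedding of the items reachable one hop past each candidate edge into the action score (the mixing weight alpha is an assumption):

    import numpy as np

    def lookahead_score(state_vec, action_vec, next_item_vecs, alpha=0.5):
        """Score an action by its own fit plus a mean-pooled embedding
        of the items reachable one hop after taking it."""
        direct = float(action_vec @ state_vec)
        if len(next_item_vecs) == 0:
            return direct
        ahead = float(next_item_vecs.mean(axis=0) @ state_vec)
        return direct + alpha * ahead

    rng = np.random.default_rng(2)
    s, a = rng.normal(size=16), rng.normal(size=16)
    nxt = rng.normal(size=(3, 16))   # items reachable after the action
    print(lookahead_score(s, a, nxt))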


Results - I

No change in the RL model → only the embedding training phase changed

~2% improvement across all metrics

Better trained embeddings are useful!


Results - II

Added sequence of items

~4% improvement in hit rate

Improvements across all metrics


Sequence information of items in the path is helpful!


Results - III

RL model with look-ahead

~9% improvement in precision

Improvements across all metrics

Knowing where the path is headed helps!

Ongoing Work

Stakeholders

Multi-objective RL:

  • Stakeholders & objectives?
    • Diversity
    • Exposure of items
    • Promotions
  • via multi-objective rewards? (a toy sketch follows below)
  • via multi-task policy network?
  • Pareto-optimal methods?
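One common way to realize multi-objective rewards is scalarization with stakeholder weights; a toy sketch (the weights and the four objectives are illustrative assumptions):

    def multi_objective_reward(relevance, diversity, exposure, promotion,
                               weights=(0.7, 0.1, 0.1, 0.1)):
        """Scalarized reward over per-stakeholder objectives. Pareto-optimal
        methods would instead keep objectives separate and search for
        non-dominated policies rather than fixing weights up front."""
        return sum(w * o for w, o in zip(weights, (relevance, diversity,
                                                   exposure, promotion)))

    print(multi_objective_reward(1.0, 0.3, 0.5, 0.0))  # 0.78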


Thank You!

Summary:

  1. Train better embeddings via hierarchical softmax
  2. Sequence information is helpful in the policy/value network
  3. Look-ahead helps!

Ashish Gupta

Rishabh Mehrotra

On-going work: multi-objective RL over knowledge graphs (multi-stakeholder rewards, multi-task policy networks, Pareto-optimal methods)