1 of 32

Graph Mining in Recommender Systems

  • A Tutorial at WISE 2021

  • Speakers:
  • Dr Hongxu Chen
  • Miss Yicong Li
  • Mr Haoran Yang

University of Technology Sydney (UTS)

2 of 32

Agenda

  • Graph Representation Learning and its applications in Recommendation Systems

by Hongxu Chen, 14:15 – 14:45 October 27, 2021 (Wednesday)

  • Graph-based Explainable Recommender Systems

by Yicong Li, 14:45 – 15:15 October 27, 2021 (Wednesday)

  • Graph Contrastive Learning for Recommender Systems

by Haoran Yang, 15:15 – 15:45 October 27, 2021 (Wednesday)

3 of 32

1. Graph Representation Learning and its applications in Recommendation Systems

4 of 32

Networks are ubiquitous

5 of 32

Representing networks by vectors

  • Graph Representation Learning.
    • Also known as Graph Embedding or Network Embedding.
    • Learns a low-dimensional vector for each vertex.
    • Effectively preserves the network structure.

    • Downstream data mining tasks on graphs:
      • Link prediction.
      • Node classification.
      • Recommendation.

6 of 32

Graph Embedding Methods

  • Early Methods (Matrix Factorization-based)
    • E.g., MDS, IsoMap, Spectral Clustering, Laplacian Eigenmap, etc.
  • Random Walk-based Embedding Methods
    • E.g., DeepWalk, Node2Vec, Metapath2Vec, etc.
  • Graph Neural Networks (a one-layer GCN sketch appears at the end of this slide)
    • E.g., GCNs, GraphSAGE, GAT, Graph Autoencoder, etc.

Handy Materials:

[1]. http://web.stanford.edu/class/cs224w/
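To make the GNN family concrete, here is a minimal one-layer GCN propagation step, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), in plain NumPy; the toy graph, features, and weights are illustrative only, not from any cited paper:

import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step with symmetric normalization and self-loops."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d = A_hat.sum(axis=1)                        # node degrees of A_hat
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)  # ReLU

# Toy graph: 4 nodes, 3-d input features, 2-d output embeddings.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
print(gcn_layer(A, H, W))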

7 of 32

Recommendation Systems as Graphs

https://images.app.goo.gl/YjzjovQnvZgwwdx56

8 of 32

As Bipartite Graph Mining

  • Matrix Factorization (a minimal sketch follows this list)
  • Factorization Machines
  • Bayesian Analysis (PMF, BPR)
  • Deep Learning (NCF, DMF, DeepFM)
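As a concrete instance of the first family, below is a minimal sketch of matrix factorization trained by SGD on the observed edges of the user-item bipartite graph; the toy rating matrix and hyper-parameters are illustrative only:

import numpy as np

def factorize(R, k=8, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Minimal matrix factorization via SGD on observed ratings.
    R: user-item rating matrix with 0 marking unobserved entries.
    Returns user factors P (m x k) and item factors Q (n x k)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    users, items = np.nonzero(R)              # observed edges of the bipartite graph
    for _ in range(epochs):
        for u, i in zip(users, items):
            pu = P[u].copy()
            err = R[u, i] - pu @ Q[i]         # prediction error on one edge
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy 4-user x 5-item interaction matrix (0 = unobserved).
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 0],
              [1, 1, 0, 5, 4],
              [0, 1, 5, 4, 0]], dtype=float)
P, Q = factorize(R)
print(np.round(P @ Q.T, 1))                   # predicted ratings for all pairs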

9 of 32

Timeline of development of Bipartite Modelling Approaches

Deng, Y. (2021). Recommender systems based on graph embedding techniques: A comprehensive review. arXiv preprint arXiv:2109.09587.

10 of 32

As General Graph Modelling

  • Trans Family (e.g., TransE, TransR, TransH, TransG, etc.)
  • Random Walk and Meta-path
  • Deep Learning

11 of 32

Trans Family

  • Model triplets (h,r,t) as basic elements in a graph.

For example:

      • TransE models a relation r as a translation from head entity h to tail entity t.
      • TransH projects each pair (h, t) onto relation-specific hyper-planes, distinguishing the diverse relations between nodes.
      • TransR maps nodes and relations separately into a node space and multiple different relation spaces corresponding to the diverse relations between head-tail pairs, respectively.
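To make the translation idea concrete, here is a minimal sketch of TransE's scoring function and margin ranking loss; the random embeddings and dimensions are purely illustrative:

import numpy as np

def transe_score(h, r, t):
    """TransE plausibility of triplet (h, r, t) as ||h + r - t||.
    A lower distance means a more plausible triplet."""
    return np.linalg.norm(h + r - t)

def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss used to train TransE-style models:
    push positive triplets at least `margin` closer than negatives."""
    return max(0.0, margin + pos - neg)

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 16))        # toy 16-d embeddings
t_neg = rng.normal(size=16)               # corrupted tail for a negative triplet
loss = margin_loss(transe_score(h, r, t), transe_score(h, r, t_neg))
print(loss)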

12 of 32

Random Walk and Meta-path

  • Meta-path
    • A meta-level description of a path between two objects (Sun et al.).
    • Different meta-paths convey different semantics.

13 of 32

Random Walk and Meta-path

  • Random walks as paths.
  • Encode the paths as sentences (inspired by advances in NLP).
  • Word2Vec: Skip-gram and Continuous Bag of Words (CBOW).
  • Negative sampling.

  • Graph embedding techniques recap (a DeepWalk-style sketch follows this list):
    • LINE
    • RandWalk
    • DeepWalk
    • Node2Vec
    • Metapath2Vec
    • BiNE (can also be classified as a Bipartite Network Embedding-based method)
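A minimal DeepWalk-style sketch, combining uniform random walks with gensim's Skip-gram (sg=1) and negative sampling; the karate-club graph and hyper-parameters are illustrative only:

import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walk_len=10, walks_per_node=20, seed=0):
    """Generate uniform random walks and encode them as 'sentences' of node ids."""
    random.seed(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_len:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(random.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
walks = random_walks(G)
# Skip-gram (sg=1) with negative sampling, exactly as in the Word2Vec analogy.
model = Word2Vec(walks, vector_size=64, window=5, sg=1, negative=5, min_count=0)
print(model.wv[str(0)][:5])   # first 5 dimensions of node 0's embedding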

14 of 32

Deep Learning based approaches

  • Autoencoders
  • SDAE
  • Transformer
  • GCNs
  • GraphSAGE
  • GAT

15 of 32

Timeline of General Graph Embedding based Approach

Deng, Y. (2021). Recommender systems based on graph embedding techniques: A comprehensive review. arXiv preprint arXiv:2109.09587.

16 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Latent Space Models
    • Most are built in Euclidean space.
      • Suffer from high distortion.
      • Need to increase the latent size in order to reduce the distortion.
      • The resources needed to train and store the model also increase.

  • Hyperbolic Space:
    • A promising new latent space for resolving the above issues.
      • Expands faster than Euclidean space.
      • Preserves the hierarchies of the data.

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

17 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Issues in the existing literature on hyperbolic space:
    • Practitioners indiscriminately transfer variants of existing algorithms into hyperbolic geometry, regardless of whether there is a strong motivation for doing so.
    • Analysis of, and guidance on, when to use hyperbolic space is missing.

  • Our Contributions:
    • We provide theoretical analysis and empirical results to validate our three hypotheses about the effectiveness of hyperbolic space.
    • We address the drawbacks of hyperbolic space, and give comments on when and where to use hyperbolic space.
    • We propose a metric-learning-based social recommendation method, SCML, and its hyperbolic version, HSCML, and validate our hypotheses on them. We also show that hyperbolic space achieves state-of-the-art performance by comparing HSCML with other baselines on two benchmark datasets.

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

18 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Hypotheses
    • Distance models are better suited to learning hyperbolic embeddings than projection models.
      • The distribution of embeddings after convergence differs between the two.
      • Embeddings in projection models gather around the origin, making little use of the whole space.
      • Distance models can learn hierarchical information more easily.

    • Hyperbolic space is more powerful than Euclidean space when the density of the dataset is small.
      • Hyperbolic space can boost learning using the extracted hierarchical information.

    • Hyperbolic space performs better than Euclidean space when the latent size is small, but as the latent size increases, the performance of Euclidean space becomes comparable with hyperbolic space. (A sketch of the hyperbolic distance follows this list.)
      • The latent size needed to embed a certain amount of information is smaller for hyperbolic space.
      • With a large enough latent size, Euclidean space will also have very small distortion.
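For concreteness, the geodesic distance in the Poincare ball model (one common realisation of hyperbolic space) can be computed as below; this is an illustrative sketch, not the exact model used in SCML/HSCML:

import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball model:
    d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / (denom + eps))

origin = np.zeros(2)
near_boundary = np.array([0.95, 0.0])
# Distances blow up near the boundary: the space "expands" faster than
# Euclidean space, leaving room for hierarchies even at low dimensions.
print(poincare_distance(origin, near_boundary))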

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

19 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Experiments (General Item Recommendation)
    • High Density Datasets vs. Low Density Datasets

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

20 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Experiments (General Item Recommendation)
    • Influence of the Latent Size

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

21 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Experiments (General Item Recommendation)
    • Distance Model vs. Projection Model

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

23 of 32

Where are we in embedding spaces? A Comprehensive Analysis on Network Embedding Approaches for Recommender Systems

  • Drawbacks of Hyperbolic Space
    • High computational complexity.
    • Invalid values caused by numerical instability.

  • Comments on Using Hyperbolic Space
    • Distance models are more suited for learning hyperbolic embeddings than projection models.
    • If the density of the dataset is large, Euclidean space is the better choice: its performance is comparable with hyperbolic space and its computational complexity is low. When the density is low, hyperbolic space is preferable, because it performs much better than Euclidean space.
    • Choose an appropriate latent size. In most cases, hyperbolic embeddings only need a relatively small latent size to achieve good performance, which can help to save resources.

As part of the Australian ARC-funded Discovery Project - Dynamics and Control of Complex Social Networks (DynaCo)

24 of 32

Others and Challenges

Three grand challenges:

  • Heterogeneity
  • Scalability
  • Dynamics

25 of 32

Timeline of key developments regarding the challenges

Deng, Y. (2021). Recommender systems based on graph embedding techniques: A comprehensive review. arXiv preprint arXiv:2109.09587.

26 of 32

Heterogeneity

The Real World: Heterogeneous Networks

    • Multiple object types.
    • Multiple link types.

E.g., vertex u1 is close to both b2 and u3, but these relationships have different semantics: b2 is a business visited by user u1, while u3 is a friend of u1.

Homogeneous Networks

    • Single-Typed Nodes.
    • Single-Typed Links.

27 of 32

Projected Metric Embedding (KDD18)

      • Dot product is used to compute the proximity between different types of nodes.
        • It is not a metric-based distance.
        • It violates the crucial triangle inequality.

Therefore, existing HIN embedding methods (e.g., Metapath2vec and EOE):

        • Can only capture local structures: Node A is close to Node C, and Node B is close to Node C.
        • But fail to capture the second-order proximity: Node A should also be close to Node B (a toy demonstration follows).
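A toy NumPy demonstration of this failure mode, with hypothetical 2-d embeddings for the three nodes:

import numpy as np

# A and B are both "close" to C under dot-product similarity, yet A and B
# score zero with each other -- the dot product gives no triangle-inequality
# guarantee that ties A and B together.
A = np.array([4.0, 0.0])
B = np.array([0.0, 4.0])
C = np.array([1.0, 1.0])

print(A @ C, B @ C)   # 4.0 4.0  -> both pairs look similar
print(A @ B)          # 0.0      -> the second-order proximity is lost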

  • Previous attempts to model semantic-specific relationships:

      • Metapath2vec (Dong et al., KDD 2017)
      • EOE (Xu et al., WSDM 2017)

28 of 32

Projected Metric Embedding (KDD18)

Our proposed method

    • Projected Metric Embedding for HIN
      • Simultaneously preserves:
        • The first-order proximities between nodes
        • And the second-order proximities between nodes

Such that, in the latent space, directly linked nodes (first-order) and nodes sharing common neighbours (second-order) are both embedded close together.

29 of 32

Projected Metric Embedding (KDD18)

  • However, directly applying the Euclidean distance as a metric is problematic!
    • Mathematically, it is geometrically restrictive and yields an ill-posed algebraic system.
    • Moreover, one object may have multiple aspects.

  • To address these issues:
    • PME introduces relation-specific projection embedding matrices.
    • Model objects and relations in distinct spaces.
      • One shared object space.
      • Multiple relation spaces (i.e., relation-specific object spaces).

Hence, it is possible that some objects are far away from each other in the object space, but close to each other in the corresponding relation spaces (a sketch follows).
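A small illustrative sketch of this idea (not the exact PME formulation): shared object embeddings are projected by a per-relation matrix before measuring distance, so the same pair can be far apart in one relation space and close in another. All names, matrices, and dimensions below are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
d = 16                                     # shared object-space dimension
z = {name: rng.normal(size=d) for name in ("u1", "u3", "b2")}
# One hypothetical projection matrix per relation (friendship, visiting).
M = {rel: rng.normal(scale=0.3, size=(d, d)) for rel in ("friend", "visit")}

def relation_distance(rel, a, b):
    """Distance after projecting shared embeddings into a relation space."""
    return np.linalg.norm(M[rel] @ z[a] - M[rel] @ z[b])

# The same pair can be far in one relation space and near in another.
print(relation_distance("friend", "u1", "u3"))
print(relation_distance("visit", "u1", "b2"))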

30 of 32

Projected Metric Embedding (KDD18)

  • Decompose the HIN into link-specific sub-networks.
  • Perform metric learning on each link-specific sub-network.

  • Learn shared node embeddings and relation-specific spaces.

31 of 32

Projected Metric Embedding (KDD18)

Model optimization:

    • Bi-directional Negative Sampling Strategy

    • Directly optimizing the full objective is expensive!

Inspired by negative sampling techniques:

    • First fix vertex v_i and edge type r, then generate K negative vertices v_k.
    • Then fix the right-hand side of edge e_ij^r and sample K negative vertices from the left-hand side.

(A sketch of this bi-directional strategy follows.)
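A minimal sketch of the bi-directional strategy under assumed data structures (node lists keyed by type, triplet-style edges); the type names and helper function are hypothetical:

import random

def bidirectional_negatives(edge, nodes_by_type, head_type, tail_type, K=5, seed=0):
    """For one positive edge (v_i, v_j) of relation r: first fix v_i and draw K
    corrupted tails, then fix v_j and draw K corrupted heads."""
    rng = random.Random(seed)
    v_i, v_j, r = edge
    tails = [n for n in nodes_by_type[tail_type] if n != v_j]
    heads = [n for n in nodes_by_type[head_type] if n != v_i]
    neg_tails = [(v_i, t, r) for t in rng.sample(tails, K)]   # corrupt the right side
    neg_heads = [(h, v_j, r) for h in rng.sample(heads, K)]   # corrupt the left side
    return neg_tails + neg_heads

nodes_by_type = {"user": [f"u{i}" for i in range(10)],
                 "business": [f"b{i}" for i in range(10)]}
print(bidirectional_negatives(("u1", "b2", "visit"), nodes_by_type,
                              "user", "business", K=3))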

32 of 32

Projected Metric Embedding (KDD18)

Model optimization:

    • Loss-aware Adaptive Positive Sampling Strategy

      • Maintain a sequence of the losses for each sub-network.
      • Calculate the sum of the losses.
      • Draw a random value within the range [0, 1].
      • See which interval of the cumulative losses the random value falls into, so that sub-networks with larger losses are sampled more often (a sketch follows).
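A minimal sketch of this roulette-wheel-style selection, with hypothetical sub-network names and loss values:

import random

def sample_subnetwork(losses, seed=None):
    """Pick a link-specific sub-network with probability proportional to its
    current loss: sum the losses, draw u in [0, 1], and find the interval of
    the cumulative distribution that u falls into."""
    rng = random.Random(seed)
    total = sum(losses.values())
    u = rng.random()
    cumulative = 0.0
    for name, loss in losses.items():
        cumulative += loss / total
        if u <= cumulative:
            return name
    return name                       # guard against floating-point round-off

losses = {"user-user": 0.9, "user-business": 0.4, "business-category": 0.2}
print(sample_subnetwork(losses, seed=42))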