Complex Embeddings for Simple Link Prediction
Theo Trouillon
Johannes Welbl
Sebastian Riedel
Eric Gaussier
Guillaume Bouchard
Jens Martensson
Introduction
How to leverage the KG for link prediction?
+ KBs express data as a directed graph with labeled edges (relations) between nodes (entities).
+ The missing items of a KB may be filled in by using natural redundancies among the reported relations.
+ The goal of link prediction is the automatic discovery of relationships between entities in the graph.
The proposed method
Modelling Relations
+ A relation between two entities is represented as a binary value Y_so ∈ {−1, 1}, where s ∈ E is the subject of the relation and o ∈ E its object. Its probability is given by the logistic inverse link function P(Y_so = 1) = σ(X_so), where X_so is the latent score of the pair (s, o).
+ Goal: find a generic structure for X that leads to a flexible approximation of common relations in real world KBs.
+ The notion of dot products is generalised to scoring functions, also known as composition functions, that combine embeddings in specific ways; here, through an eigenvalue decomposition of the score matrix X.
+ Such an eigenvalue decomposition is not always possible in the real space; it does exist in the complex space, where embeddings are composed of a real vector component Re(x) and an imaginary vector component Im(x).
+ For a given entity, its subject embedding vector is the complex conjugate of its object embedding vector.
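The conjugation described above is what lets a single embedding per entity capture both symmetric and antisymmetric relation patterns: swapping subject and object conjugates the Hermitian product, so its real part is symmetric and its imaginary part antisymmetric. A minimal NumPy sketch (the embedding dimension and random values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4                                                  # embedding dimension (illustrative)
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)     # subject embedding
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)     # object embedding

# Hermitian dot product: conjugate the object-side embedding.
h_so = np.sum(e_s * np.conj(e_o))
h_os = np.sum(e_o * np.conj(e_s))

# Swapping subject and object conjugates the product:
# the real part is symmetric, the imaginary part antisymmetric.
assert np.isclose(h_so.real, h_os.real)
assert np.isclose(h_so.imag, -h_os.imag)
```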
Low-Rank Decomposition
+ To enable the model to be learnable, i.e. to generalize to unobserved links, some regularity assumptions are needed.
+ Binary relations have a low sign-rank: the sign-rank of a sign matrix Y is the smallest rank of a real matrix with the same sign pattern as Y.
+ By imposing a low rank K, the relation score between entities s and o becomes X_so = Re(Σ_k w_k e_sk ē_ok), i.e. X = Re(E diag(w) Ē^T).
+ Three points when using low-rank decomposition:
(1) The new factorization encompasses all possible binary relations.
(2) By construction, it accurately describes both symmetric and antisymmetric relations.
(3) Learnable relations can be efficiently approximated by a simple low-rank factorization, using complex numbers to represent the latent factors.
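The low-rank complex factorization can be sketched as follows; the entity count, rank, and random embeddings are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3                                                 # n entities, rank k (illustrative)
E = rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k))  # complex entity embeddings
w = rng.normal(size=k) + 1j * rng.normal(size=k)            # complex eigenvalues

# Full score matrix for one relation: X = Re(E diag(w) E^*).
X = np.real(E @ np.diag(w) @ E.conj().T)

# A single entry: the score between subject s and object o.
s, o = 0, 2
x_so = np.real(np.sum(w * E[s] * np.conj(E[o])))
assert np.isclose(X[s, o], x_so)
```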
Application to Binary Multi-Relational Data
+ Depending on the scoring function φ(s, r, o; Θ) used to predict the entries of the tensor X, different models are obtained; the proposed model uses the scoring function φ(s, r, o; Θ) = Re(⟨w_r, e_s, ē_o⟩) = Re(Σ_k w_rk e_sk ē_ok).
+ Equation (10) shows that, by using the complex conjugate of one of the embeddings, the model can handle asymmetric relations while producing real-valued scores.
+ Equation (11) involves only real vectors, corresponding to the real and imaginary parts of the entity and relation embeddings.
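The equivalence between the complex form (Eq. 10) and the all-real form (Eq. 11) can be checked numerically; the embeddings below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 4                                                  # embedding dimension (illustrative)
w_r = rng.normal(size=k) + 1j * rng.normal(size=k)     # relation embedding
e_s = rng.normal(size=k) + 1j * rng.normal(size=k)     # subject embedding
e_o = rng.normal(size=k) + 1j * rng.normal(size=k)     # object embedding

# Eq (10): real part of the trilinear product with a conjugated object.
phi_complex = np.real(np.sum(w_r * e_s * np.conj(e_o)))

# Eq (11): the same score written with only real-valued vectors.
Rw, Iw = w_r.real, w_r.imag
Rs, Is = e_s.real, e_s.imag
Ro, Io = e_o.real, e_o.imag
phi_real = (np.sum(Rw * Rs * Ro) + np.sum(Rw * Is * Io)
            + np.sum(Iw * Rs * Io) - np.sum(Iw * Is * Ro))

assert np.isclose(phi_complex, phi_real)
```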
Code
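The slides do not reproduce the original implementation; below is a minimal, self-contained sketch of the scoring function and the logistic link described earlier (function names and dimensions are our own, not from the paper's code):

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx scoring function: phi = Re(<w_r, e_s, conj(e_o)>)."""
    return np.real(np.sum(w_r * e_s * np.conj(e_o)))

def link_probability(e_s, w_r, e_o):
    """P(Y_so = 1) via the logistic inverse link function sigma(phi)."""
    return 1.0 / (1.0 + np.exp(-complex_score(e_s, w_r, e_o)))

rng = np.random.default_rng(3)
k = 8                                                  # embedding dimension (illustrative)

def random_embedding():
    return rng.normal(scale=0.1, size=k) + 1j * rng.normal(scale=0.1, size=k)

e_s, w_r, e_o = random_embedding(), random_embedding(), random_embedding()
p = link_probability(e_s, w_r, e_o)
assert 0.0 < p < 1.0                                   # a valid probability
```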
Conclusions
+ A simple approach to matrix and tensor factorization for link prediction that uses complex-valued vectors and retains the mathematical definition of the dot product.
+ This work has several potential directions for expansion.
Thank You