1 of 16

Data Geometry and DL - Lecture 9

Hyperbolic Neural Networks

2 of 16

Embeddings: what and why

  • The data one studies has some geometry: where to start?
  • The default approach: since everything is encoded as numbers, the answer is taken to be a vector space, i.e. Euclidean space.

  • Word embeddings.

Word2Vec: learn context-dependent word probabilities (see the skip-gram formula below).

We get an embedding vector for each word of the dictionary.

The resulting embedding behaves remarkably like a Euclidean vector space (analogies become approximate vector arithmetic)!

  • How far does this go?
  • Problem: suboptimal for hierarchical structures (e.g. semantic classification).
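As a reminder of what "learning context-dependent probabilities" means here (standard Word2Vec skip-gram notation, not taken from the slide): a context word w_O is scored against a center word w_I by a softmax over inner products of input/output word vectors,

```latex
P(w_O \mid w_I) \;=\;
  \frac{\exp\!\big(\langle v'_{w_O},\, v_{w_I}\rangle\big)}
       {\sum_{w \in \mathrm{Vocab}} \exp\!\big(\langle v'_{w},\, v_{w_I}\rangle\big)} .
```

Maximizing this likelihood over a corpus is what produces the word vectors v_w.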

3 of 16

Hierarchical relationships

  • Semantic relationships
  • Knowledge Graphs
  • The internet (paper)
  • Protein reaction networks in living beings (paper)
  • Phylogenetic networks

Typically, a tree structure approximates the graph well.

Needed: low-distortion embeddings of metric trees into the NN's latent space.
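For concreteness (standard definition, not from the slide): the distortion of an embedding f of a finite metric space (X, d_X) into (Y, d_Y) is

```latex
\mathrm{dist}(f) \;=\;
  \Big(\max_{x \neq y} \frac{d_Y(f(x), f(y))}{d_X(x, y)}\Big)
  \cdot
  \Big(\max_{x \neq y} \frac{d_X(x, y)}{d_Y(f(x), f(y))}\Big),
```

and "low distortion" means dist(f) close to 1.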

4 of 16

“Spaces that are almost like trees”

  • Metric tree: a geodesic metric space in which every geodesic triangle degenerates to a tripod

  • or equivalently, for all x, y, z the geodesics [x,y], [x,z], [y,z] meet at a common point

  • Hyperbolic space (Poincaré model): the open unit disk with the distance recalled below

  • The boundary is at infinite distance; geodesic triangles have small interior (area at most π)

(Figure: an ideal triangle, infinitely large, with all three vertices on the boundary.)
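The distance referred to above is the standard Poincaré disk/ball distance (curvature −1):

```latex
d(x, y) \;=\; \operatorname{arcosh}\!\left(
  1 + \frac{2\,\lVert x - y\rVert^{2}}
           {(1 - \lVert x\rVert^{2})(1 - \lVert y\rVert^{2})}
\right),
\qquad \lVert x\rVert,\ \lVert y\rVert < 1 .
```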

5 of 16

Gromov hyperbolicity

  • δ-hyperbolic metric spaces: in every geodesic triangle, each side is contained in the δ-neighborhood of the union of the other two sides (a brute-force check is sketched after this list)

(Other definitions are available, without need of geodesics)

  • Hyperbolic space is δ-hyperbolic with δ = log 3

  • Hyperbolic groups: finitely generated groups that, with the word metric, are δ-hyperbolic; the “δ” depends on the longest relation/cycle in the presentation

  • Low-distortion embeddings of metric trees into δ-hyperbolic spaces always exist (paper for the case of the 2d hyperbolic plane)

  • What hyperbolic spaces have difficulty accommodating isometrically is large cycles (paper)
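A minimal sketch of how δ-hyperbolicity can be measured on a finite metric space, using the four-point (Gromov product) condition, one of the definitions that does not need geodesics; the function names and toy examples are illustrative, not from any referenced paper:

```python
import numpy as np

def gromov_product(D, x, y, w):
    # (x|y)_w = 1/2 (d(x,w) + d(y,w) - d(x,y))
    return 0.5 * (D[x, w] + D[y, w] - D[x, y])

def delta_hyperbolicity(D):
    """Smallest delta in the four-point condition
       (x|z)_w >= min((x|y)_w, (y|z)_w) - delta,
    computed by brute force over all quadruples (O(n^4), toy sizes only)."""
    n = D.shape[0]
    delta = 0.0
    for w in range(n):
        for x in range(n):
            for y in range(n):
                for z in range(n):
                    rhs = min(gromov_product(D, x, y, w),
                              gromov_product(D, y, z, w))
                    delta = max(delta, rhs - gromov_product(D, x, z, w))
    return delta

# A path (a tree) is 0-hyperbolic:
D_path = np.abs(np.subtract.outer(np.arange(5), np.arange(5))).astype(float)
print(delta_hyperbolicity(D_path))   # 0.0

# A 4-cycle with unit edges is not:
D_cycle = np.array([[0, 1, 2, 1],
                    [1, 0, 1, 2],
                    [2, 1, 0, 1],
                    [1, 2, 1, 0]], dtype=float)
print(delta_hyperbolicity(D_cycle))  # 1.0
```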

6 of 16

Difference between Euclidean and δ-Hyperbolic spaces

  • Bourgain 1985: “Any N-point metric space can be embedded into Hilbert space with distortion at most O(log N)”

  • Bourgain also shows that examples exist for which the best possible distortion is C · log N / log log N

(probabilistic proof: use Johnson–Lindenstrauss to project to dimension m = O(log N), then compare the covering number of the set of all graphs on N vertices with that of point configurations in R^m)

  • Bourgain 1986: the distortion of the depth-k binary tree in Hilbert space is O((log k)^½)

  • Isometric immersion of an N-vertex star is available only in dimension O(log N) (paper on general embeddings, paper on low-distortion tree embeddings)

  • Curse of dimensionality (paper) – less of an issue in hyperbolic space, thanks to exponential volume growth

Therefore, Euclidean embeddings of general graphs and of high-branching trees are inefficient, while hyperbolic embeddings of high-branching trees are efficient.

Such structures occur in social networks (paper) and in data with strong hierarchical content, such as semantic dependencies.
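To make "inefficient" concrete, a small illustrative computation (the setup is an assumption for illustration, not from the slides): embedding the star graph with N leaves into the plane forces high distortion, because the leaves cannot all stay far from each other while remaining at distance 1 from the center.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distortion(D_graph, X):
    """Multiplicative (bi-Lipschitz) distortion of an embedding.
    D_graph: (n, n) graph shortest-path distances; X: (n, m) embedded points."""
    D_emb = squareform(pdist(X))
    iu = np.triu_indices_from(D_graph, k=1)
    ratios = D_emb[iu] / D_graph[iu]
    return ratios.max() / ratios.min()

# Star with a center and N leaves (unit edges): leaf-leaf distance is 2.
N = 8
D_star = 2.0 * np.ones((N + 1, N + 1))
D_star[0, :] = D_star[:, 0] = 1.0
np.fill_diagonal(D_star, 0.0)

# Natural planar embedding: center at the origin, leaves on the unit circle.
angles = 2 * np.pi * np.arange(N) / N
X = np.vstack([[0.0, 0.0], np.c_[np.cos(angles), np.sin(angles)]])
print(distortion(D_star, X))   # = 1/sin(pi/N), grows linearly with N
```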

7 of 16

Hyperbolic Neural Networks

Main idea: use hyperbolic space in place of Euclidean space for embeddings.

Consequence: we introduce hyperbolic analogues of vector-space operations and work on manifolds.

  • Underlying model: the Poincaré disk/ball with its standard distance (as above)

  • Gyrovector space approach (originating in special relativity): introduces several intuitive operations, notably Möbius addition ⊕ and scalar multiplication ⊗

  • NN operations are lifted from the tangent space via the exponential and logarithmic maps (see the sketch below)

  • Non-commutativity of ⊕ is measured by gyrations
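A minimal sketch (curvature −1, basepoint at the origin) of the gyrovector operations and of how a linear layer is lifted from the tangent space; the formulas are the standard Möbius ones, while the function names and sanity checks are illustrative:

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1)."""
    xy, nx2, ny2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + ny2) * x + (1 - nx2) * y
    return num / (1 + 2 * xy + nx2 * ny2)

def exp0(v, eps=1e-12):
    """Exponential map at the origin: tangent vector -> point of the ball."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / (n + eps)

def log0(x, eps=1e-12):
    """Logarithmic map at the origin: point of the ball -> tangent vector."""
    n = np.linalg.norm(x)
    return np.arctanh(n) * x / (n + eps)

def mobius_matvec(M, x):
    """Hyperbolic analogue of a linear layer: map to the tangent space,
    apply the matrix, map back to the ball."""
    return exp0(M @ log0(x))

# Sanity checks
x, y = np.array([0.3, 0.1]), np.array([-0.2, 0.4])
print(mobius_add(x, np.zeros(2)))        # = x (0 is the identity)
print(np.allclose(exp0(log0(x)), x))     # True
print(np.linalg.norm(mobius_add(x, y)))  # < 1: the result stays in the ball
```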

8 of 16

Hyperbolic Neural Networks

9 of 16

Hyperbolic Neural Networks – more tools

  • Classification via points at infinity: Busemann loss functions (formula below)

  • Note that the hyperbolic ball volume |B_R| is exponential in R: in H^d it grows like e^{(d−1)R}

This gives a tradeoff between distortion/accuracy and description length (paper; could be extended).

  • Complications arise if no normalization is applied: β-normalization (paper)
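For reference, the Busemann function of an ideal point ξ on the Poincaré ball (a standard formula, normalized so that B_ξ(0) = 0; sign conventions vary in the literature):

```latex
B_{\xi}(x) \;=\; \log \frac{\lVert \xi - x \rVert^{2}}{1 - \lVert x \rVert^{2}},
\qquad \lVert \xi \rVert = 1,\ \lVert x \rVert < 1 .
```

Roughly speaking, a Busemann loss pulls an embedding x toward the ideal prototype ξ of its class by decreasing B_ξ(x).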

10 of 16

Hyperbolic Neural Networks – more tools

  • Linear classifiers, hyperbolic version: start from the Euclidean classifier in R^d and replace each operation by its gyrovector analogue

  • Hyperbolic averages (positive kernel convolution)

  • Hyperbolic adaptive optimization (paper1, incomplete!) – see the Riemannian SGD sketch after this list

  • Hyperbolic attention: is it just concatenation + aggregation?
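A minimal sketch of one Riemannian (S)GD step on the Poincaré ball, the building block that adaptive hyperbolic optimizers refine; the rescaling by the conformal factor is standard, while the simple retraction and the projection constant are illustrative choices:

```python
import numpy as np

def poincare_rsgd_step(theta, egrad, lr=0.01, eps=1e-5):
    """One Riemannian SGD step on the Poincare ball.

    theta: (d,) current parameter, ||theta|| < 1.
    egrad: (d,) Euclidean gradient of the loss at theta.
    The Poincare metric is conformal, so the Riemannian gradient is the
    Euclidean one rescaled by ((1 - ||theta||^2) / 2)^2; we then take a
    plain Euclidean step (a retraction, instead of the exact exponential
    map) and re-project inside the open ball if the step overshoots.
    """
    rgrad = ((1.0 - np.dot(theta, theta)) ** 2 / 4.0) * egrad
    new = theta - lr * rgrad
    norm = np.linalg.norm(new)
    if norm >= 1.0:
        new = new / norm * (1.0 - eps)
    return new
```

Adaptive variants (Riemannian Adam and the like) additionally maintain momentum and second-moment statistics, transported along the updates.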

11 of 16

Hyperbolic Graph Neural Networks

  • Hyperbolic Graph Neural Networks (paper1)

12 of 16

Hyperbolic Graph Neural Networks

  • Hyperbolic Graph Neural Networks (paper2)
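Independently of the specific architectures in the two papers above, the common pattern can be sketched as follows: map node embeddings to the tangent space at the origin, do standard message passing there, and map back to the ball (a generic sketch; all names and the tiny example are illustrative):

```python
import numpy as np

def exp0(V, eps=1e-12):
    """Row-wise exponential map at the origin of the Poincare ball."""
    n = np.linalg.norm(V, axis=-1, keepdims=True)
    return np.tanh(n) * V / (n + eps)

def log0(X, eps=1e-12):
    """Row-wise logarithmic map at the origin."""
    n = np.linalg.norm(X, axis=-1, keepdims=True)
    return np.arctanh(np.clip(n, 0.0, 1.0 - 1e-7)) * X / (n + eps)

def hyperbolic_gnn_layer(X, A, W):
    """One generic message-passing layer on the Poincare ball.

    X: (n, d) node embeddings inside the ball.
    A: (n, n) 0/1 adjacency matrix.
    W: (d, d_out) weight matrix.
    """
    H = log0(X)                             # to the tangent space at 0
    deg = A.sum(axis=1, keepdims=True) + 1.0
    H = (H + A @ H) / deg                   # mean over node + neighbors
    H = np.tanh(H @ W)                      # linear map + nonlinearity
    return exp0(H)                          # back to the ball

# Tiny example: a 3-node path graph
X = np.array([[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(hyperbolic_gnn_layer(X, A, np.eye(2)))   # new embeddings, inside the ball
```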

13 of 16

Hyperbolic Neural Networks – another viewpoint

Interesting direction: the natural Fisher distance on Gaussian distributions is hyperbolic (paper; explicit formula below).

This is used in “Poincaré GloVe” for geometry-aware Gaussian manifold embeddings (paper).
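In the univariate case the claim can be written explicitly (a standard computation): the Fisher information metric of the family N(μ, σ²) is

```latex
ds^{2} \;=\; \frac{d\mu^{2} + 2\, d\sigma^{2}}{\sigma^{2}}
\;\overset{\mu = \sqrt{2}\, u}{=}\;
2\,\frac{du^{2} + d\sigma^{2}}{\sigma^{2}} ,
```

i.e., up to rescaling, the hyperbolic upper half-plane metric (constant negative curvature).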

14 of 16

Hyperbolic Neural Networks – Poincare latent space VAE

Example: learn the outcome of a random diffusion process via a hyperbolic VAE (paper).

For the VAE, use Riemannian normal distributions, or push a normal forward from the tangent space (a “wrapped” normal); the loss accounts for the hyperbolic volume element. A minimal sampling sketch follows.
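A minimal sketch of the second option, simplified to a basepoint at the origin (a real hyperbolic VAE uses an arbitrary basepoint μ, parallel transport, and a log-det Jacobian correction to the likelihood, all omitted here):

```python
import numpy as np

def exp0(v, eps=1e-12):
    """Exponential map at the origin of the Poincare ball."""
    n = np.linalg.norm(v)
    return np.tanh(n) * v / (n + eps)

def sample_wrapped_normal(mean_tangent, sigma, size=1, rng=None):
    """Draw samples of a 'wrapped' normal on the Poincare ball:
    sample in the tangent space at the origin, then push onto the ball."""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.normal(loc=mean_tangent, scale=sigma,
                   size=(size, len(mean_tangent)))
    return np.array([exp0(v) for v in V])

# Example: 5 latent samples around the tangent vector (0.5, 0)
print(sample_wrapped_normal(np.array([0.5, 0.0]), sigma=0.1, size=5))
```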

15 of 16

Hyperbolic Neural Network modules

  • Image classification: add a hierarchical classification module; no need to interpret features or to compute a Bag-of-Features beforehand
  • Scalable recommender systems: add a hierarchical embedding to be shared across user tools

16 of 16