1 of 14

A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Workshop on Graph Learning Benchmarks (GLB 2022)
Presented by Charles Tapley Hoyt on April 26th, 2022
Download at https://bit.ly/glb2022-ranking-metrics or https://zenodo.org/record/6489211, licensed under CC BY 4.0


Max Berrendorf* (LMU Munich)
Mikhail Galkin (Mila, McGill University)
Volker Tresp (LMU Munich, Siemens)
Benjamin M. Gyori (Harvard Medical School)
Charles Tapley Hoyt* (Harvard Medical School)

2 of 14

Common Rank-based Metrics Have Some Issues


Mean Rank (MR)
Better reflects the average, but is susceptible to outliers
Co-domain = [1, ∞)

Mean Reciprocal Rank (MRR)
Biased towards low (good) ranks, but unlike Hits at K, does not completely disregard high ones
Co-domain = (0, 1]

Hits at K
Does not differentiate between a miss at rank k + 1 and one at rank k + d for large d
Co-domain = [0, 1]
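To make these shortcomings concrete, the following sketch scores two hypothetical rankings that differ only in how badly one test triple is missed. The helper names and rank values are illustrative and are not taken from the paper or from PyKEEN.

```python
from statistics import mean


def mean_rank(ranks):
    """Arithmetic mean of the ranks (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/r over all ranks (higher is better)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of ranks that are <= k (higher is better)."""
    return mean(1.0 if r <= k else 0.0 for r in ranks)


# Two hypothetical rankings: the last true entity is missed just outside
# the top 10 in one case and far outside it in the other.
near_miss = [1, 2, 11]
far_miss = [1, 2, 5000]

for name, ranks in [("near", near_miss), ("far", far_miss)]:
    print(name, mean_rank(ranks), round(mean_reciprocal_rank(ranks), 4), hits_at_k(ranks, k=10))
```

Hits at 10 is identical for both rankings, MR grows by more than two orders of magnitude because of a single outlier, and MRR barely moves.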

3 of 14

General Form of a Rank-based Metric
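One way to write such a general form, assumed here for illustration, is as an arithmetic mean of a per-rank transformation f: M(r_1, ..., r_n) = (1/n) Σ f(r_i), with f(r) = r for MR, f(r) = 1/r for MRR and f(r) = 1[r ≤ k] for Hits at K. A minimal sketch, where `rank_metric` is a hypothetical helper and the ranks are made up:

```python
from typing import Callable, Sequence


def rank_metric(ranks: Sequence[int], f: Callable[[int], float]) -> float:
    """Arithmetic mean of f applied to each rank: (1/n) * sum(f(r_i))."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(f(r) for r in ranks) / len(ranks)


ranks = [1, 3, 4, 10]

mr = rank_metric(ranks, lambda r: r)                 # Mean Rank: f(r) = r
mrr = rank_metric(ranks, lambda r: 1.0 / r)          # Mean Reciprocal Rank: f(r) = 1/r
hits3 = rank_metric(ranks, lambda r: float(r <= 3))  # Hits at 3: f(r) = 1[r <= 3]
print(mr, mrr, hits3)
```

Choosing f is the only thing that distinguishes the three classic metrics in this view, which is what makes a shared treatment of expectations and adjustments possible.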


4 of 14

Desiderata for Improved Rank-based Metrics


5 of 14

Motivation for Improved Rank-based Metrics

  • Compare results across different datasets
  • Hyperparameter optimization over how a dataset is constructed (e.g., which sources to include)
  • For example, a biomedical knowledge graph used for drug-disease link prediction could comprise several sources:
    • Protein-protein interactions (e.g., from BioGRID)
    • Chemical-protein interactions (e.g., from ChEMBL)
    • Protein-disease associations (e.g., from DisGeNET)
    • Protein-pathway membership (e.g., from Reactome)
    • Chemical-disease clinical results (e.g., from DrugBank)


6 of 14

Metrics' Expectation Depends on Dataset Size
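A minimal sketch of why expectations depend on dataset size, assuming a model that places the true entity uniformly at random among N candidates (no ties). Under that assumption E[r] = (N + 1)/2, E[1/r] = H_N/N (H_N being the N-th harmonic number) and E[1[r ≤ k]] = min(k, N)/N. The candidate counts below are hypothetical.

```python
import math


def expected_rank(num_candidates: int) -> float:
    """E[r] for r uniform on {1, ..., N}: (N + 1) / 2."""
    return (num_candidates + 1) / 2


def expected_reciprocal_rank(num_candidates: int) -> float:
    """E[1/r] for r uniform on {1, ..., N}: H_N / N."""
    harmonic = math.fsum(1.0 / i for i in range(1, num_candidates + 1))
    return harmonic / num_candidates


def expected_hits_at_k(num_candidates: int, k: int) -> float:
    """E[1[r <= k]] for r uniform on {1, ..., N}: min(k, N) / N."""
    return min(k, num_candidates) / num_candidates


# Hypothetical candidate-set sizes, e.g. entity counts of small vs. large graphs.
for n in (10, 1_000, 100_000):
    print(
        f"N={n:>7}  E[MR]={expected_rank(n):>9.1f}  "
        f"E[MRR]={expected_reciprocal_rank(n):.5f}  "
        f"E[Hits@10]={expected_hits_at_k(n, 10):.5f}"
    )
```

Under this assumption a random model already reaches Hits at 10 = 1.0 with 10 candidates but only 0.0001 with 100,000, so raw values are not comparable across datasets of different sizes.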


7 of 14

New Metrics through Statistical Adjustment

Solution: introduce affine statistical adjustments (analogous to the Bonferroni correction in statistics). Three kinds of adjustment:

  • Adjust by expectation
  • Adjust by expectation and optimum
  • Adjust by expectation and variance (z-score)
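The three adjustments can be sketched for Mean Rank as follows, assuming independent, uniformly random ranks, so that E[MR] = (1/n) Σ (N_i + 1)/2 and Var[MR] = (1/n²) Σ (N_i² − 1)/12, with optimum MR = 1. The ratio form corresponds to the Adjusted Mean Rank of Berrendorf et al. (2020). Function names and data are hypothetical, and the z-score sign convention (negative means better than random for MR) is a choice made for this sketch, not necessarily PyKEEN's.

```python
import math
from typing import Sequence, Tuple


def mr_expectation_and_variance(num_candidates: Sequence[int]) -> Tuple[float, float]:
    """E[MR] and Var[MR] if each rank is independently uniform on {1, ..., N_i}."""
    n = len(num_candidates)
    expectation = sum((N + 1) / 2 for N in num_candidates) / n
    # A discrete uniform on {1, ..., N} has variance (N^2 - 1) / 12; averaging
    # n independent ranks divides the summed variance by n^2.
    variance = sum((N * N - 1) / 12 for N in num_candidates) / n**2
    return expectation, variance


def adjust_by_expectation(value: float, expectation: float) -> float:
    """Ratio to the random baseline, e.g. AMR = MR / E[MR]."""
    return value / expectation


def adjust_by_expectation_and_optimum(value: float, expectation: float, optimum: float) -> float:
    """Index: 0 for a random model, 1 for a perfect one, e.g. AMRI."""
    return (value - expectation) / (optimum - expectation)


def z_score(value: float, expectation: float, variance: float) -> float:
    """Distance from the random baseline in units of its standard deviation."""
    return (value - expectation) / math.sqrt(variance)


# Hypothetical ranks for four test triples and their candidate-set sizes.
ranks = [1, 2, 5, 3]
num_candidates = [100, 100, 50, 200]
mr = sum(ranks) / len(ranks)
expected_mr, variance_mr = mr_expectation_and_variance(num_candidates)

amr = adjust_by_expectation(mr, expected_mr)                            # < 1 means better than random
amri = adjust_by_expectation_and_optimum(mr, expected_mr, optimum=1.0)  # 1 = perfect, 0 = random
zmr = z_score(mr, expected_mr, variance_mr)                             # negative = better than random
print(amr, amri, zmr)
```

All three are affine in MR once the expectation (and variance) are fixed, which is what allows them to be applied after the fact.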

8 of 14

Proposed New Metrics

  1. Adjusted Mean Rank (recapitulated from Berrendorf et al., 2020)
  2. Adjusted Mean Rank Index (recapitulated from Berrendorf et al., 2020)
  3. z-Mean Rank
  4. Adjusted Mean Reciprocal Rank
  5. Adjusted Mean Reciprocal Rank Index
  6. z-Mean Reciprocal Rank
  7. Adjusted Hits at K
  8. Adjusted Hits at K Index
  9. z-Hits at K
  10. Geometric Mean Rank
  11. Adjusted Geometric Mean Rank
  12. Adjusted Geometric Mean Rank Index
  13. z-Geometric Mean Rank
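The Geometric Mean Rank is GMR = (∏ r_i)^{1/n}. A sketch of its expectation, assuming independent ranks uniform on {1, ..., N_i}: independence gives E[GMR] = ∏ E[r_i^{1/n}], with E[r^{1/n}] = (1/N) Σ_{k=1}^{N} k^{1/n}. The index below follows the "adjust by expectation and optimum" scheme with optimum GMR = 1; the data and function names are hypothetical, and the paper's exact definitions may differ in detail.

```python
import math
from typing import Sequence


def geometric_mean_rank(ranks: Sequence[int]) -> float:
    """GMR = (prod r_i)^(1/n), computed in log space to avoid overflow."""
    return math.exp(math.fsum(math.log(r) for r in ranks) / len(ranks))


def expected_geometric_mean_rank(num_candidates: Sequence[int]) -> float:
    """E[GMR] = prod_i E[r_i^(1/n)] for independent ranks uniform on {1, ..., N_i}."""
    n = len(num_candidates)
    log_expectation = 0.0
    for N in num_candidates:
        # E[r^(1/n)] = (1/N) * sum_{k=1}^{N} k^(1/n)
        moment = math.fsum(k ** (1.0 / n) for k in range(1, N + 1)) / N
        log_expectation += math.log(moment)
    return math.exp(log_expectation)


# Hypothetical ranks and candidate-set sizes.
ranks = [1, 4, 2, 8]
num_candidates = [50, 50, 50, 50]

gmr = geometric_mean_rank(ranks)
expected_gmr = expected_geometric_mean_rank(num_candidates)
agmri = (gmr - expected_gmr) / (1.0 - expected_gmr)  # 0 for random, 1 for perfect
print(gmr, expected_gmr, agmri)
```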

Each new metric comes with a reference implementation, already available in PyKEEN v1.8.0, which also accepts many synonyms for the metric names.


9 of 14

Case Study: Rank-based Evaluation Metrics


☢️ Brief Derivations for MRR ☢️
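A sketch of the MRR expectation and variance, assuming each rank is independent and uniform on {1, ..., N_i}; it follows from the harmonic sums H_N = Σ 1/k and H_N^{(2)} = Σ 1/k². AMRRI abbreviates the Adjusted Mean Reciprocal Rank Index, whose optimum MRR is 1.

```latex
\begin{aligned}
\mathbb{E}\left[\tfrac{1}{r}\right] &= \frac{1}{N}\sum_{k=1}^{N}\frac{1}{k} = \frac{H_N}{N},
\qquad
\mathbb{E}\left[\tfrac{1}{r^2}\right] = \frac{1}{N}\sum_{k=1}^{N}\frac{1}{k^2} = \frac{H_N^{(2)}}{N},\\
\operatorname{Var}\left[\tfrac{1}{r}\right] &= \frac{H_N^{(2)}}{N} - \left(\frac{H_N}{N}\right)^2,\\
\mathbb{E}[\mathrm{MRR}] &= \frac{1}{n}\sum_{i=1}^{n}\frac{H_{N_i}}{N_i},
\qquad
\operatorname{Var}[\mathrm{MRR}] = \frac{1}{n^2}\sum_{i=1}^{n}\left(\frac{H_{N_i}^{(2)}}{N_i} - \frac{H_{N_i}^2}{N_i^2}\right),\\
\mathrm{AMRRI} &= \frac{\mathrm{MRR} - \mathbb{E}[\mathrm{MRR}]}{1 - \mathbb{E}[\mathrm{MRR}]},
\qquad
z\text{-MRR} = \frac{\mathrm{MRR} - \mathbb{E}[\mathrm{MRR}]}{\sqrt{\operatorname{Var}[\mathrm{MRR}]}}.
\end{aligned}
```

The variance of the mean sums the per-triple variances and divides by n² only because the ranks are assumed independent.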

10 of 14

Post-facto Adjustments

We pre-computed the expectations and variances for 34 datasets in PyKEEN, for each combination of:

  • Split (training, testing, evaluation)
  • Evaluation Task (left-hand, right-hand, both)
  • Metric (MR, MRR, Hits at K, GMR, etc.)

Download them as a gzipped TSV from https://zenodo.org/record/6369163 and apply them as an affine transformation to existing results.
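Why post-facto adjustment works: an index (M − E)/(M* − E) can be rewritten as a·M + b with a = 1/(M* − E) and b = −E/(M* − E), and a z-score as a = 1/σ, b = −E/σ. Once E (and σ) have been looked up for a given dataset, split, task and metric, existing results can be adjusted without re-running anything. The sketch below does not parse the Zenodo TSV (its column layout is not reproduced here); the statistics and reported values are hypothetical.

```python
import math
from typing import List, Tuple


def index_coefficients(expectation: float, optimum: float) -> Tuple[float, float]:
    """(a, b) such that a * M + b == (M - E) / (optimum - E)."""
    scale = 1.0 / (optimum - expectation)
    return scale, -expectation * scale


def z_coefficients(expectation: float, variance: float) -> Tuple[float, float]:
    """(a, b) such that a * M + b == (M - E) / sqrt(Var)."""
    scale = 1.0 / math.sqrt(variance)
    return scale, -expectation * scale


def apply_affine(values: List[float], a: float, b: float) -> List[float]:
    """Apply the same affine map to every reported value."""
    return [a * v + b for v in values]


# Hypothetical precomputed statistics for one (dataset, split, task, metric) row,
# and hypothetical MRR values reported by several models on that split.
expected_mrr, variance_mrr = 0.002, 1e-7
reported_mrr = [0.25, 0.31, 0.002]

a, b = index_coefficients(expected_mrr, optimum=1.0)
print(apply_affine(reported_mrr, a, b))  # adjusted MRR index per model
a, b = z_coefficients(expected_mrr, variance_mrr)
print(apply_affine(reported_mrr, a, b))  # z-MRR per model
```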


11 of 14


📜 Paper
🧑‍🔬 Reference Implementation
📊 Analysis and Results
🌐 Website
🎛️ Post-facto Adjustments

12 of 14


PyKEEN Advisors

PyKEEN Contributors

13 of 14

Generalized Hölder Means
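As a hedged sketch of this family: the Hölder (power) mean M_p = ((1/n) Σ r_i^p)^{1/p} recovers MR at p = 1, the Geometric Mean Rank in the limit p → 0, and the harmonic mean rank, i.e. 1/MRR, at p = −1. The `power_mean` helper and the example ranks are hypothetical.

```python
import math
from typing import Sequence


def power_mean(ranks: Sequence[float], p: float) -> float:
    """Hölder mean ((1/n) * sum r_i^p)^(1/p); p == 0 is the geometric-mean limit."""
    n = len(ranks)
    if p == 0:
        return math.exp(math.fsum(math.log(r) for r in ranks) / n)
    return (math.fsum(r ** p for r in ranks) / n) ** (1.0 / p)


ranks = [1, 2, 4, 50]

mr = power_mean(ranks, 1)    # arithmetic mean: Mean Rank
gmr = power_mean(ranks, 0)   # geometric mean: Geometric Mean Rank
hmr = power_mean(ranks, -1)  # harmonic mean rank
mrr = 1.0 / hmr              # Mean Reciprocal Rank is its inverse
print(mr, gmr, hmr, mrr)
```

By the power mean inequality, M_{−1} ≤ M_0 ≤ M_1 for positive ranks, which is one way to see why MRR reacts much less to a single large rank than MR does.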


14 of 14

General Form of a Rank-based Metric
