1 of 14

A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Workshop on Graph Learning Benchmarks (GLB 2022)

Presented by Charles Tapley Hoyt on April 26th, 2022

Download at https://bit.ly/glb2022-ranking-metrics or https://zenodo.org/record/6489211, licensed under CC BY 4.0

Max Berrendorf*
LMU Munich

Mikhail Galkin
Mila, McGill University

Volker Tresp
LMU Munich, Siemens

Benjamin M. Gyori
Harvard Medical School

Charles Tapley Hoyt*
Harvard Medical School

2 of 14

Common Rank-based Metrics Have Some Issues

Mean Rank (MR)
Better reflects the average, but susceptible to outliers

Co-domain = [1, inf)

Mean Reciprocal Rank (MRR)
Biased towards low ranks but, unlike Hits at K, does not completely disregard high ones

Co-domain = (0, 1]

Hits at K

Does not differentiate between misses at k + 1 and k + d for a large d

Co-domain = [0, 1]
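A minimal sketch of the three metrics above, to make their trade-offs concrete. The function names and the rank lists are hypothetical illustrations (not PyKEEN's API); ranks are assumed to be 1-based.

```python
from statistics import mean


def mean_rank(ranks):
    """Mean Rank: arithmetic mean of 1-based ranks; co-domain [1, inf)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Mean Reciprocal Rank: arithmetic mean of 1 / rank; co-domain (0, 1]."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Hits at K: fraction of ranks that are <= k; co-domain [0, 1]."""
    return mean(1.0 if r <= k else 0.0 for r in ranks)


# Hypothetical ranks for three test triples; the lists differ only in the last miss.
near_miss = [1, 2, 11]      # last triple missed at k + 1 for k = 10
far_miss = [1, 2, 10_000]   # last triple missed at k + d for a large d

for label, ranks in (("near", near_miss), ("far", far_miss)):
    print(label, mean_rank(ranks), mean_reciprocal_rank(ranks), hits_at_k(ranks, k=10))
```

Hits at 10 is 2/3 for both lists, so it cannot tell the two misses apart; the Mean Rank jumps from about 4.7 to about 3334 because of a single outlier; the MRR barely moves (about 0.530 versus 0.500) because it is dominated by the low ranks.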

3 of 14

General Form of a Rank-based Metric
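One way to write the common metrics in a single form, offered as a sketch rather than the slide's exact notation: given the 1-based ranks r_1, ..., r_n of n test triples, apply an element-wise transform f, take the arithmetic mean, then apply an outer map g.

```latex
\[
  M(r_1, \dots, r_n) = g\!\left(\frac{1}{n} \sum_{i=1}^{n} f(r_i)\right),
  \qquad
  \begin{array}{lll}
    \text{Metric}      & f(r)                  & g(x)      \\
    \text{MR}          & r                     & x         \\
    \text{MRR}         & 1/r                   & x         \\
    \text{Hits@}k      & \mathbb{1}[r \le k]   & x         \\
    \text{GMR}         & \log r                & \exp(x)
  \end{array}
\]
```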

4 of 14

Desiderata for Improved Rank-based Metrics

5 of 14

Motivation for Improved Rank-based Metrics

Compare results across different datasets
Optimize hyperparameters of the dataset's construction (e.g., which sources to include)
A biomedical knowledge graph used for drug-disease link prediction could comprise several sources:

Protein-protein interactions (e.g., from BioGRID)
Chemical-protein interactions (e.g., from ChEMBL)
Protein-disease associations (e.g., from DisGeNet)
Protein-pathway membership (e.g., from Reactome)
Chemical-disease clinical results (e.g., from DrugBank)

6 of 14

Metrics' Expectation Depends on Dataset Size
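A minimal sketch of why the expectation depends on dataset size, assuming a random model whose rank for each test triple is uniform on {1, ..., N}, where N is the number of candidate entities (no ties). The function names are hypothetical.

```python
from fractions import Fraction


def expected_rank(num_candidates: int) -> Fraction:
    """E[r] for a rank drawn uniformly from {1, ..., N}: (N + 1) / 2."""
    return Fraction(num_candidates + 1, 2)


def expected_hits_at_k(num_candidates: int, k: int) -> Fraction:
    """E[1[r <= k]] for a uniform rank: min(k, N) / N."""
    return Fraction(min(k, num_candidates), num_candidates)


def expected_mean_rank(num_candidates):
    """E[MR] over several triples, by linearity of expectation."""
    return sum(expected_rank(n) for n in num_candidates) / len(num_candidates)


for n in (100, 1_000, 10_000):
    print(n, float(expected_rank(n)), float(expected_hits_at_k(n, 10)))
```

Going from 100 to 10,000 candidates, the expected Mean Rank of a random model grows from 50.5 to 5000.5 while its expected Hits at 10 shrinks from 0.1 to 0.001, so raw values from graphs of different sizes are not directly comparable.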

7 of 14

New Metrics through Statistical Adjustment

Solution: introduce affine statistical adjustments (in the spirit of corrections such as Bonferroni in statistics). Adjust each metric by:


Adjust by expectation
Adjust by expectation and optimum
Adjust by expectation and variance (z-score)
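A minimal sketch of the three adjustments above, applied to the Mean Rank. It assumes each rank is uniform on {1, ..., N_i} and independent across triples; all function names are hypothetical, not PyKEEN's API. "Adjust by expectation" is taken as the ratio M / E[M] (the form of the Adjusted Mean Rank), "expectation and optimum" as (M - E[M]) / (M_opt - E[M]), which for MR with optimum 1 equals the Adjusted Mean Rank Index, and the z-score as (M - E[M]) / sqrt(Var[M]).

```python
import math
from statistics import mean


def mean_rank_moments(num_candidates):
    """Expectation and variance of the Mean Rank under random ranking.

    Assumes the i-th rank is uniform on {1, ..., N_i} and independent of the others.
    """
    n = len(num_candidates)
    expectation = mean((N + 1) / 2 for N in num_candidates)
    variance = sum((N * N - 1) / 12 for N in num_candidates) / n ** 2
    return expectation, variance


def adjust_by_expectation(value, expectation):
    """Ratio to the expected value (the form of the Adjusted Mean Rank)."""
    return value / expectation


def adjust_by_expectation_and_optimum(value, expectation, optimum):
    """Rescale so that random performance maps to 0 and the optimum maps to 1."""
    return (value - expectation) / (optimum - expectation)


def adjust_by_expectation_and_variance(value, expectation, variance):
    """z-score: distance from the expectation in units of standard deviation."""
    return (value - expectation) / math.sqrt(variance)


# Hypothetical example: four test triples, each ranked against 100 candidates.
ranks = [1, 3, 2, 40]
num_candidates = [100] * 4

mr = mean(ranks)
expectation, variance = mean_rank_moments(num_candidates)
print(adjust_by_expectation(mr, expectation))                  # Adjusted Mean Rank
print(adjust_by_expectation_and_optimum(mr, expectation, 1))   # Adjusted Mean Rank Index
print(adjust_by_expectation_and_variance(mr, expectation, variance))
```

Because lower Mean Rank is better, this raw z-score is negative for a better-than-random model; the paper's sign conventions for the z-metrics may differ from this sketch.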

8 of 14

Proposed New Metrics

Adjusted Mean Rank (recapitulated from Berrendorf et al., 2020)
Adjusted Mean Rank Index (recapitulated from Berrendorf et al., 2020)
z-Mean Rank
Adjusted Mean Reciprocal Rank
Adjusted Mean Reciprocal Rank Index
z-Mean Reciprocal Rank
Adjusted Hits at K
Adjusted Hits at K Index
z-Hits at K
Geometric Mean Rank
Adjusted Geometric Mean Rank
Adjusted Geometric Mean Rank Index
z-Geometric Mean Rank

Each new metric comes with a reference implementation, already available in PyKEEN v1.8.0, whose metric lookup accepts many synonyms for each metric's name.

9 of 14

Case Study: Rank-based Evaluation Metrics

☢️ Brief Derivations for MRR ☢️
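A sketch of the standard derivation, assuming each rank r_i is uniform on {1, ..., N_i} (N_i candidates, no ties) and the n ranks are independent; H_N = Σ_{j=1}^{N} 1/j and H_N^{(2)} = Σ_{j=1}^{N} 1/j² denote harmonic numbers.

```latex
\begin{align*}
  \mathbb{E}\left[r_i^{-1}\right]
    &= \frac{1}{N_i} \sum_{j=1}^{N_i} \frac{1}{j} = \frac{H_{N_i}}{N_i}, \\
  \mathbb{V}\left[r_i^{-1}\right]
    &= \mathbb{E}\left[r_i^{-2}\right] - \mathbb{E}\left[r_i^{-1}\right]^2
     = \frac{H^{(2)}_{N_i}}{N_i} - \left(\frac{H_{N_i}}{N_i}\right)^2, \\
  \mathbb{E}[\mathrm{MRR}]
    &= \frac{1}{n} \sum_{i=1}^{n} \frac{H_{N_i}}{N_i},
  \qquad
  \mathbb{V}[\mathrm{MRR}]
    = \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{V}\left[r_i^{-1}\right].
\end{align*}
```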

10 of 14

Post-facto Adjustments

We pre-computed the expectations and variances for 34 datasets in PyKEEN for each combination of:

Split (training, testing, validation)
Evaluation task (left-hand, right-hand, both)
Metric (MRR, MR, Hits at K, GMR, etc.)

Download from https://zenodo.org/record/6369163 as a gzipped TSV and apply as an "affine" transformation to existing results.
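A minimal sketch of a post-facto adjustment, assuming a gzipped TSV that stores an expectation and a variance per (dataset, split, task, metric). The column names, the "toy" dataset row, and the function names below are hypothetical; the real file's layout may differ, so check its header before adapting this.

```python
import csv
import gzip
import io
import math

# Hypothetical file contents: the real TSV's column names and layout may differ.
SAMPLE_TSV = (
    "dataset\tsplit\ttarget\tmetric\texpectation\tvariance\n"
    "toy\ttesting\tboth\tmean_rank\t50.5\t208.3125\n"
)


def load_stats(gz_bytes):
    """Read a gzipped TSV into {(dataset, split, target, metric): (E, Var)}."""
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {
            (row["dataset"], row["split"], row["target"], row["metric"]):
                (float(row["expectation"]), float(row["variance"]))
            for row in reader
        }


def z_adjust(value, expectation, variance):
    """Affine post-facto adjustment: subtract the expectation, divide by the std. dev."""
    return (value - expectation) / math.sqrt(variance)


# Compress the sample in memory to mimic a downloaded .tsv.gz file.
stats = load_stats(gzip.compress(SAMPLE_TSV.encode("utf-8")))
expectation, variance = stats[("toy", "testing", "both", "mean_rank")]
print(z_adjust(11.5, expectation, variance))  # adjust a previously reported MR of 11.5
```

Because the adjustment is affine in the reported value, it can be applied to published numbers without re-running any model.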

11 of 14

Paper	📜	https://arxiv.org/abs/2203.07544
Reference Implementation	🧑‍🔬	https://github.com/pykeen/pykeen
Analysis and Results	📊	https://github.com/pykeen/ranking-metrics-manuscript
Website	🌐	https://pykeen.github.io/ranking-metrics-manuscript
Post-facto Adjustments	🎛️	https://zenodo.org/record/6369163

12 of 14

PyKEEN Advisors

PyKEEN Contributors

13 of 14

Generalized Hölder Means

Image from https://en.wikipedia.org/wiki/Pythagorean_means
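A minimal sketch relating the rank-based metrics to the Pythagorean means, assuming positive ranks: the Mean Rank is the Hölder (power) mean with p = 1, the Geometric Mean Rank is the limit p = 0, and the inverse of the MRR is the harmonic mean (p = -1). The function name and example ranks are hypothetical.

```python
import math


def hoelder_mean(values, p):
    """Generalized (Hölder / power) mean of positive values; p = 0 is the geometric mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


ranks = [1, 2, 4]
arithmetic = hoelder_mean(ranks, 1)    # Mean Rank
geometric = hoelder_mean(ranks, 0)     # Geometric Mean Rank
harmonic = hoelder_mean(ranks, -1)     # 1 / Mean Reciprocal Rank
mrr = sum(1 / r for r in ranks) / len(ranks)

print(arithmetic, geometric, harmonic, 1 / mrr)
```

The usual ordering harmonic ≤ geometric ≤ arithmetic holds here (about 1.71 ≤ 2 ≤ 2.33), which is one way to see why MRR is pulled towards low ranks while MR is pulled towards high ones.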