1 of 89

Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

The Web Conference 2022 - Tutorials Session

Thanasis Vergoulis

Ilias Kanellos

Dimitris Sacharidis

2 of 89

Part B: Approaches for Estimating the Impact of Papers

Ilias Kanellos (ATHENA RC, Greece)

5 of 89

Background

Wide availability of SKGs

  • Large number of scientific papers - publish or perish
  • Large number of paper impact assessment methods in literature
    • Many share similar concepts and ideas

Different methods evaluated based on

  • Different goals
  • Different datasets

Unclear which method to choose and under which circumstances

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

13 of 89

How to Assess Impact?

Plethora of Methods in Literature

  • At least 32 distinct methods as of 2019

Problem dependent

  • No clear definition of impact1
    • Defined in many different ways
  • At least two impact aspects
    • Influence - long-term impact
    • Popularity - short-term impact

  1. Bollen J, Van de Sompel H, Hagberg A, Chute R. A principal component analysis of 39 scientific impact measures. PloS one. 2009 Jun 29;4(6):e6022.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

16 of 89

Ranking in Citation Networks

Impact Assessment expressed as Ranking Problem

  • Impact assessed comparatively based on score (e.g., Citation Count)
  • Other network centrality measures can serve as impact proxies
  • A large body of literature analyzes citation networks in different ways to assess paper impact

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

17 of 89

Citation Networks: Basic Concepts

Citation Network is a Graph with

  • Papers as Nodes
  • References as edges

References point backwards in time

No cycles expected

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

19 of 89

Citation Networks: Basic Concepts

Citation Network is a Graph with

  • Papers as Nodes
  • References as edges

Paper denoted by p_i

Citations represented by the Citation Matrix A: A[i,j] = 1 if paper p_j cites paper p_i, 0 otherwise

Stochastic Matrix S*: S[i,j] = A[i,j] / k_j, where k_j is the number of references of paper p_j

*Sub-stochastic based on this formula; add 1/N entries in the columns of dangling nodes (papers with no references). See the sketch below.
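
A minimal numpy sketch of how these matrices might be built for a toy citation network (the 4-paper data and variable names are illustrative, not from the tutorial):

    import numpy as np

    # Toy citation network: A[i, j] = 1 if paper p_j cites paper p_i (illustrative data).
    A = np.array([[0., 1., 1., 0.],
                  [0., 0., 1., 1.],
                  [0., 0., 0., 0.],
                  [0., 0., 0., 0.]])
    N = A.shape[0]

    k = A.sum(axis=0)                                # k_j: number of references of paper p_j
    S = np.zeros_like(A)
    has_refs = k > 0
    S[:, has_refs] = A[:, has_refs] / k[has_refs]    # 1/k_j per reference (column-stochastic)
    S[:, ~has_refs] = 1.0 / N                        # dangling-node fix: uniform 1/N column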

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

21 of 89

Common Centralities I: Citation Counts

Network centrality measures ~ impact proxies

De facto traditional measure of Scientific Impact

In terms of the Citation Matrix A, the citation count of paper p_i is the sum of row i: CC(p_i) = Σ_j A[i,j] (illustrated below)
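
A one-line numpy illustration, reusing the toy citation-matrix convention from the earlier sketch:

    import numpy as np

    # A[i, j] = 1 if paper p_j cites paper p_i (illustrative data)
    A = np.array([[0., 1., 1., 0.],
                  [0., 0., 1., 1.],
                  [0., 0., 0., 0.],
                  [0., 0., 0., 0.]])

    citation_counts = A.sum(axis=1)    # CC(p_i) = sum_j A[i, j]
    print(citation_counts)             # [2. 2. 0. 0.]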

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

28 of 89

Common Centralities II: PageRank

“A high impact paper is cited by other high impact papers”

  • Distinguish citing papers by their impact
  • “Random surfer” (researcher) model

  • Early applications on citation networks by Chen et al.1 and Ma et al.2
  • PR calculated iteratively until convergence

  1. Chen P, Xie H, Maslov S, Redner S. Finding scientific gems with Google’s PageRank algorithm. Journal of Informetrics. 2007 Jan 1;1(1):8-15.
  2. Ma N, Guan J, Zhao Y. Bringing PageRank to the citation analysis. Information Processing & Management. 2008 Mar 1;44(2):800-10.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

32 of 89

Impact Assessment: Goals and Approaches

Problems

  • Citation Count “too democratic” - no differentiation of origin
  • Older papers have a citation head start / top-ranked papers skewed in favor of old ones
  • Avoid “malicious manipulations” and/or “noise”

Approaches

  • Balance citations
  • Network analyses (e.g., PageRank)
  • Weights (e.g., on venues, authors, etc.)
  • Time-awareness
  • Exponential decay functions
  • Re-scaling / normalizations
  • Neglect self-citations
  • Consider citing-cited paper similarities

And others…

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

33 of 89

Classification of Methods in Literature1

  1. Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Impact-based ranking of scientific publications: a survey and experimental evaluation. IEEE Transactions on Knowledge and Data Engineering. 2019 Sep 13;33(4):1567-84.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

34 of 89

Classification I: Data leveraged

Citations only

  • Citation Count, PageRank

Paper Metadata

  • Publication venues and/or author information
    • Other options (e.g., institution-based info)

Publication time-based metadata (weights; sketched below)

  • Paper age
    • When the paper was published
  • Citation age
    • When each citation to the paper was made
  • Citation gap
    • How much time elapsed between the paper's publication and the citation
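
A small sketch of the three time-based quantities (the field names and the evaluation year are assumptions made for illustration):

    from dataclasses import dataclass

    CURRENT_YEAR = 2022  # assumed evaluation year

    @dataclass
    class Citation:
        citing_year: int   # year the citing paper was published
        cited_year: int    # year the cited paper was published

    def paper_age(pub_year: int) -> int:
        # Paper age: how long ago the paper was published
        return CURRENT_YEAR - pub_year

    def citation_age(c: Citation) -> int:
        # Citation age: how long ago the citation was made
        return CURRENT_YEAR - c.citing_year

    def citation_gap(c: Citation) -> int:
        # Citation gap: time between the cited paper's publication and the citation
        return c.citing_year - c.cited_year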

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

37 of 89

Classification II: Computational Model

Citation Count

  • Use only (direct) citations
  • Or apply weights on citations (e.g., based on publication venues, based on authors, etc)
  • E.g., citations from A have weight w

PageRank

Heterogeneous Networks

Ensemble Methods

Other Approaches

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

38 of 89

Citation Count-based Approaches: Example Methods

Weighted Citation1

  • Weigh citations based on journal prestige
    • Weights given by the Article Influence Score, a function of the Eigenfactor
    • EF: the Eigenfactor of A's journal (the citing article's journal), a PageRank-like score computed on journal networks
    • a: fraction of articles published in journal J over a multi-year time window

  1. Yan E, Ding Y. Weighted citation: An indicator of an article's prestige. Journal of the American Society for Information Science and Technology. 2010 Aug;61(8):1635-43.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

39 of 89

Citation Count-based Approaches: Example Methods

Weighted Citation1

  • Weigh citations based on journal prestige
  • Weigh citations based on “quickness” (citation gap)
    • “Quick citations” considered to convey
      • Important breakthroughs
      • Authoritative authors
    • f(x) ∼ e^(-0.117x) (see the sketch below)
      • Based on empirical citation data

  1. Yan E, Ding Y. Weighted citation: An indicator of an article's prestige. Journal of the American Society for Information Science and Technology. 2010 Aug;61(8):1635-43.
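
A hedged sketch of how such a weighted citation count might be computed; the multiplicative combination of journal prestige and the quickness factor, and the prestige values, are illustrative assumptions rather than the paper's exact formula:

    import math

    def quickness_weight(citation_gap_years):
        # Empirically fitted decay from the slide: f(x) ~ e^(-0.117 x)
        return math.exp(-0.117 * citation_gap_years)

    def weighted_citation_score(citations, journal_prestige):
        """citations: list of (citing_journal, citation_gap_years) pairs.
        journal_prestige: dict mapping journal -> Article-Influence-like score.
        Assumption: each citation contributes prestige * quickness."""
        return sum(journal_prestige.get(j, 0.0) * quickness_weight(gap)
                   for j, gap in citations)

    # Hypothetical usage
    prestige = {"J1": 2.5, "J2": 0.8}
    print(weighted_citation_score([("J1", 1), ("J2", 6)], prestige))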

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

40 of 89

Citation Count-based Approaches: Example Methods

Weighted Citation1

  • Weigh citations based on journal prestige
  • Weigh citations based on “quickness” (citation gap)

Example

  • Due to the nature of the citation network, citing papers are always more recent than the papers they cite
  • Longer citation gaps decrease the citation's weight

  1. Yan E, Ding Y. Weighted citation: An indicator of an article's prestige. Journal of the American Society for Information Science and Technology. 2010 Aug;61(8):1635-43.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

41 of 89

Citation Count-based Approaches: Example Methods

RAM1

  • Recent citations more important
  • Adjacency Matrix => Retained Adjacency Matrix (RAM), sketched below
  • N = current year

  1. Ghosh R, Kuo TT, Hsu CN, Lin SD, Lerman K. Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops 2011 Dec 11 (pp. 373-380). IEEE.
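
A rough numpy sketch of the idea (the decay base gamma and the exact decay form are assumptions here; see the paper for the precise definition):

    import numpy as np

    def retained_adjacency(A, citing_year, N, gamma=0.5):
        """A[i, j] = 1 if paper p_j cites paper p_i; citing_year[j] = year of citing paper p_j.
        Each citation is discounted by gamma raised to its age, N - citing_year[j]."""
        decay = gamma ** (N - np.asarray(citing_year, dtype=float))  # one factor per citing paper
        return A * decay                                             # discounts column j by decay[j]

    # RAM-based score: row sums of the retained matrix (time-discounted citation counts)
    # R = retained_adjacency(A, citing_year=[2018, 2020, 2021, 2022], N=2022)
    # scores = R.sum(axis=1)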

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

45 of 89

Citation Count-based Approaches: Example Methods

ECM1

  • Expand RAM to calculate chains of citations
  • Attenuate with length (sketched below)

Example

  • One-hop paths
  • Two-hop paths

  1. Ghosh R, Kuo TT, Hsu CN, Lin SD, Lerman K. Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops 2011 Dec 11 (pp. 373-380). IEEE.
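
A hedged sketch of the chain idea: powers of the retained matrix count citation chains of increasing length, attenuated per extra hop (the attenuation factor rho, the truncation depth, and the exact weighting are illustrative assumptions; the paper defines ECM precisely):

    import numpy as np

    def ecm_scores(R, rho=0.5, max_hops=4):
        """R: retained adjacency matrix (direct, time-discounted citations).
        Adds 2-hop chains weighted by rho, 3-hop chains weighted by rho^2, and so on."""
        n = R.shape[0]
        chains = np.zeros_like(R)
        Rk = np.eye(n)
        for hop in range(max_hops):
            Rk = Rk @ R                    # chains of length hop + 1
            chains += (rho ** hop) * Rk
        return chains.sum(axis=1)          # attenuated count of citation chains reaching each paper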

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

46 of 89

Classification II: Computational Model

Citation Count

PageRank

  • Modify random surfer model

Heterogeneous Networks

Ensemble Methods

Other Approaches

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

47 of 89

PageRank: Semantics

PageRank simulates a “random researcher”

  • When reading a particular paper, choose
    • With probability α: another paper from its reference list
    • With probability 1-α: any paper in the citation network
  • The next paper depends only on the current paper (memoryless)

This behaviour can be modeled by a finite-state discrete Markov Chain

  • Transition matrix G = αS + (1-α)(1/N)J, where J is the matrix of all 1s
  • PageRank scores are the values of the stationary distribution of G
  • Calculated using power iteration (sketched below)
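
A minimal power-iteration sketch for G (alpha = 0.85 and the tolerance are the usual assumptions; S is the column-stochastic matrix from before):

    import numpy as np

    def pagerank(S, alpha=0.85, tol=1e-10, max_iter=1000):
        """Returns the stationary distribution of G = alpha*S + (1-alpha)/N * J,
        without materializing J (J @ pr reduces to a vector of 1s when pr sums to 1)."""
        N = S.shape[0]
        pr = np.full(N, 1.0 / N)                          # start from the uniform distribution
        for _ in range(max_iter):
            new_pr = alpha * (S @ pr) + (1 - alpha) / N   # one multiplication by G
            if np.abs(new_pr - pr).sum() < tol:
                return new_pr
            pr = new_pr
        return pr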

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

48 of 89

PageRank: Convergence

The PageRank vector results from applying the power method to the G matrix

Convergence guaranteed by the Perron-Frobenius Theorem1 when

  • Matrix is stochastic (valid by definition for G)
  • Matrix is irreducible
    • Guaranteed when all states can transition to all other states (all papers “cite” all other papers)
    • Guaranteed for G, because all cells > 0, least value (1-α)/N
  • Matrix is aperiodic
    • Guaranteed by self-loops (i.e., non-zero diagonal entries of matrix G)
    • Guaranteed by PageRank's random jump vector

  1. Langville AN, Meyer CD. Google's PageRank and beyond. Princeton University Press; 2011 Jul 1.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

49 of 89

PageRank: Convergence Consequences

Define any matrix S’ which is

  • Stochastic
    • Instead of 1/k, use different weights, as long as the matrix stays column-stochastic

Add a custom jump vector (vanilla PageRank uses the uniform vector)

  • Ensure non-zero values in all cells
    • Choose a vector with positive values on all dimensions
    • Normalize it

The above interventions easily translate to a particular “* researcher” behaviour

Any quantity can be normalized and applied to the stochastic matrix and/or the random jump vector (see the sketch below)
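
For instance, a generalized iteration with a custom, normalized jump vector might look like this sketch (the choice of raw quantity is an assumption; any positive per-paper quantity works):

    import numpy as np

    def custom_pagerank(S, raw_quantity, alpha=0.85, iters=100):
        """S: column-stochastic transition matrix; raw_quantity: positive per-paper values
        (e.g., recency weights) to be turned into a jump vector."""
        v = np.asarray(raw_quantity, dtype=float)
        v = v / v.sum()                              # normalize into a probability vector
        pr = np.full(S.shape[0], 1.0 / S.shape[0])
        for _ in range(iters):
            pr = alpha * (S @ pr) + (1 - alpha) * v  # teleport to v instead of the uniform vector
        return pr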

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

50 of 89

PageRank: Adjustments to the G matrix

Focused PageRank1

  • Balance PR and CC
  • Researcher prefers the most cited among the papers in the reference list
  • Replace 1/k in S with weights proportional to the cited papers' citation counts

Example (see the sketch below)

  1. Krapivin M, Marchese M. Focused page rank in scientific papers ranking. In International Conference on Asian Digital Libraries 2008 Dec 2 (pp. 144-153). Springer, Berlin, Heidelberg.
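
A sketch of one way to build such a transition matrix, normalizing each reference list by the citation counts of the referenced papers (the +1 smoothing and the exact weighting are illustrative, not necessarily the paper's formula):

    import numpy as np

    def focused_transition(A):
        """A[i, j] = 1 if paper p_j cites paper p_i.
        Instead of the uniform 1/k_j, weight each referenced paper by its own citation
        count, so the researcher prefers the most cited papers in a reference list."""
        cc = A.sum(axis=1) + 1.0            # citation counts (+1 avoids all-zero columns)
        W = A * cc[:, None]                 # weight each citation by the cited paper's count
        col_sums = W.sum(axis=0)
        S = np.zeros_like(W)
        nz = col_sums > 0
        S[:, nz] = W[:, nz] / col_sums[nz]  # column-stochastic on non-dangling columns
        return S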

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

51 of 89

PageRank: Time-Aware Approach

CiteRank1

  • Assumption: researchers start browsing from recent works
    • Then follow citations
  • Modify the random jump vector: the probability of starting at a paper decays exponentially with its age (sketched below)
  • CiteRank defined as the resulting expected “traffic” of random researchers at each paper
  • If normalized, it can be rewritten2 as a PageRank variant with this time-decayed jump vector

  1. Walker D, Xie H, Yan KK, Maslov S. Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment. 2007 Jun 14;2007(06):P06010.
  2. Mariani MS, Medo M, Zhang YC. Identification of milestone papers through time-balanced network centrality. Journal of Informetrics. 2016 Nov 1;10(4):1207-23.
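
A hedged sketch of the idea, reusing the custom-jump iteration with an exponentially age-decayed jump vector (the values of alpha and the decay time tau are illustrative, not the paper's):

    import numpy as np

    def citerank_like(S, paper_age_years, alpha=0.5, tau=2.6, iters=100):
        """Jump vector decays exponentially with paper age, so recent papers are the
        likeliest entry points for the browsing researcher."""
        rho = np.exp(-np.asarray(paper_age_years, dtype=float) / tau)
        rho = rho / rho.sum()
        scores = rho.copy()
        for _ in range(iters):
            scores = alpha * (S @ scores) + (1 - alpha) * rho
        return scores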

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

53 of 89

Engineering PageRank: Our Time-Aware Approach

AttRank1

  • Aim: capture current research trends
  • Apply preferential attachment
    • Rich get richer
  • Intuition: use only the y most recent years

  • α + β + γ = 1, with the β- and γ-weighted vectors normalized
    • Guarantees convergence
  • Researcher starts reading recently published, or recently cited, papers (sketched below)

  1. Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021 Apr 19 (pp. 1997-2002). IEEE.
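
A hedged sketch of an AttRank-style iteration (the exact attention and recency definitions follow the paper only loosely; parameter values are illustrative):

    import numpy as np

    def attrank_like(S, recent_citations, paper_age_years,
                     alpha=0.5, beta=0.3, gamma=0.2, iters=100):
        """recent_citations[i]: citations paper i received in the y most recent years
        (preferential-attachment / attention component).
        paper_age_years[i]: age of paper i (recency component, exponentially decayed)."""
        att = np.asarray(recent_citations, dtype=float)
        att = att / att.sum()                                # normalized attention vector
        rec = np.exp(-np.asarray(paper_age_years, dtype=float))
        rec = rec / rec.sum()                                # normalized recency vector
        s = np.full(S.shape[0], 1.0 / S.shape[0])
        for _ in range(iters):
            s = alpha * (S @ s) + beta * att + gamma * rec   # alpha + beta + gamma = 1
        return s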

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

57 of 89

Classification II: Computational Model

Citation Count

PageRank

Heterogeneous Networks

  • Nodes represent different types of entities
  • Edges represent relations (e.g., paper published in venue)
  • Some methods inspired by HITS apply mutual reinforcement
  • Can provide rankings of different entities (e.g., authors and papers)

Ensemble Methods

Other Approaches

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

62 of 89

Heterogeneous Networks: Applications

P-Rank1

  • Differentiate citations based on citing papers, journals, authors
  • Defines inter- and intra-graph walks on the heterogeneous network
  • Author scores based on their papers
  • Venue scores based on their papers
  • “Random” jump vector based on the above; run a PageRank iteration
  • Repeat until convergence

  1. Yan E, Ding Y, Sugimoto CR. P-Rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology. 2011 Mar;62(3):467-77.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

63 of 89

Heterogeneous Networks: Applications

FutureRank1

  • Goal: predict PR scores in the future graph
  • Most citations are made to papers published 1-2 years prior
    • Hence, recently published papers are more important
    • Use an exponential weight for paper age

  1. Sayyadi H, Getoor L. FutureRank: Ranking scientific articles by predicting their future PageRank. In Proceedings of the 2009 SIAM International Conference on Data Mining 2009 Apr 30 (pp. 533-544). Society for Industrial and Applied Mathematics.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

64 of 89

Heterogeneous Networks: Applications

FutureRank1

  • Goal: predict PR scores in the future graph
  • “Good research is done by good researchers”
  • Network of papers and authors - mutual reinforcement between them
  • M: authorship matrix, M[i,j] = 1 iff paper j is written by author i, else 0 (sketched below)
  • Repeat until convergence

  1. Sayyadi H, Getoor L. FutureRank: Ranking scientific articles by predicting their future PageRank. In Proceedings of the 2009 SIAM International Conference on Data Mining 2009 Apr 30 (pp. 533-544). Society for Industrial and Applied Mathematics.
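
A hedged sketch of the mutual reinforcement between papers and authors via M (the weights, the time component, and the normalizations are simplified relative to the paper):

    import numpy as np

    def futurerank_like(S, M, paper_age_years, alpha=0.4, beta=0.3, gamma=0.2, iters=100):
        """S: paper-to-paper transition matrix; M[i, j] = 1 iff author i wrote paper j.
        Papers and authors reinforce each other; recent papers get an exponential boost."""
        n_papers = S.shape[0]
        rec = np.exp(-np.asarray(paper_age_years, dtype=float))  # decay rate is illustrative
        rec = rec / rec.sum()
        r_paper = np.full(n_papers, 1.0 / n_papers)
        for _ in range(iters):
            r_author = M @ r_paper                    # author score = sum of their papers' scores
            r_author = r_author / r_author.sum()
            from_authors = M.T @ r_author             # push author scores back to their papers
            from_authors = from_authors / from_authors.sum()
            r_paper = (alpha * (S @ r_paper) + beta * from_authors + gamma * rec
                       + (1 - alpha - beta - gamma) / n_papers)
        return r_paper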

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

70 of 89

Classification II: Computational Model

Citation Count

PageRank

Heterogeneous Networks

Ensemble Methods

  • Calculate any number of different scores based on the above
  • Combine them through some operator
  • Most methods in the KDD Cup 2016

Other Approaches

[Diagram: CC-based, author-based, venue-based, PR-based, and other rankings are combined into a final paper ranking]

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

71 of 89

Ensemble Methods: Applications

WSDM Cup 2016 winner1

  • Multiple bipartite graphs
  • Initialize: linear combination of citations and references
  • Propagate paper scores (see the sketch below)
    • Papers <= avg score of citing papers
    • Authors <= avg score of their papers
    • Venues <= avg score of their papers
  • Refine author scores
    • Avg of the previous step's scores, based on the venues they publish in
  • Apply voting strategy
    • Avg of initial score and the “dominant group” avg
  • Repeat ~5 times

  1. Feng MH, Chan K, Chen HY, Tsai MF, Yeh MY, Lin SD. An efficient solution to reinforce paper ranking using author/venue/citation information - the winner's solution for WSDM Cup 2016. WSDM Cup. 2016.
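
A hedged sketch of the score-propagation step only (graph construction, initialization, and the voting strategy are left out; the actual solution iterates the full pipeline about five times):

    import numpy as np

    def propagate_once(paper_scores, citing_papers, paper_authors, paper_venue):
        """citing_papers[i]: indices of papers citing paper i; paper_authors[i]: authors of
        paper i; paper_venue[i]: venue of paper i. Each entity takes the average score of
        the papers connected to it, as described on the slide."""
        new_paper = np.array([
            np.mean([paper_scores[j] for j in citing_papers[i]]) if citing_papers[i] else 0.0
            for i in range(len(paper_scores))
        ])
        author_scores, venue_scores = {}, {}
        for i, s in enumerate(paper_scores):
            for a in paper_authors[i]:
                author_scores.setdefault(a, []).append(s)
            venue_scores.setdefault(paper_venue[i], []).append(s)
        author_scores = {a: float(np.mean(vals)) for a, vals in author_scores.items()}
        venue_scores = {ven: float(np.mean(vals)) for ven, vals in venue_scores.items()}
        return new_paper, author_scores, venue_scores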

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

78 of 89

Classification Axis II: underlying computational model

Citation Count

PageRank

Heterogeneous Networks

Ensemble Methods

Other Approaches

  • Approaches not fitting the above
    • E.g., rescaling PageRank scores
    • Using lengths of shortest citation paths
    • Others

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

80 of 89

Other Approaches: Example Methods

Age-Rescaled PageRank1

  • Goal: debias the age distribution of highly ranked papers
  • Rescale each paper's PageRank score relative to papers published around the same time

  • Use papers j ∈ [i − ∆p/2, i + ∆p/2] (papers closest in publication order) to calculate the avg and std dev; R(pi) is the resulting z-score (sketched below)
    • R(pi) < 0: underperforms
    • R(pi) > 0: overperforms
  • Extension: field- & age-rescaled2

  1. Mariani MS, Medo M, Zhang YC. Identification of milestone papers through time-balanced network centrality. Journal of Informetrics. 2016 Nov 1;10(4):1207-23.
  2. Vaccario G, Medo M, Wider N, Mariani MS. Quantifying and suppressing ranking bias in a large citation network. Journal of Informetrics. 2017 Aug 1;11(3):766-82.
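
A sketch of the rescaling under these assumptions: papers are sorted by publication time and each score becomes a z-score within a sliding window of roughly ∆p papers (the window size is illustrative):

    import numpy as np

    def age_rescaled(scores, delta_p=100):
        """scores: PageRank scores of papers sorted by publication time.
        Each score is compared against the ~delta_p papers published closest to it,
        so papers of different ages become comparable."""
        scores = np.asarray(scores, dtype=float)
        n = len(scores)
        rescaled = np.empty(n)
        for i in range(n):
            lo, hi = max(0, i - delta_p // 2), min(n, i + delta_p // 2 + 1)
            window = scores[lo:hi]
            std = window.std()
            rescaled[i] = (scores[i] - window.mean()) / std if std > 0 else 0.0
        return rescaled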

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

82 of 89

Strengths and Weaknesses: General

Semantics

  • PageRank-based models translate to researcher behaviour
    • Easier to understand
    • PageRank-based scores describe % of time spent on each paper or probability of reaching a paper
  • Other methods lack these semantics
    • Some methods are tuned against some ground truth without providing any explainable semantics

Data usability

  • Metadata-based approaches suffer from
    • Lower availability
    • Data cleaning issues

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

83 of 89

Strengths and Weaknesses: Popularity vs. Influence

Time bias is inherent in Citation Count and PageRank

Some works place importance on “predicting” rankings based on future citation counts or PageRank

We examined the effectiveness of different types of methods on this task (a simplified sketch follows below)

  • Split the dataset on a time point ts
  • Rank papers with each examined method, using the citation network up to ts
  • Compare the ranking to
    • Future citation counts, not counting old citations (Popularity)
    • Future citation counts, considering all citations (Influence)
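
A simplified sketch of this evaluation protocol (the ground truths here use future citation counts only; the survey also uses PageRank on the full network for influence):

    import numpy as np
    from scipy.stats import kendalltau

    def evaluate_method(scores_up_to_ts, citations, ts):
        """scores_up_to_ts[i]: impact score of paper i computed on the network up to ts.
        citations: list of (cited_paper_index, citation_year) pairs for the full network."""
        n = len(scores_up_to_ts)
        future_cc = np.zeros(n)   # citations received strictly after ts (popularity proxy)
        total_cc = np.zeros(n)    # all citations, past and future (influence proxy)
        for cited, year in citations:
            total_cc[cited] += 1
            if year > ts:
                future_cc[cited] += 1
        pop_corr, _ = kendalltau(scores_up_to_ts, future_cc)
        inf_corr, _ = kendalltau(scores_up_to_ts, total_cc)
        return pop_corr, inf_corr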

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

84 of 89

Strengths and Weaknesses: Popularity

Effectiveness on Popularity1

  • Measure correlation of rankings to future citation counts (FCC)
  • Time-aware methods perform best
    • Citation age is most effective
    • Citation age cannot capture cold-start papers
    • Paper age cannot differentiate papers of the same age
    • Citation gap is not as effective
  • Metadata not effective

  1. Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Impact-based ranking of scientific publications: a survey and experimental evaluation. IEEE Transactions on Knowledge and Data Engineering. 2019 Sep 13;33(4):1567-84.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

85 of 89

Strengths and Weaknesses: Influence

Effectiveness on Influence1

  • Measure correlation of rankings to the overall PageRank, including future references (TPR)
  • Traditional, time-independent methods are effective
  • No particular benefit from ensemble / metadata-based methods

  1. Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Impact-based ranking of scientific publications: a survey and experimental evaluation. IEEE Transactions on Knowledge and Data Engineering. 2019 Sep 13;33(4):1567-84.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

86 of 89

Further Reading

  1. Langville AN, Meyer CD. Google's PageRank and beyond. Princeton University Press; 2011 Jul 1.
  2. Chen P, Xie H, Maslov S, Redner S. Finding scientific gems with Google's PageRank algorithm. Journal of Informetrics. 2007 Jan 1;1(1):8-15.
  3. Ma N, Guan J, Zhao Y. Bringing PageRank to the citation analysis. Information Processing & Management. 2008 Mar 1;44(2):800-10.
  4. Hwang WS, Chae SM, Kim SW, Woo G. Yet another paper ranking algorithm advocating recent publications. In Proceedings of the 19th International Conference on World Wide Web 2010 Apr 26 (pp. 1117-1118).
  5. Yao L, Wei T, Zeng A, Fan Y, Di Z. Ranking scientific publications: the effect of nonlinearity. Scientific Reports. 2014 Oct 17;4(1):1-6.
  6. Zhou J, Zeng A, Fan Y, Di Z. Ranking scientific publications with similarity-preferential mechanism. Scientometrics. 2016 Feb;106(2):805-16.
  7. Krapivin M, Marchese M. Focused page rank in scientific papers ranking. In International Conference on Asian Digital Libraries 2008 Dec 2 (pp. 144-153). Springer, Berlin, Heidelberg.
  8. Su C, Pan Y, Zhen Y, Ma Z, Yuan J, Guo H, Yu Z, Ma C, Wu Y. PrestigeRank: A new evaluation method for papers and journals. Journal of Informetrics. 2011 Jan 1;5(1):1-3.
  9. Yan E, Ding Y. Weighted citation: An indicator of an article's prestige. Journal of the American Society for Information Science and Technology. 2010 Aug;61(8):1635-43.
  10. Ghosh R, Kuo TT, Hsu CN, Lin SD, Lerman K. Time-aware ranking in dynamic citation networks. In 2011 IEEE 11th International Conference on Data Mining Workshops 2011 Dec 11 (pp. 373-380). IEEE.
  11. Yu PS, Li X, Liu B. Adding the temporal dimension to search - a case study in publication search. In The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05) 2005 Sep 19 (pp. 543-549). IEEE.
  12. Wade AD, Wang K, Sun Y, Gulli A. WSDM Cup 2016: Entity ranking challenge. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining 2016 Feb 8 (pp. 593-594).
  13. Walker D, Xie H, Yan KK, Maslov S. Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment. 2007 Jun 14;2007(06):P06010.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

87 of 89

Further Reading

  14. Sayyadi H, Getoor L. FutureRank: Ranking scientific articles by predicting their future PageRank. In Proceedings of the 2009 SIAM International Conference on Data Mining 2009 Apr 30 (pp. 533-544). Society for Industrial and Applied Mathematics.
  15. Zhang F, Wu S. Ranking scientific papers and venues in heterogeneous academic networks by mutual reinforcement. In Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries 2018 May 23 (pp. 127-130).
  16. Yan E, Ding Y, Sugimoto CR. P-Rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology. 2011 Mar;62(3):467-77.
  17. Wang Y, Tong Y, Zeng M. Ranking scientific articles by exploiting citations, authors, journals, and time information. In Twenty-Seventh AAAI Conference on Artificial Intelligence 2013 Jun 30.
  18. Bai X, Xia F, Lee I, Zhang J, Ning Z. Identifying anomalous citations for objective evaluation of scholarly article impact. PloS one. 2016 Sep 8;11(9):e0162364.
  19. Jiang X, Sun X, Zhuge H. Towards an effective and unbiased ranking of scientific literature through mutual reinforcement. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management 2012 Oct 29 (pp. 714-723).
  20. Liu Z, Huang H, Wei X, Mao X. Tri-Rank: An authority ranking framework in heterogeneous academic networks by mutual reinforce. In 2014 IEEE 26th International Conference on Tools with Artificial Intelligence 2014 Nov 10 (pp. 493-500). IEEE.
  21. Klosik DF, Bornholdt S. The citation wake of publications detects Nobel laureates' papers. PloS one. 2014 Dec 1;9(12):e113184.
  22. Mariani MS, Medo M, Zhang YC. Identification of milestone papers through time-balanced network centrality. Journal of Informetrics. 2016 Nov 1;10(4):1207-23.
  23. Liao H, Mariani MS, Medo M, Zhang YC, Zhou MY. Ranking in evolving complex networks. Physics Reports. 2017 May 19;689:1-54.

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

88 of 89

Further Reading

Our relevant works

  • Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Ranking papers by their short-term scientific impact. In 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021 Apr 19 (pp. 1997-2002). IEEE.
  • Kanellos I, Vergoulis T, Sacharidis D. Ranking papers by expected short-term impact. In Predicting the Dynamics of Research Impact 2021 (pp. 89-121). Springer, Cham.
  • Chatzopoulos S, Vergoulis T, Kanellos I, Dalamagas T, Tryfonopoulos C. ArtSim: Improved estimation of current impact for recent articles. In ADBIS, TPDL and EDA 2020 Common Workshops and Doctoral Consortium 2020 Aug 25 (pp. 323-334). Springer, Cham.
  • Chatzopoulos S, Vergoulis T, Kanellos I, Dalamagas T, Tryfonopoulos C. Further improvements on estimating the popularity of recently published papers. Quantitative Science Studies. 2021:1-36.
  • Kanellos I, Vergoulis T, Sacharidis D, Dalamagas T, Vassiliou Y. Impact-based ranking of scientific publications: a survey and experimental evaluation. IEEE Transactions on Knowledge and Data Engineering. 2019 Sep 13;33(4):1567-84.
  • Vergoulis T, Chatzopoulos S, Kanellos I, Deligiannis P, Tryfonopoulos C, Dalamagas T. BIP! Finder: Facilitating scientific literature search by exploiting impact-based ranking. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019 Nov 3 (pp. 2937-2940).

TUTORIAL: Assessing Research Impact by Leveraging Open Scholarly Knowledge Graphs

89 of 89

Thank you!

Ilias Kanellos - ilias.kanellos@athenarc.gr

Dimitris Sacharidis - dimitris.sacharidis@ulb.be - @dsachar

Thanasis Vergoulis - vergoulis@athenarc.gr - @vergoulis