1 of 40

Chapter 11 Link Prediction

Summary

  • Predicting Network Evolution
  • Unsupervised approaches
  • Supervised approaches
  • Evaluation

Reading

  • Liben-Nowell & Kleinberg paper

2 of 40

Link Prediction

Goal

Understanding how networks evolve

Problem definition

Given a snapshot of a network at time t, (accurately) predict the edges that will appear in the network during the interval (t, t+1)

Liben-Nowell, David, and Jon Kleinberg. "The link-prediction problem for social networks." Journal of the American Society for Information Science and Technology 58.7 (2007): 1019-1031.

3 of 40

Examples of Uses of Link Prediction

Monitor terrorist networks – deducing possible interactions/missing links between terrorists (without direct evidence)

Suggest interactions or collaborations that haven’t yet been exploited within an organization

Friendship prediction (e.g., Facebook)

4 of 40

Link Prediction

Link prediction is used to predict future possible links in the network (e.g., Facebook).

Or it can be used to predict missing links due to incomplete data (e.g., food webs)

Berlusconi, Giulia, Francesco Calderoni, Nicola Parolini, Marco Verani, and Carlo Piccardi. "Link Prediction in Criminal Networks: A Tool for Criminal Intelligence Analysis."

5 of 40

  1. Given a graph G = (V,E) the set of possible edges to be predicted is O(|V|^2);

  • Real networks tend to be sparse

False Positive prediction issue (i.e., we are likely to predict edges that will never appear)

Link Prediction

Task Complexity

6 of 40

Scientists who are close in the network (i.e., have common colleagues) will likely collaborate in the future.

Concretizing an Intuition…

Goal:

  • make this intuitive notion precise and understand which measures of “proximity” lead to accurate predictions

7 of 40

  1. Consider as input a graph G at time t
  2. Consider all the possible pairs of nodes (u,v)
  3. Compute a link formation score: score(u,v)
  4. Build a list of all possible edges ordered by score (from highest to lowest)
  5. Verify, following that ordering, the predictions on the graph at time t+1

score is a measure of proximity; a minimal sketch of this workflow follows
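A minimal sketch of the workflow above, assuming NetworkX and using the Jaccard coefficient as score(u,v); the function name and the top-k precision check are illustrative choices, not part of the original paper.

```python
# A minimal sketch of the unsupervised link prediction workflow,
# assuming NetworkX; any proximity score could replace Jaccard here.
import networkx as nx

def predict_links(G_t, G_t1, top_k=100):
    """Rank the non-adjacent pairs of G_t by score and verify them on G_t1."""
    candidates = nx.non_edges(G_t)                              # steps 1-2
    scored = nx.jaccard_coefficient(G_t, ebunch=candidates)     # step 3
    ranked = sorted(scored, key=lambda triple: triple[2], reverse=True)  # step 4
    hits = sum(1 for u, v, _ in ranked[:top_k] if G_t1.has_edge(u, v))   # step 5
    return hits / top_k                                         # precision at k
```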

Link Prediction

Workflow

8 of 40

Link Prediction

Unsupervised

Supervised

Define a set of proximity measures unrelated to the particular network

Extract knowledge from the network in order to improve prediction accuracy

Approaches

9 of 40

Unsupervised Link Prediction

10 of 40

Unsupervised Link Prediction

Unsupervised measures rely on different structural properties of the network

Neighborhood measures

  • Common Neighbors, Adamic Adar, Jaccard, Preferential Attachment

Path-based measures

  • Graph distance, Katz

Ranking

  • SimRank, Hitting Time, PageRank

Liben-Nowell, David, and Jon Kleinberg. "The link-prediction problem for social networks." Journal of the American Society for Information Science and Technology 58.7 (2007): 1019-1031.

11 of 40

How many friends do we have to share in order to become friends?

Common Neighbors: the more friends we share, the more likely we will become friends

Jaccard: the more similar our friend circles are, the more likely we will become friends

Unsupervised Link Prediction

Neighborhood measures

12 of 40

How many friends do we have to share in order to become friends?

Adamic Adar: the more selective our mutual friends are, the more likely we will become friends

Preferential Attachment: the more friends we have, the more likely we will become friends
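A small illustration of these four neighborhood scores, assuming NetworkX, whose generators yield (u, v, score) triples; the karate club graph and the candidate pair (0, 33) are arbitrary examples.

```python
# Four neighborhood scores for a single candidate pair (u, v), using NetworkX.
import networkx as nx

G = nx.karate_club_graph()
pair = [(0, 33)]  # a hypothetical candidate pair

cn  = len(list(nx.common_neighbors(G, 0, 33)))       # number of shared friends
jac = next(nx.jaccard_coefficient(G, pair))[2]       # |Γ(u)∩Γ(v)| / |Γ(u)∪Γ(v)|
aa  = next(nx.adamic_adar_index(G, pair))[2]         # Σ 1/log|Γ(z)| over shared friends z
pa  = next(nx.preferential_attachment(G, pair))[2]   # |Γ(u)| * |Γ(v)|
```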

Unsupervised Link Prediction

Neighborhood measures

13 of 40

How distant are we?

Graph Distance: (negated) length of the shortest path between two nodes

Katz: weighted sum over all the paths between two nodes
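A minimal sketch of the Katz score in closed form, assuming NumPy and NetworkX; the karate club graph and the damping factor are arbitrary illustrative choices.

```python
# Katz(u, v) = sum over path lengths l of beta^l * (number of paths of length l),
# which in closed form is (I - beta*A)^(-1) - I, valid for beta < 1/lambda_max.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()                      # arbitrary example graph
A = nx.to_numpy_array(G)
beta = 0.9 / max(abs(np.linalg.eigvals(A)))     # keep the series convergent
katz = np.linalg.inv(np.eye(len(A)) - beta * A) - np.eye(len(A))
# katz[u, v] is the length-weighted count of paths between u and v
```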

Unsupervised Link Prediction

Path-based measures

14 of 40

How similar are we?

SimRank: two nodes are similar to the extent that their neighborhoods are similar
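A short sketch of SimRank, assuming NetworkX; the karate club graph and the pair (0, 33) are arbitrary illustrative choices.

```python
# SimRank of a pair: high when the neighbors of one node are themselves
# similar to the neighbors of the other (a recursive notion of similarity).
import networkx as nx

G = nx.karate_club_graph()
score = nx.simrank_similarity(G, source=0, target=33)
```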

Unsupervised Link Prediction

Ranking

15 of 40

Unsupervised Link Prediction

Limits

  • Different kinds of networks are described by the same general closed formula
  • An average overall performance between 6% and 12%

Measure comparison

  • No single winner
  • Almost all predictors outperform the random predictor ⇒ there is useful information in network topology

16 of 40

Supervised Link Prediction

17 of 40

Supervised Link Prediction

The process is now organized into 4 steps:

    • Split the data into train/test sets
    • Learn a model on the training set
    • Use the model for prediction
    • Compare the predictions with the test set

A natural way to do it: build a “classifier” over a set of network features.

18 of 40

Learn a classifier (e.g., a Decision Tree) over unsupervised LP scores to generalize the assumptions they make about the network growth model, as sketched below.
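A hedged sketch of this stacking idea, assuming NetworkX and scikit-learn; the helper names, the choice of the three scores, and the naive negative sampling are illustrative, not the original method.

```python
# Stack several unsupervised scores into a feature vector per node pair
# and train a decision tree on the pairs observed between t and t+1.
import random
import networkx as nx
from sklearn.tree import DecisionTreeClassifier

def pair_features(G, pairs):
    """Collect several unsupervised LP scores for each candidate pair."""
    feats = {p: [] for p in pairs}
    for gen in (nx.jaccard_coefficient(G, pairs),
                nx.adamic_adar_index(G, pairs),
                nx.preferential_attachment(G, pairs)):
        for u, v, score in gen:
            feats[(u, v)].append(score)
    return [feats[p] for p in pairs]

def train_stacked_classifier(G_t, G_t1, n_negative=1000, seed=42):
    """Positives: edges appearing between t and t+1; negatives: sampled non-edges."""
    random.seed(seed)
    positives = [(u, v) for u, v in G_t1.edges()
                 if u in G_t and v in G_t and not G_t.has_edge(u, v)]
    pos_set = set(positives) | {(v, u) for u, v in positives}
    candidates = [p for p in nx.non_edges(G_t) if p not in pos_set]
    negatives = random.sample(candidates, min(n_negative, len(candidates)))
    X = pair_features(G_t, positives + negatives)
    y = [1] * len(positives) + [0] * len(negatives)
    return DecisionTreeClassifier(max_depth=5).fit(X, y)
```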

Stacking Unsupervised Scores

19 of 40

Supervised Link Prediction

Frequent Pattern Mining

GERM

Evolution rules can be extracted from the network history and used to identify/predict recurrent patterns

  • e.g., generalization of triadic closure

Berlingerio, Michele, et al. "Mining graph evolution rules." Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2009).

20 of 40

Supervised Link Prediction

Network Embedding

Idea

Graphs can be mapped into vector spaces

  • Node/edge similarity scores can be used to define metric spaces
  • Metric spaces enable a more natural application of approaches from DM/ML

NB: Different “mappings” facilitate the solution of different classes of problems
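A DeepWalk-style sketch of one such mapping, assuming NetworkX and gensim; this is only one possible embedding technique, and all parameters and graph choices are arbitrary illustrations.

```python
# Map nodes to vectors by treating truncated random walks as "sentences"
# and training a skip-gram model on them.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_len=20):
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = random.choice(list(G.neighbors(node)))
                walk.append(node)
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
model = Word2Vec(random_walks(G), vector_size=64, window=5, min_count=1, sg=1)
# a node pair (u, v) can then be represented, e.g., by the element-wise
# product of the two node vectors and fed to any standard classifier
edge_vector = model.wv["0"] * model.wv["33"]
```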

21 of 40

Supervised Link Prediction

Limits

  • No Free Lunch
  • Model construction is often complex and, usually, more time/resource demanding than directly applying unsupervised scores.

Embedding is not The Answer, only a different way to reason on graphs...

Results: higher performance w.r.t. unsupervised approaches

22 of 40

Given a predictor p, is there a way to decide whether it is a “good” one?

First Step:

verify that p outperforms the random predictor.

Random Predictor

each edge has the same probability of appearing in the network

if the ratio between the precision of p and that of the random predictor is > 1, then p is meaningful
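A sketch of this first check, assuming plain Python; the precision of the random predictor is simply the fraction of candidate pairs that actually become edges, since every pair is equally likely to be picked.

```python
# Compare the precision of predictor p with that of the random predictor.
def precision_ratio(predicted_pairs, new_edges, n_candidate_pairs):
    p_precision = len(set(predicted_pairs) & set(new_edges)) / len(predicted_pairs)
    random_precision = len(new_edges) / n_candidate_pairs
    return p_precision / random_precision  # > 1 means p beats the random predictor
```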

Evaluation

23 of 40

Evaluation: Comparing Predictors

We need to analyze the performance ratio, the ROC curve, and/or the Precision-Recall curve.

24 of 40

Precision Vs. Recall

  • Precision: PPV = TP/(TP+FP)
  • Recall: TPR = TP/(TP+FN)

ROC (Receiver operating characteristic)

  • 1-Specificity: FPR = FP/(FP+TN)
  • Recall: TPR = TP/(TP+FN)

Note:

  • ROC and PR spaces are isomorphic (the use of ROC is more widespread)
  • Numerical comparison can be done using the AUROC (area under the ROC curve), as in the sketch below
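A sketch of the numerical comparison, assuming scikit-learn; y_true marks which candidate pairs actually appeared at t+1, y_score holds the predictor's scores, and both arrays here are hypothetical toy values.

```python
# ROC and PR summaries of a link predictor's ranked output.
from sklearn.metrics import roc_auc_score, average_precision_score, roc_curve

y_true = [1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]

auroc = roc_auc_score(y_true, y_score)             # area under the ROC curve
auprc = average_precision_score(y_true, y_score)   # summary of the PR curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points of the ROC curve
```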

Evaluation: ROC and PR curve

25 of 40

Accuracy could be improved by extending simple models with more complex (even semantic) information:

  • Link strength
  • Geographical information
  • ...

Link Prediction needs to be revised in some scenarios:

  • Dynamic Networks
  • Multiplex networks
  • ...

Link Prediction

Something more...

26 of 40

Key Messages

Predicting new links that will arise in a network is not easy:

  1. Networks are, usually, sparse
  2. Cold Start Problem
    • What if I don’t have enough information?
      • Can I predict bridges?
  3. False Positive prediction
    • Bridges !?!
  4. Simple approaches are “too simple”
  5. Complex approaches are costly

27 of 40

Case Study:

Interaction Prediction in Dynamic Networks

28 of 40

Case Study: Interaction Prediction

Link Prediction goal:

Predict ties that are not present in the current network configuration.

Ties are persistent structures that, once appeared, cannot disappear (e.g., friendship).

Interaction Prediction goal:

Predict interactions that will occur (either for the first time or not) among nodes already observed in the network.

Interactions are volatile structures that can occur multiple times and whose value can vanish as time goes by (e.g., telephone calls).

Rossetti et al. "Interaction prediction in dynamic networks exploiting community discovery." IEEE/ACM ASONAM, 2015.

29 of 40

Case Study: Interaction Prediction

Given a set G = {G0, ..., Gt, ..., Gτ} of ordered network observations, with t ∈ T = {0, ..., τ}, the interaction prediction problem aims to predict the new interactions that will take place at time τ+1, thus composing Gτ+1.

Idea:

  • Model network evolution through temporal snapshots;
  • False Positive reduction: Community Discovery as a bound for strong ties;
  • Time-Aware approach: time series forecast of topological measures;
  • Supervised Approach: ensemble of classifiers learnt on the topological features, tested on the forecasted ones.

30 of 40

Case Study: Interaction Prediction

Step 1: For each temporal snapshot t ∈ T compute a partition Ct = {Ct,0, ..., Ct,k} of Gt using a community discovery algorithm.

Step 2: For each t ∈ T compute a set of measures F for each node pair (u,v) belonging to at least one community in Ct

Step 3: For each node pair (u,v) and feature f ∈ F build a time series Su,v and apply forecasting techniques in order to obtain its future expected value fu,v

Step 4: Use the set of expected values fu,v to predict future intra-community interactions.

31 of 40

Case Study: Interaction Prediction

Step 1: Community Discovery (CD)

Each CD algorithm proposes its own Community Definition.

  • Demon (ego-network based, overlap)
  • Louvain (modularity, crisp partition)
  • Infomap (map equation, crisp partition)

32 of 40

Case Study: Interaction Prediction

Step 2: Features

On the identified communities we compute three sets of features:

  • Pairwise Structural Features (e.g., Jaccard, CN, Adamic/Adar, ...)
  • Node Topology Features (e.g., PageRank, edge betweenness, ...)
  • Community Features (e.g., density, size, shared communities, avg. clustering, ...)

33 of 40

Case Study: Interaction Prediction

Step 3: Time Series Forecast

For each time series we apply several forecasting models in order to extract the expected future value.
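A minimal sketch of this step, assuming NumPy and a simple linear-trend forecaster; the case study applies several forecasting models, and this is only one illustrative choice with made-up numbers.

```python
# Forecast the next value of a per-pair feature time series.
import numpy as np

def forecast_next(series):
    """Fit a linear trend on the observed snapshots and extrapolate to t+1."""
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, deg=1)
    return slope * len(series) + intercept

# e.g., the Jaccard score of a pair (u, v) observed over four snapshots
expected_jaccard = forecast_next([0.10, 0.12, 0.15, 0.19])
```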

34 of 40

Case Study: Interaction Prediction

Step 4: Classification

Once the features are learned, we design two different experiments:

  • Balanced Scenario: the positive and negative classes are balanced through downsampling (sketched below) to provide a standard baseline

  • Unbalanced Scenario: the positive/negative class ratio of the data is maintained. Due to network sparsity we observe a strong negative prevalence (~98%)
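A minimal sketch of the downsampling used in the balanced scenario, assuming plain Python lists of labeled pairs; the function name is an illustrative choice.

```python
# Downsample the majority (negative) class to the size of the positive class.
import random

def downsample(positive_pairs, negative_pairs, seed=42):
    random.seed(seed)
    return positive_pairs + random.sample(negative_pairs, len(positive_pairs))
```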

35 of 40

Case Study: Interaction Prediction: Balanced Scenario

Very high accuracy and AUC

The contribution of CD approaches to IP is topology-sensitive

36 of 40

Case Study: Interaction Prediction: Balanced Scenario (cont’d)

Which feature set is the most predictive?

False Positive Filtering (FSF) vs. No Filtering (SF)

All Forecast with Filtering vs. No Filtering

37 of 40

Case Study: Interaction Prediction: Unbalanced Scenario

Negative class:

  • Social 95.9%
  • DBLP 98.9%

Very hard baselines

  • majority classifiers score ~.96 and ~.99 precision (always predicting “no edges”)

  • the proposed workflow is able to reach ~.96 and ~.45 precision w.r.t. the positive class

38 of 40

Case Study: Interaction Prediction: What about weak links?

High accuracy is guaranteed by focusing the prediction on intra-community interactions.

Inter-Community Interaction Prediction

Focus on predicting the presence/absence of at least one new interaction across two communities

  • no identification of the “real” endpoints
  • no identification of the multiplicity

Idea

  1. Construct a new network whose meta-nodes are the communities
  2. Apply the same workflow to this graph

39 of 40

Case Study: Interaction Prediction

Even though interaction prediction is a complex problem, it is possible to reach high accuracy through:

  • Target selection: False Positive reduction via Community Discovery; weak interactions treated as “special cases”
  • Local topology history analysis: feature forecast via Time Series analysis

Moreover, each type of dataset demands a specific CD algorithm:

  • One-to-one interactions (e.g., social ones)
  • Many-to-many interactions (e.g., co-authorship relations)

40 of 40

Chapter 11 Conclusion

Take Away Messages

  1. Link wiring patterns define how network topology evolves
  2. Predicting new links is a complex task due to the sparse topology of networks
  3. Static and dynamic formulation of the problem account for different peculiarities

What’s Next

Chapter 12: Dynamic Community Discovery

Suggested Readings

  • Liben-Nowell & Kleinberg paper