2 of 43

How important is a node in a network?

https://oracleofbacon.org/

We can measure nodes importance using so-called centrality.

Bad term: �nothing to do with being central in general

Usage:

Some centralities have straightforward interpretation
Centralities can be used as node features for machine learning on graph

3 of 43

Where are you?

It is always possible, once fixed a context, to measure our distance from a “focal” node.

For instance:

Movie Stars:

Bacon number

Researchers:

Erdos number

Are such “focal” nodes really different from the others?

4 of 43

Degree Centrality

k = number of links

1 if i and j are connected,

A_i,j � 0 otherwise

How many neighbors does a node have?

Often enough to find important nodes

(e.g., main characters of a tv series talk with more people)

But not always

Twitter users with the most contacts are spam
Webpages/wikipedia pages with most links are often lists of references

k(g) = 1

k(a) = 4

k(b)=2

Degree Example

5 of 43

Connectivity-based centralities

“influence based on the number of links a node has to other nodes in the network”

6 of 43

Recursive definitions

Recursive importance:

Important nodes are those connected to important nodes

Several centralities based on this idea:

Eigenvector centrality
PageRank
Katz centrality
...

Idea:

Each node has a score (centrality)
If every node “sends” its score to its neighbors, the sum of all scores received by each node will be equal to its original score

x_i is the centrality of node i

How to solve it (power method):

Initialize scores to random values
Update the values according to the desired rule

Does it converge?

Yes, if the graph is undirected with a single connected component (Perron-Frobenius theorem)

7 of 43

Eigenvector Centrality

A pair of eigenvector (x) and eigenvalue (λ) is defined by the relation:

Ax = λx

x is a vector of size N that can be interpreted as the nodes scores
Ax yield a new vector of the same size which corresponds for each node to the sum of the received scores from its neighbors
the equality implies that the new scores are proportional to the previous ones

Problems:

In case of DiGraphs:

Adjacency matrix is asymmetric
2 sets of eigenvectors
2 leading eigenvectors (use the incoming ones)

In presences of source nodes (0 in-degree):

E(g) = 0
E(d) = 0 as well since its incoming link comes from A
…

Eigenvector Example�

8 of 43

PageRank

Main idea: The PageRank computation can be interpreted as a Random Walk process with restart

Probability that the RW will be in node i next step depends only on the current node j and the transition probability� j ➝ i determined by the stochastic matrix

Consequently this is a first-order Markov process
Stationary probabilities (i.e., when walk length tends towards ∞ ) of the RW to be in node i gives the PageRank of the node

PR(a) = PR(b)/4 + PR(c)/3 + PR(e)/2

PageRank Example

Teleportation probability: the parameter α gives the probability that in the next step of the RW will follow a Markov process or with probability 1-α it will jump to a random node

α<1, it assures that the RW will never be stuck at nodes with k_out = 0, but it can restart the RW from a randomly selected other node (usually α=0.85)

9 of 43

PageRank (cont’d)

where R is the solution of the equation

PageRank can also be interpreted as the dominant eigenvector of the normalized adjacency matrix

Two improvements w.r.t. eigenvector centrality:

Add a constant centrality gain for every node (solves source node problem in digraphs)

Nodes with very high centralities give high centralities to their neighbors

is the ratio between number of links outbound from page j to page i to the total number of outbound links of page j

10 of 43

Katz �Centrality

Measuring the relative degree of influence of a node within a network

CK(i):

for all distances k, for all nodes j, sum the number of different paths i→j of length k (multiplied by an attenuation factor α)

NB: Attenuation factor α must be smaller than 1/|λ| , i.e. the reciprocal of the absolute value of the largest eigenvalue of A.

CK(g) = 0.26

CK(b) = 0.29

Katz Example�(non normalized)

11 of 43

Geometric Centralities

”importance of a node depends on some function of its distances w.r.t. other nodes”

12 of 43

Closeness Centrality

Farness: average of length of shortest paths to all other nodes

Closeness: inverse of the Farness �(normalized by number of nodes)

Highest closeness = More central
Closeness=1: directly connected to all other nodes
Well defined only on connected networks

Closeness Formula

C(g) = 1/10 (1+2*3+2*3+4+3*5) � = 3.2

C(a) = 1/10 (4+2*3+3*3) � = 1.9

C(b) = 1/10 (2+2*6+2*3) � = 2

Farness Example

13 of 43

Harmonic Centrality

Harmonic mean of the geodesic �(shorted paths) distances from a given node �to all others.

CH(g) = 4

CH(b ) =5.6

Harmonic Example�

In case of no paths between two nodes i and j d_ij=∞
Well defined for disconnected graphs

14 of 43

Betweenness Centrality

Number of shortest paths that go through �a node.

Assumption: important vertices are bridges over which information flows

Practically: if information spreads via shortest paths, important nodes are found on many shortest paths

Cb(g) = 0

Cb(a) = 5*5 +4 = 29

Cb(b) = 4*6 = 24

Betweenness Example�(non normalized)

Cb(d) = 9+7/2 = 12.5

Definition

Normalized def.

15 of 43

Katz

What’s your degree of influence?

PageRank

How many important interactions do you have?

Eigenvector

Are you connected to important nodes?

Betweenness

How much do you help the network to stay connected?

Harmonic

What’s your harmonic average distance w.r.t. the rest of the network?

Closeness

What’s your average distance w.r.t. the rest of the network?

Degree

How many friends do you have?

Connectivity-based centralities

Geometric� centralities

Each centrality measures is a proxy of an underlying network process.

If such a process is irrelevant for the network than the centrality measure makes no sense

E.g. If information does not spread through shortest paths, betweenness centrality is irrelevant

Centrality measures should be used with caution (a) for exploratory purposes and (b) for characterisation

16 of 43

Understanding Centralities

Data and Viz @mathbeveridge�www.networkofthrones.wordpress.com

17 of 43

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

GOT: S1

18 of 43

GOT: S7

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

19 of 43

All characters interactions �(up to the last season… data are coming)

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

More on: www.networkofthrones.wordpress.com

GOT: S1-7

20 of 43

Season 1: Ned

Season 2: Tyrion

Season 3: Rob

Season 4: Tyrion

Season 5: Cersei

Season 6: Sansa

Season 7: Jon

Season 8: Tyrion

21 of 43

Do Birds of a Feather Flock Together?

Homophilic behaviors in complex networks

22 of 43

Homophily

Property of (social) networks that nodes of the same attitude tends to be connected with

a higher probability than expected

It appears as correlation between vertex properties of x(i) and x(j) if (i,j) ∈ E

Disassortative mixing:

Contrary of homophily: dissimilar nodes tend to be connected �(e.g., sexual networks, predator-prey)

Examples of Vertex properties

age, gender, nationality,�political beliefs, excitatory or inhibitory neuron

Homophily can be a link creation mechanism or consequence of influence (and it is difficult to distinguish)

23 of 43

Assortative Mixing�(Newman’s assortativity)

Quantify homophily while discrete (categoric) node properties are involved

where:

e_ij fraction of links connecting nodes of type i and j
a_i fraction of out-links from nodes of type a
b_i fraction of in-links for type b nodes

Interpretation

r=0: no assortative mixing
r=1: perfectly assortative
-1<r<0: disassortative mixing

0.621

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

24 of 43

Assortative Mixing�Newman’s assortativity - Example

Compute Newman’s assortativity given the network below:

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

Assortativity Matrix

Assortativity Index

In undirected network a_i= b_i

25 of 43

Assortative Mixing�(Newman’s assortativity)

Quantify homophily while scalar node properties are involved (e.g., degree assortativity)

where:

e_xy fraction of links connecting nodes with values x and y
a_x fraction of out-links from nodes having value x
b_y fraction of in-links for nodes having value y
σ standard deviations

Degree disassortative

Nodes tends to connect in a star-like topology

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

26 of 43

Assortativity by degree w.r.t. different network types

27 of 43

Degree correlation

(insights)

Using a single number: Newman’s assortativity

Plotting k_nn(k) in function of k,

where

is a degree correlation function, e.g., the average degree of the neighbors of all degree-k nodes.

28 of 43

Clumpiness

Similar (scalar) characteristics are more prominent along short distances

e.g., high-degree nodes are separated by few links

where:

k_i the degree of node i
α > 0 a real parameter
d_ij the distance between nodes i and j

Clumped assortative

Λ increases with the increase of the degrees of the nodes in the network

Clumped disassortative

Λ decreases with the increase in the separation between the degrees of the nodes in the network

Estrada, N. Hatano, A. Gutierrez, “Clumpiness mixing in complex networks”, Journal of Statistical Mechanics: Theory and Experiment.

29 of 43

Examples and Case Study

30 of 43

Case study: Happiness

James H. Fowler, Nicholas A. Christakis.

Dynamic Spread of Happiness in a Large Social Network:

Longitudinal Analysis Over 20 Years in the Framingham Heart Study

British Medical Journal 337 (4 December 2008)

31 of 43

Is a Global Measure enough?

“Sure I can work with the means, but I'd rather party with the outliers...”

32 of 43

Limits of a global assortativity score

Local assortativity of gender in a sample of Facebook friendships (McAuley and Leskovec 2012).

Different regions of the graph exhibit strikingly different patterns, suggesting that a single variable, e.g. global assortativity (Newman’s), would provide a poor description of the system.

33 of 43

Moving toward a multiscale approach to measure assortativity

Five networks (top) of n=40 nodes and m=160 edges with the same global assortativity r=0

34 of 43

Multiscale Mixing Patterns

Idea:

A local measure that captures the mixing patterns within the local neighbourhood of a given node.

Trivial solution:

Consider only the node’s neighbors

issue with sample size �(what about low degree nodes?)

Better approximation:

Considering the stable state of a RW (probability to reach a given node) �to weight the edges

Issue:

Need to fix the value of α

α=0 the RW stays put,
α=1 the RW never restarts

Solution:

Integrate over all possible α (multiscale approach)

Peel, Leto, Jean-Charles Delvenne, and Renaud Lambiotte. "Multiscale mixing patterns in networks." Proceedings of the National Academy of Sciences 115.16 (2018): 4057-4062.

35 of 43

Peel’s Quintet: Only A has a random-like assortment distribution

36 of 43

Facebook100�Distribution of local assortativity for the “dorm” node feature

Evaluation on real data

37 of 43

Facebook100�Correlation of local assortativities by dorm and matriculation year (x axis) �and proportion of nodes which are more assortative by dorm than by year (y axis).

Evaluation on real data (cont’d)

38 of 43

Distance matter:�Conformity

Idea:

Random walkers flatten reachability distances: can we weight local assortativity considering shortest paths?

Solution:

Clumpiness like modeling:

the closer similar nodes are, the higher their local “assortativity”

Strengths:

Higher informative power w.r.t. Multiscale-Mixing
Solves the “range” problem�(the full interval [-1,1] is covered)
Tunable distance dumping factor
Allows for multi label profiles

Weaknesses:

Costly computation

requires the extraction of all shortest paths

similarity

distance dumping

G. Rossetti, S. Citraro and L. Milli, "Conformity: a Path-Aware Homophily measure for Node-Attributed Networks" in IEEE Intelligent Systems, vol. , no. 01, pp. 1-1, 5555. 2021

39 of 43

Node colors map categorical attribute values, while node sizes encode the respective

Conformity scores (the smaller the size, the lower the score)

Conformity: Intuition

40 of 43

Peel’s Quintet: Only A has a random-like assortment distribution

41 of 43

Facebook100�Distribution of local assortativity for the “gender” node feature

Evaluation on real data

42 of 43

GoT Season 6�Node attributes: Houses. Global attribute assortativity: 0.2�Multiscale Mixing (left), Conformity (right)�

Multiscale Mixing Vs. Conformity

43 of 43

Chapter 6 �Conclusion

Take Away Messages

Nodes positions play an important role in network topology
Different centralities allows to capture, valuable, information
In social contexts individuals tend to cluster following homophilic patterns

What’s Next

Chapter 7:�Tie Strength & Resilience��