1 of 43

Chapter 6 Centrality & Assortative Mixing

Summary

  • Measuring Node importance
  • Do birds of a feather flock together?

Reading

  • Chapters 3 & 4 of Kleinberg's book

2 of 43

How important is a node in a network?

We can measure nodes importance using so-called centrality.

Bad term: �nothing to do with being central in general

Usage:

  • Some centralities have straightforward interpretation
  • Centralities can be used as node features for machine learning on graph

3 of 43

Where are you?

It is always possible, once fixed a context, to measure our distance from a “focal” node.

For instance:

Movie Stars:

  • Bacon number

Researchers:

  • Erdos number

Are such “focal” nodes really different from the others?

4 of 43

Degree Centrality

k = number of links

1 if i and j are connected,

Ai,j � 0 otherwise

How many neighbors does a node have?

Often enough to find important nodes

(e.g., main characters of a tv series talk with more people)

But not always

  • Twitter users with the most contacts are spam
  • Webpages/wikipedia pages with most links are often lists of references

g

d

c

e

a

f

b

h

i

k

j

k(g) = 1

k(a) = 4

k(b)=2

Degree Example

5 of 43

Connectivity-based centralities

“influence based on the number of links a node has to other nodes in the network”

6 of 43

Recursive definitions

Recursive importance:

Important nodes are those connected to important nodes

Several centralities based on this idea:

  • Eigenvector centrality
  • PageRank
  • Katz centrality
  • ...

Idea:

  1. Each node has a score (centrality)
  2. If every node “sends” its score to its neighbors, the sum of all scores received by each node will be equal to its original score

xi is the centrality of node i

How to solve it (power method):

  1. Initialize scores to random values
  2. Update the values according to the desired rule

Does it converge?

Yes, if the graph is undirected with a single connected component (Perron-Frobenius theorem)

7 of 43

Eigenvector Centrality

A pair of eigenvector (x) and eigenvalue (λ) is defined by the relation:

Ax = λx

  • x is a vector of size N that can be interpreted as the nodes scores
  • Ax yield a new vector of the same size which corresponds for each node to the sum of the received scores from its neighbors
  • the equality implies that the new scores are proportional to the previous ones

Problems:

In case of DiGraphs:

  • Adjacency matrix is asymmetric
  • 2 sets of eigenvectors
  • 2 leading eigenvectors (use the incoming ones)

In presences of source nodes (0 in-degree):

  • E(g) = 0
  • E(d) = 0 as well since its incoming link comes from A

g

d

c

e

a

f

b

h

i

k

j

Eigenvector Example�

8 of 43

PageRank

Main idea: The PageRank computation can be interpreted as a Random Walk process with restart

Probability that the RW will be in node i next step depends only on the current node j and the transition probability� j ➝ i determined by the stochastic matrix

  • Consequently this is a first-order Markov process
  • Stationary probabilities (i.e., when walk length tends towards ∞ ) of the RW to be in node i gives the PageRank of the node

g

d

c

e

a

f

b

h

i

k

j

PR(a) = PR(b)/4 + PR(c)/3 + PR(e)/2

PageRank Example

Teleportation probability: the parameter α gives the probability that in the next step of the RW will follow a Markov process or with probability 1-α it will jump to a random node

  • α<1, it assures that the RW will never be stuck at nodes with kout = 0, but it can restart the RW from a randomly selected other node (usually α=0.85)

9 of 43

PageRank (cont’d)

where R is the solution of the equation

PageRank can also be interpreted as the dominant eigenvector of the normalized adjacency matrix

Two improvements w.r.t. eigenvector centrality:

  • Add a constant centrality gain for every node (solves source node problem in digraphs)

  • Nodes with very high centralities give high centralities to their neighbors

is the ratio between number of links outbound from page j to page i to the total number of outbound links of page j

10 of 43

Katz �Centrality

Measuring the relative degree of influence of a node within a network

CK(i):

for all distances k, for all nodes j, sum the number of different paths i→j of length k (multiplied by an attenuation factor α)

NB: Attenuation factor α must be smaller than 1/|λ| , i.e. the reciprocal of the absolute value of the largest eigenvalue of A.

g

d

c

e

a

f

b

h

i

k

j

CK(g) = 0.26

CK(b) = 0.29

Katz Example�(non normalized)

11 of 43

Geometric Centralities

”importance of a node depends on some function of its distances w.r.t. other nodes”

12 of 43

Closeness Centrality

Farness: average of length of shortest paths to all other nodes

Closeness: inverse of the Farness �(normalized by number of nodes)

  • Highest closeness = More central
  • Closeness=1: directly connected to all other nodes
  • Well defined only on connected networks

Closeness Formula

g

d

c

e

a

f

b

h

i

k

j

C(g) = 1/10 (1+2*3+2*3+4+3*5) � = 3.2

C(a) = 1/10 (4+2*3+3*3) � = 1.9

C(b) = 1/10 (2+2*6+2*3) � = 2

Farness Example

13 of 43

Harmonic Centrality

Harmonic mean of the geodesic �(shorted paths) distances from a given node �to all others.

g

d

c

e

a

f

b

h

i

k

j

CH(g) = 4

CH(b ) =5.6

Harmonic Example�

  • In case of no paths between two nodes i and j dij=∞
  • Well defined for disconnected graphs

14 of 43

Betweenness Centrality

Number of shortest paths that go through �a node.

  • Assumption: important vertices are bridges over which information flows

  • Practically: if information spreads via shortest paths, important nodes are found on many shortest paths

g

d

c

e

a

f

b

h

i

k

j

Cb(g) = 0

Cb(a) = 5*5 +4 = 29

Cb(b) = 4*6 = 24

Betweenness Example�(non normalized)

Cb(d) = 9+7/2 = 12.5

Definition

Normalized def.

15 of 43

Katz

03

  • What’s your degree of influence?

PageRank

02

  • How many important interactions do you have?

Eigenvector

01

  • Are you connected to important nodes?

Betweenness

06

  • How much do you help the network to stay connected?

Harmonic

05

  • What’s your harmonic average distance w.r.t. the rest of the network?

Closeness

04

  • What’s your average distance w.r.t. the rest of the network?

Degree

00

  • How many friends do you have?

Connectivity-based centralities

Geometric� centralities

Each centrality measures is a proxy of an underlying network process.

If such a process is irrelevant for the network than the centrality measure makes no sense

  • E.g. If information does not spread through shortest paths, betweenness centrality is irrelevant

Centrality measures should be used with caution (a) for exploratory purposes and (b) for characterisation

16 of 43

Understanding Centralities

Data and Viz @mathbeveridge�www.networkofthrones.wordpress.com

17 of 43

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

GOT: S1

18 of 43

GOT: S7

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

19 of 43

All characters interactions �(up to the last season… data are coming)

Node Label: PageRank

Node Size: Betweenness Centrality

Edge Size: #interactions

Colors: Community (with Louvain)

GOT: S1-7

20 of 43

Season 1: Ned

Season 2: Tyrion

Season 3: Rob

Season 4: Tyrion

Season 5: Cersei

Season 6: Sansa

Season 7: Jon

Season 8: Tyrion

21 of 43

Do Birds of a Feather Flock Together?

Homophilic behaviors in complex networks

22 of 43

Homophily

Property of (social) networks that nodes of the same attitude tends to be connected with

a higher probability than expected

  • It appears as correlation between vertex properties of x(i) and x(j) if (i,j) ∈ E

Disassortative mixing:

Contrary of homophily: dissimilar nodes tend to be connected �(e.g., sexual networks, predator-prey)

Examples of Vertex properties

age, gender, nationality,�political beliefs, excitatory or inhibitory neuron

Homophily can be a link creation mechanism or consequence of influence (and it is difficult to distinguish)

23 of 43

Assortative Mixing�(Newman’s assortativity)

Quantify homophily while discrete (categoric) node properties are involved

where:

  • eij fraction of links connecting nodes of type i and j
  • ai fraction of out-links from nodes of type a
  • bi fraction of in-links for type b nodes

Interpretation

  • r=0: no assortative mixing
  • r=1: perfectly assortative
  • -1<r<0: disassortative mixing

0.621

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

24 of 43

Assortative Mixing�Newman’s assortativity - Example

Compute Newman’s assortativity given the network below:

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

=

=

=

Assortativity Matrix

Assortativity Index

In undirected network ai = bi

25 of 43

Assortative Mixing�(Newman’s assortativity)

Quantify homophily while scalar node properties are involved (e.g., degree assortativity)

where:

  • exy fraction of links connecting nodes with values x and y
  • ax fraction of out-links from nodes having value x
  • by fraction of in-links for nodes having value y
  • σ standard deviations

l

Degree disassortative

Nodes tends to connect in a star-like topology

Newman, Mark EJ. "Mixing patterns in networks."

Physical Review E 67.2 (2003): 026126.

26 of 43

Assortativity by degree w.r.t. different network types

27 of 43

Degree correlation

(insights)

  • Using a single number: Newman’s assortativity

  • Plotting knn(k) in function of k,

where

is a degree correlation function, e.g., the average degree of the neighbors of all degree-k nodes.

28 of 43

Clumpiness

Similar (scalar) characteristics are more prominent along short distances

e.g., high-degree nodes are separated by few links

where:

  • ki the degree of node i
  • α > 0 a real parameter
  • dij the distance between nodes i and j

Clumped assortative

Λ increases with the increase of the degrees of the nodes in the network

Clumped disassortative

Λ decreases with the increase in the separation between the degrees of the nodes in the network

Estrada, N. Hatano, A. Gutierrez, “Clumpiness mixing in complex networks”, Journal of Statistical Mechanics: Theory and Experiment.

29 of 43

Examples and Case Study

30 of 43

Case study: Happiness

James H. Fowler, Nicholas A. Christakis.

Dynamic Spread of Happiness in a Large Social Network:

Longitudinal Analysis Over 20 Years in the Framingham Heart Study

British Medical Journal 337 (4 December 2008)

31 of 43

Is a Global Measure enough?

“Sure I can work with the means, but I'd rather party with the outliers...”

32 of 43

Limits of a global assortativity score

Local assortativity of gender in a sample of Facebook friendships (McAuley and Leskovec 2012).

Different regions of the graph exhibit strikingly different patterns, suggesting that a single variable, e.g. global assortativity (Newman’s), would provide a poor description of the system.

33 of 43

Moving toward a multiscale approach to measure assortativity

Five networks (top) of n=40 nodes and m=160 edges with the same global assortativity r=0

34 of 43

Multiscale Mixing Patterns

Idea:

A local measure that captures the mixing patterns within the local neighbourhood of a given node.

Trivial solution:

Consider only the node’s neighbors

  • issue with sample size �(what about low degree nodes?)

Better approximation:

Considering the stable state of a RW (probability to reach a given node) �to weight the edges

Issue:

Need to fix the value of α

  • α=0 the RW stays put,
  • α=1 the RW never restarts

Solution:

Integrate over all possible α (multiscale approach)

Peel, Leto, Jean-Charles Delvenne, and Renaud Lambiotte. "Multiscale mixing patterns in networks." Proceedings of the National Academy of Sciences 115.16 (2018): 4057-4062.

35 of 43

Peel’s Quintet: Only A has a random-like assortment distribution

36 of 43

Facebook100�Distribution of local assortativity for the “dorm” node feature

Evaluation on real data

37 of 43

Facebook100�Correlation of local assortativities by dorm and matriculation year (x axis) �and proportion of nodes which are more assortative by dorm than by year (y axis).

Evaluation on real data (cont’d)

38 of 43

Distance matter:�Conformity

Idea:

Random walkers flatten reachability distances: can we weight local assortativity considering shortest paths?

Solution:

Clumpiness like modeling:

  • the closer similar nodes are, the higher their local “assortativity”

Strengths:

  • Higher informative power w.r.t. Multiscale-Mixing
  • Solves the “range” problem�(the full interval [-1,1] is covered)
  • Tunable distance dumping factor
  • Allows for multi label profiles

Weaknesses:

Costly computation

  • requires the extraction of all shortest paths

similarity

distance dumping

G. Rossetti, S. Citraro and L. Milli, "Conformity: a Path-Aware Homophily measure for Node-Attributed Networks" in IEEE Intelligent Systems, vol. , no. 01, pp. 1-1, 5555. 2021

39 of 43

Node colors map categorical attribute values, while node sizes encode the respective

Conformity scores (the smaller the size, the lower the score)

Conformity: Intuition

40 of 43

Peel’s Quintet: Only A has a random-like assortment distribution

A

B

C

D

E

41 of 43

Facebook100�Distribution of local assortativity for the “gender” node feature

Evaluation on real data

42 of 43

GoT Season 6�Node attributes: Houses. Global attribute assortativity: 0.2�Multiscale Mixing (left), Conformity (right)�

Multiscale Mixing Vs. Conformity

43 of 43

Chapter 6 Conclusion

Take Away Messages

  1. Nodes positions play an important role in network topology
  2. Different centralities allows to capture, valuable, information
  3. In social contexts individuals tend to cluster following homophilic patterns

What’s Next

Chapter 7:�Tie Strength & Resilience��

Suggested Readings

  • Chapters 3 & 4 of Kleinberg's book

Notebook

Chapter 6: Centrality & Assortativity

https://github.com/sna-unipi/SNA_lectures_notebooks