Chapter 5
Metrics
Metrics
Objective measurements
of a subject matter
Subject matter? Employee performance
Objective? Numbers do not lie.
They do not care whose name is what
Just pure numbers and stats
Metrics
Networks have metrics too
Which nodes are more important?
Which nodes act like a bridge?
Which nodes are close to each other
than everyone else
Three types of metrics
Node based
Node Pair based
Network Based
What is a node?
Bob
Janice
Jack
Jen
Alice
People here are the nodes
First class citizen
of the network
Here, GoT characters
are the nodes
Node metrics: Degree
0
5
3
1
2
4
Degree: count of neighbors
Node 0 has degree 3, because it has 3 neighbors, 1,2,4
Node 5 has degree 2, because it has two neighbors, 3,2
Undirected Graph
For Undirected graphs!!!!!
Node metrics: Degree
Indegree: # of incoming links
Outdegree: # of outgoing links
0
5
3
1
2
4
For Directed graph !!!!!!!!
Node 0 has indegree 2
And outdegree 1
Node 2 has indegree 3
And outdegree 2
Node metrics: Degree
bob
Kate
Nate
Jack
alice
Jane
In a social network:
degrees say how many friends do you have
How many friends do Alice have? 3 because
The node has degree 3
What’s the upper limit of a node’s degree?
Suppose the network has n nodes.
(N-1)
Node metrics: Closeness
A way of detecting which nodes
can spread information
With the fewest effort
B will be selected for best marketing reach performance
Node metrics: Closeness
Let’s try to calculate closeness for node 2 in this graph
0
5
3
1
2
4
Undirected Graph
Equation for closeness of a node i
N = 6 (# nodes in graph)
d(i,j) is the geodesic shortest distance from node i to j
d(2,0) = 1
d(2,1) = 1
d(2,5) = 1
d(2,4) = 2🡪0🡪4 so 2
d(2,3) = 2🡪1🡪3, or 2🡪5🡪3 so 2
Sum of all distances = 1+1+1+2+2 = 7
No point measuring d(2,2)
Closeness of node 2 is 6/7 = 0.857
The higher, the more central the node is
Node metrics: Closeness
Now, let’s try for node 4
0
5
3
1
2
4
Undirected Graph
Equation for closeness of a node i
N = 6 (# nodes in graph)
d(4,0) = 1
d(4,3) = 1
d(4,2) = 4🡪0🡪2 so 2
d(4,1) = 4🡪0🡪1 so 2
Sum of all distances = 1+1+2+2+2 = 8
d(4,5) = 4🡪3🡪5 so 2
Closeness of node 4 is 6/8 = 0.75
Closeness of node 2 was 6/7 = 0.857
You will select node 2 for most efficient marketing influence
Node Metrics: betweenness
0
8
1
2
4
5
3
6
7
How many shortest geodesic paths go through a node
These nodes are like bridges in the network
Without them, the network might get fragmented
If you take out 5, the network gets fragmented!
Node Metrics: betweenness
So, we must find shortest paths for all node pairs
How many node pairs?
Well, suppose our network there are 4 nodes, a,b,c,d
Then the node pairs are
ab ac ad
ba bc bd
ca cb cd
da db dc
Total = n(n-1) = 4*3 = 12
a
b
c
d
The network is bidirectional so ab and ba is the same. Take out the duplicates.
ab ac ad bc bd cd
6 unique node pairs 🡪 n(n-1) / 2 = 12/2 = 6
Node Metrics: betweenness
a
b
c
d
Node pairs are 🡪 ab ac ad bc bd cd
Pair | Shortest paths |
ab | a->b |
ac | a->c |
ad | a->c->d |
bc | b->a->c |
bd | b->a->c->d |
cd | c->d |
Find the geodesic shortest paths for every node pair
Count #of shortest paths through a node
A | B | C | D |
1 | 1 | 0 | 0 |
1 | 0 | 1 | 0 |
1 | 0 | 1 | 1 |
1 | 1 | 1 | 0 |
1 | 1 | 1 | 1 |
0 | 0 | 1 | 1 |
5 | 3 | 5 | 3 |
Sum
Divide by total paths
5/6 | 3/6 | 5/6 | 3/6 |
0.83 | 0.5 | 0.83 | 0.5 |
Node Metrics: betweenness
0
8
1
2
4
5
3
6
7
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 23 | 24 | 25 |
01 | 02 | 03 | 04 | 045 035 | 0456 0356 | 0457 0357 | 08 | 102 | 13 | 104 134 | 135 | 1356 | 1357 | 108 | 203 243 | 24 | 245 |
26 | 27 | 28 | 34 | 35 | 36 | 37 | 38 | 45 | 46 | 47 | 48 | 56 | 57 | 58 | 67 | 68 | 78 |
2456 | 2457 | 208 | 34 | 35 | 356 | 357 | 308 | 45 | 456 | 457 | 408 | 56 | 57 | 5308 5408 | 657 | 65308 65408 | 75308 75408 |
Betweenness for node 8
1/1
0/1
0/1
0/1
0/1
0/2
0/2
0/2
0/1
0/1
0/2
0/1
0/1
0/1
1/1
0/2
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
1/1
1/1
1/1
2/2
2/2
2/2
8 / 36 = 0.22
36 is total paths (n * (n-1) / 2)
Node Metrics: betweenness
0
8
1
2
4
5
3
6
7
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 23 | 24 | 25 |
01 | 02 | 03 | 04 | 045 035 | 0456 0356 | 0457 0357 | 08 | 102 | 13 | 104 134 | 135 | 1356 | 1357 | 108 | 203 243 | 24 | 245 |
26 | 27 | 28 | 34 | 35 | 36 | 37 | 38 | 45 | 46 | 47 | 48 | 56 | 57 | 58 | 67 | 68 | 78 |
2456 | 2457 | 208 | 34 | 35 | 356 | 357 | 308 | 45 | 456 | 457 | 408 | 56 | 57 | 5308 5408 | 657 | 65308 65408 | 75308 75408 |
Betweenness for node 2
0/1
0/1
1/1
0/1
0/1
0/2
0/2
0/2
1/1
0/1
0/2
0/1
0/1
0/1
0/1
2/2
1/1
1/1
1/1
1/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
0/1
1/1
0/1
0/1
0/2
0/2
0/2
8 / 36 = 0.22
36 is total paths (n * (n-1) / 2)
Node Metrics: betweenness
0
8
1
2
4
5
3
6
7
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 23 | 24 | 25 |
01 | 02 | 03 | 04 | 045 035 | 0456 0356 | 0457 0357 | 08 | 102 | 13 | 104 134 | 135 | 1356 | 1357 | 108 | 203 243 | 24 | 245 |
26 | 27 | 28 | 34 | 35 | 36 | 37 | 38 | 45 | 46 | 47 | 48 | 56 | 57 | 58 | 67 | 68 | 78 |
2456 | 2457 | 208 | 34 | 35 | 356 | 357 | 308 | 45 | 456 | 457 | 408 | 56 | 57 | 5308 5408 | 657 | 65308 65408 | 75308 75408 |
Betweenness for node 4
0/1
0/1
0/1
0/1
1/1
1/2
1/2
1/2
0/1
0/1
2/2
0/1
0/1
0/1
0/1
1/2
1/1
1/1
1/1
1/1
1/1
0/1
0/1
0/1
1/1
1/1
1/1
0/1
0/1
0/1
0/1
0/1
1/1
1/2
1/2
1/2
14.5 / 36 = 0.40
36 is total paths (n * (n-1) / 2)
Suppose there are 3 paths and only 1 goes through
The node. Then its 1/2
Unconnected Networks
connected
unconnected
components
0
1
3
2
0
1
3
2
4
5
A
B
Betweenness and centrality is computed inside a component
In component B, node 4 and 5 has high closeness and betweenness score (1)
Does that mean they are very important?
We should always analyze the components individually!
Comparison of metrics
https://dist.neo4j.com/
High degree 🡪 many friends in social network
High closeness 🡪 more centrally located. Everyone gets news
From you with the fewest delay
High betweenness 🡪 important bottlenecks.
Act as bridge between communities.
Eigenvector
Complicated!!!!!!!
https://www.youtube.com/watch?v=PFDu9oVAE-g&t=800s&ab_channel=3Blue1Brown
Eigenvector and power method
1
2
3
5
4
6
0.31 |
0.79 |
0.69 |
1.00 |
0.78 |
0.97 |
Nodes with important neighbors are more important. That’s the basic idea
x
1 | 2 | 3 | 4 | 5 | 6 |
0 | 1 | 0 | 0 | 0 | 0 |
1 | 0 | 1 | 1 | 0 | 0 |
0 | 1 | 0 | 0 | 0 | 1 |
0 | 1 | 0 | 0 | 1 | 1 |
1 | 0 | 0 | 1 | 0 | 1 |
0 | 0 | 1 | 1 | 1 | 0 |
A
Adjacency matrix
αx = Ax
Here for this network α is 2.54
1 |
2 |
3 |
4 |
5 |
6 |
nodes
X denotes the importance of the nodes. We should normalize it to make sure the highest value is 1
Now we will use power method to approximately calculate this x, importance, eigenvector
Eigenvector and power method
1
2
3
5
4
6
Recursively calculate importance of a node from its neighbors’ importance
node | Step 1 | Step 2 | Step 3 | Step 4 | | |
1 | | | | | | |
2 | | | | | | |
3 | | | | | | |
4 | | | | | | |
5 | | | | | | |
6 | | | | | | |
Steps
At first, give every node an importance of 1
1
1
1
1
1
1
0.31 |
0.79 |
0.69 |
1.00 |
0.78 |
0.97 |
x
New power 🡪 sum of my neighbors’ power
1
3
2
3
2
3
Step 2 is basically degree!
Eigenvector and power method
1
2
3
5
4
6
Recursively calculate importance of a node from its neighbors’ importance
node | Step 1 | Step 2 | Step 3 | Step 4 | | |
1 | | | 3 | | | |
2 | | | 6 | | | |
3 | | | 6 | | | |
4 | | | 8 | | | |
5 | | | 6 | | | |
6 | | | 7 | | | |
Steps
1
1
1
1
1
1
0.31 |
0.79 |
0.69 |
1.00 |
0.78 |
0.97 |
x
New power 🡪 sum of my neighbors’ power
1
3
2
3
2
3
Keep updating every step until the values converge ie they stop changing a lot
Also, always make sure to normalize after every step!!!!!!!!
I skipped it for this example
Eigenvector and power method
1
2
3
5
4
6
Recursively calculate importance of a node from its neighbors’ importance
node | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | |
1 | | | 3 | 6 | | |
2 | | | 6 | 17 | | |
3 | | | 6 | 13 | | |
4 | | | 8 | 19 | | |
5 | | | 6 | 15 | | |
6 | | | 7 | 20 | | |
Steps
1
1
1
1
1
1
0.31 |
0.79 |
0.69 |
1.00 |
0.78 |
0.97 |
x
New power 🡪 sum of my neighbors’ power
1
3
2
3
2
3
Keep updating every step until the values converge ie they stop changing a lot
Also, always make sure to normalize after every step!!!!!!!!
I skipped it for this example
Eigenvector and power method
1
2
3
5
4
6
Recursively calculate importance of a node from its neighbors’ importance
node | Step 1 | Step 2 | Step 3 | Step 4 | Step 5 | normalized |
1 | | | 3 | 6 | 17/52 | 0.32 |
2 | | | 6 | 17 | 38/52 | 0.73 |
3 | | | 6 | 13 | 37/52 | 0.71 |
4 | | | 8 | 19 | 52/52 | 1.00 |
5 | | | 6 | 15 | 39/52 | 0.75 |
6 | | | 7 | 20 | 47/52 | 0.90 |
Steps
1
1
1
1
1
1
0.31 |
0.79 |
0.69 |
1.00 |
0.78 |
0.97 |
x
New power 🡪 sum of my neighbors’ power
1
3
2
3
2
3
Keep updating every step until the values converge ie they stop changing a lot
Almost the same as x!
Normalize: by making the biggest value 1 (52)
Clustering coefficient
How connected are a node’s neighbors
clique
Star
In star, no neighbors are connected
In clique, all the neighbors are connected
Two extremes
A star has neighbors that are not connected to each other at all.
A node that is part of clique, will be in a situation where all neighbors are connected to each other.
Clustering coefficient
0
1
3
2
7
6
5
4
CC = How many edges present among the neighbors / how many possible edges among the neighbors
Lets calculate for node 3
Neighbors are 1,2,4,6 so # neighbors = 4
Possible edges among these 4? 4*(4-1) /2 = 12 /2 = 6 [ Equation: (N*(N-1))/2]
How many edges exist between 1,2,4,6?
Just 1🡪2 so 1
CC = 1/6
Clustering coefficient
0
1
3
2
7
6
5
4
CC = How many edges present among the neighbors / how many possible edges among the neighbors
Now explain this equation?
Extra Slides
RawComm
RawComm