1 of 30

Chapter 5

Metrics

2 of 30

Metrics

Objective measurements

of a subject matter

Subject matter? Employee performance

Objective? Numbers do not lie.

They do not care whose name is what

Just pure numbers and stats

3 of 30

Metrics

Networks have metrics too

Which nodes are more important?

Which nodes act like a bridge?

Which nodes are close to each other

than everyone else

Three types of metrics

Node based

Node Pair based

Network Based

4 of 30

What is a node?

Bob

Janice

Jack

Jen

Alice

People here are the nodes

First class citizen

of the network

Here, GoT characters

are the nodes

5 of 30

Node metrics: Degree

0

5

3

1

2

4

Degree: count of neighbors

Node 0 has degree 3, because it has 3 neighbors, 1,2,4

Node 5 has degree 2, because it has two neighbors, 3,2

Undirected Graph

For Undirected graphs!!!!!

6 of 30

Node metrics: Degree

Indegree: # of incoming links

Outdegree: # of outgoing links

0

5

3

1

2

4

For Directed graph !!!!!!!!

Node 0 has indegree 2

And outdegree 1

Node 2 has indegree 3

And outdegree 2

7 of 30

Node metrics: Degree

bob

Kate

Nate

Jack

alice

Jane

In a social network:

degrees say how many friends do you have

How many friends do Alice have? 3 because

The node has degree 3

What’s the upper limit of a node’s degree?

Suppose the network has n nodes.

(N-1)

8 of 30

Node metrics: Closeness

A way of detecting which nodes

can spread information

With the fewest effort

B will be selected for best marketing reach performance

9 of 30

Node metrics: Closeness

Let’s try to calculate closeness for node 2 in this graph

0

5

3

1

2

4

Undirected Graph

Equation for closeness of a node i

N = 6 (# nodes in graph)

d(i,j) is the geodesic shortest distance from node i to j

d(2,0) = 1

d(2,1) = 1

d(2,5) = 1

d(2,4) = 2🡪0🡪4 so 2

d(2,3) = 2🡪1🡪3, or 2🡪5🡪3 so 2

Sum of all distances = 1+1+1+2+2 = 7

No point measuring d(2,2)

Closeness of node 2 is 6/7 = 0.857

The higher, the more central the node is

10 of 30

Node metrics: Closeness

Now, let’s try for node 4

0

5

3

1

2

4

Undirected Graph

Equation for closeness of a node i

N = 6 (# nodes in graph)

d(4,0) = 1

d(4,3) = 1

d(4,2) = 4🡪0🡪2 so 2

d(4,1) = 4🡪0🡪1 so 2

Sum of all distances = 1+1+2+2+2 = 8

d(4,5) = 4🡪3🡪5 so 2

Closeness of node 4 is 6/8 = 0.75

Closeness of node 2 was 6/7 = 0.857

You will select node 2 for most efficient marketing influence

11 of 30

Node Metrics: betweenness

0

8

1

2

4

5

3

6

7

How many shortest geodesic paths go through a node

These nodes are like bridges in the network

Without them, the network might get fragmented

If you take out 5, the network gets fragmented!

12 of 30

Node Metrics: betweenness

So, we must find shortest paths for all node pairs

How many node pairs?

Well, suppose our network there are 4 nodes, a,b,c,d

Then the node pairs are

ab ac ad

ba bc bd

ca cb cd

da db dc

Total = n(n-1) = 4*3 = 12

a

b

c

d

The network is bidirectional so ab and ba is the same. Take out the duplicates.

ab ac ad bc bd cd

6 unique node pairs 🡪 n(n-1) / 2 = 12/2 = 6

13 of 30

Node Metrics: betweenness

a

b

c

d

Node pairs are 🡪 ab ac ad bc bd cd

Pair

Shortest paths

ab

a->b

ac

a->c

ad

a->c->d

bc

b->a->c

bd

b->a->c->d

cd

c->d

Find the geodesic shortest paths for every node pair

Count #of shortest paths through a node

A

B

C

D

1

1

0

0

1

0

1

0

1

0

1

1

1

1

1

0

1

1

1

1

0

0

1

1

5

3

5

3

Sum

Divide by total paths

5/6

3/6

5/6

3/6

0.83

0.5

0.83

0.5

14 of 30

Node Metrics: betweenness

0

8

1

2

4

5

3

6

7

01

02

03

04

05

06

07

08

12

13

14

15

16

17

18

23

24

25

01

02

03

04

045

035

0456

0356

0457

0357

08

102

13

104

134

135

1356

1357

108

203

243

24

245

26

27

28

34

35

36

37

38

45

46

47

48

56

57

58

67

68

78

2456

2457

208

34

35

356

357

308

45

456

457

408

56

57

5308

5408

657

65308

65408

75308

75408

Betweenness for node 8

1/1

0/1

0/1

0/1

0/1

0/2

0/2

0/2

0/1

0/1

0/2

0/1

0/1

0/1

1/1

0/2

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

1/1

1/1

1/1

2/2

2/2

2/2

8 / 36 = 0.22

36 is total paths (n * (n-1) / 2)

15 of 30

Node Metrics: betweenness

0

8

1

2

4

5

3

6

7

01

02

03

04

05

06

07

08

12

13

14

15

16

17

18

23

24

25

01

02

03

04

045

035

0456

0356

0457

0357

08

102

13

104

134

135

1356

1357

108

203

243

24

245

26

27

28

34

35

36

37

38

45

46

47

48

56

57

58

67

68

78

2456

2457

208

34

35

356

357

308

45

456

457

408

56

57

5308

5408

657

65308

65408

75308

75408

Betweenness for node 2

0/1

0/1

1/1

0/1

0/1

0/2

0/2

0/2

1/1

0/1

0/2

0/1

0/1

0/1

0/1

2/2

1/1

1/1

1/1

1/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

0/1

1/1

0/1

0/1

0/2

0/2

0/2

8 / 36 = 0.22

36 is total paths (n * (n-1) / 2)

16 of 30

Node Metrics: betweenness

0

8

1

2

4

5

3

6

7

01

02

03

04

05

06

07

08

12

13

14

15

16

17

18

23

24

25

01

02

03

04

045

035

0456

0356

0457

0357

08

102

13

104

134

135

1356

1357

108

203

243

24

245

26

27

28

34

35

36

37

38

45

46

47

48

56

57

58

67

68

78

2456

2457

208

34

35

356

357

308

45

456

457

408

56

57

5308

5408

657

65308

65408

75308

75408

Betweenness for node 4

0/1

0/1

0/1

0/1

1/1

1/2

1/2

1/2

0/1

0/1

2/2

0/1

0/1

0/1

0/1

1/2

1/1

1/1

1/1

1/1

1/1

0/1

0/1

0/1

1/1

1/1

1/1

0/1

0/1

0/1

0/1

0/1

1/1

1/2

1/2

1/2

14.5 / 36 = 0.40

36 is total paths (n * (n-1) / 2)

Suppose there are 3 paths and only 1 goes through

The node. Then its 1/2

17 of 30

Unconnected Networks

connected

unconnected

components

0

1

3

2

0

1

3

2

4

5

A

B

Betweenness and centrality is computed inside a component

In component B, node 4 and 5 has high closeness and betweenness score (1)

Does that mean they are very important?

We should always analyze the components individually!

18 of 30

Comparison of metrics

https://dist.neo4j.com/

High degree 🡪 many friends in social network

High closeness 🡪 more centrally located. Everyone gets news

From you with the fewest delay

High betweenness 🡪 important bottlenecks.

Act as bridge between communities.

19 of 30

Eigenvector

  • Eigenvectors are defined and used in Linear Algebra.
  • Any matrix A has a number of Eigenvectors.
  • An Eigenvector x satisfies an equation of the form:
    • αx = Ax where α is a scalar called an eigenvalue
  • As a metric for networks, eigenvectors are used to rank nodes according to the importance and quality (weight) of their neighbors.

Complicated!!!!!!!

https://www.youtube.com/watch?v=PFDu9oVAE-g&t=800s&ab_channel=3Blue1Brown

20 of 30

Eigenvector and power method

1

2

3

5

4

6

0.31

0.79

0.69

1.00

0.78

0.97

Nodes with important neighbors are more important. That’s the basic idea

x

1

2

3

4

5

6

0

1

0

0

0

0

1

0

1

1

0

0

0

1

0

0

0

1

0

1

0

0

1

1

1

0

0

1

0

1

0

0

1

1

1

0

A

Adjacency matrix

αx = Ax

Here for this network α is 2.54

1

2

3

4

5

6

nodes

X denotes the importance of the nodes. We should normalize it to make sure the highest value is 1

Now we will use power method to approximately calculate this x, importance, eigenvector

21 of 30

Eigenvector and power method

1

2

3

5

4

6

Recursively calculate importance of a node from its neighbors’ importance

node

Step 1

Step 2

Step 3

Step 4

1

2

3

4

5

6

Steps

At first, give every node an importance of 1

1

1

1

1

1

1

0.31

0.79

0.69

1.00

0.78

0.97

x

New power 🡪 sum of my neighbors’ power

1

3

2

3

2

3

Step 2 is basically degree!

22 of 30

Eigenvector and power method

1

2

3

5

4

6

Recursively calculate importance of a node from its neighbors’ importance

node

Step 1

Step 2

Step 3

Step 4

1

3

2

6

3

6

4

8

5

6

6

7

Steps

1

1

1

1

1

1

0.31

0.79

0.69

1.00

0.78

0.97

x

New power 🡪 sum of my neighbors’ power

1

3

2

3

2

3

Keep updating every step until the values converge ie they stop changing a lot

Also, always make sure to normalize after every step!!!!!!!!

I skipped it for this example

23 of 30

Eigenvector and power method

1

2

3

5

4

6

Recursively calculate importance of a node from its neighbors’ importance

node

Step 1

Step 2

Step 3

Step 4

Step 5

1

3

6

2

6

17

3

6

13

4

8

19

5

6

15

6

7

20

Steps

1

1

1

1

1

1

0.31

0.79

0.69

1.00

0.78

0.97

x

New power 🡪 sum of my neighbors’ power

1

3

2

3

2

3

Keep updating every step until the values converge ie they stop changing a lot

Also, always make sure to normalize after every step!!!!!!!!

I skipped it for this example

24 of 30

Eigenvector and power method

1

2

3

5

4

6

Recursively calculate importance of a node from its neighbors’ importance

node

Step 1

Step 2

Step 3

Step 4

Step 5

normalized

1

3

6

17/52

0.32

2

6

17

38/52

0.73

3

6

13

37/52

0.71

4

8

19

52/52

1.00

5

6

15

39/52

0.75

6

7

20

47/52

0.90

Steps

1

1

1

1

1

1

0.31

0.79

0.69

1.00

0.78

0.97

x

New power 🡪 sum of my neighbors’ power

1

3

2

3

2

3

Keep updating every step until the values converge ie they stop changing a lot

Almost the same as x!

Normalize: by making the biggest value 1 (52)

25 of 30

Clustering coefficient

How connected are a node’s neighbors

clique

Star

In star, no neighbors are connected

In clique, all the neighbors are connected

Two extremes

A star has neighbors that are not connected to each other at all.

A node that is part of clique, will be in a situation where all neighbors are connected to each other.

26 of 30

Clustering coefficient

0

1

3

2

7

6

5

4

CC = How many edges present among the neighbors / how many possible edges among the neighbors

Lets calculate for node 3

Neighbors are 1,2,4,6 so # neighbors = 4

Possible edges among these 4? 4*(4-1) /2 = 12 /2 = 6 [ Equation: (N*(N-1))/2]

How many edges exist between 1,2,4,6?

Just 1🡪2 so 1

CC = 1/6

27 of 30

Clustering coefficient

0

1

3

2

7

6

5

4

CC = How many edges present among the neighbors / how many possible edges among the neighbors

Now explain this equation?

28 of 30

Extra Slides

29 of 30

RawComm

  • This metric was defined by Professor Scripps
  • Later in the course, we will describe how to identify communities of nodes in networks.
  • A community is a group of nodes where the nodes inside the groups are more tightly connected.
  • There are fewer links between the groups.
  • There is no agreement on what constitutes a good community.
  • RawComm calculates an approximation to the number of communities that a node belong to.

30 of 30

RawComm