1 of 58

Chapter 2 Networks & Graphs: Basic Measures

Summary

  • Graph representations
  • Type of Networks
  • Degree distribution
  • Paths & Connectedness
  • Clustering

Reading

  • Chapter 2 of Barabasi's book

2 of 58

The Bridges of Konigsberg

Can one walk across the seven bridges and never cross the same bridge twice?

Euler’s theorem (1735)

  • If a graph has more than two nodes of odd degree, there is no path/cycle that crosses each bridge exactly once.

  • If a graph is connected and has no odd degree nodes, it has at least one path.

Famous Konigsberg CitizensImmanuel Kant (philosopher, 1724-1804)

3 of 58

Components of�a Complex System

Networks or Graphs?

Network <nodes, links>�refers to real systems�(www, social network, metabolic network)

Graph <vertices, edges>�mathematical representation of a network�(web graph, social graph)

Symbol

Components

nodes, vertices

N

Interactions

edges, links

L

System

network, graph

(N,L)

4 of 58

A Common �Language

The choice of the proper network representation determines our ability to use network theory successfully.

In some cases there is a unique, unambiguous representation. In other cases, the representation is by no means unique.

 

The way we assign the links between a group of individuals will determine the nature of the question we can study.

N=4 L=4

5 of 58

Proper representations (examples)

If you connect individuals that work with each other, you will explore the professional network.

If you connect those that have a romantic and sexual relationship, you will be exploring the sexual networks.

6 of 58

If you connect individuals based on their first name (e.g., all Peters connected to each other),you will be exploring what?

It is a network, nevertheless.

7 of 58

Directedness

Undirected graphsLinks: undirected (symmetrical)

Directed graphs (DiGraphs)Links: directed (arcs).

A

G

F

B

C

D

E

Examples of Undirected links

Co-authorship links

Actor network

Protein interactions

Example of Directed links

URLs on the www

Phone calls

Metabolic reactions

8 of 58

Reference Networks

9 of 58

Degree,

Average Degree,

Degree Distribution

10 of 58

Node Degree

Undirected graphsthe number of links connected to the node

Directed graphs (DiGraphs)we can define an in-degree and out-degree. �The (total) degree is the sum of in- and out-degree.

Source: a node with kin= 0; Sink: a node with kout= 0.

A

B

A

G

F

B

C

D

E

11 of 58

A Bit of Statistics

Four key quantities characterize a sample of N values x1,...,xn

Average (mean):

The nth moment:

The Standard deviation:

The Distribution of x:

where px follows

12 of 58

Average Degree

Undirected graphsN - the number of nodes in the graph

Directed graphs (DiGraphs)

j

i

A

F

B

C

D

E

13 of 58

Reference Networks: Average Degree

14 of 58

Degree Distribution

P(k): probability that a randomly chosen node has degree k

Nk = # nodes with degree k

P(k) = Nk / N ➔ plot

15 of 58

Degree Distribution (cont’d)

Normalization condition:

where Kmin is the minimal degree in the network.

Discrete Representation: pk is the probability that a node has degree k.

Continuum Description: p(k) is the pdf of the degrees, where

represents the probability that a node’s degree is between k1 and k2.

16 of 58

Adjacency matrix

Undirected graphs

Directed graphs (DiGraphs)

2

3

1

4

4

2

3

1

17 of 58

Paths and

Connectedness

18 of 58

Paths

A path is a sequence of nodes in which each node is adjacent to the next one

Pi0,in of length n between nodes i0 and in is an ordered collection of n+1 nodes and n links

In a directed graph, the path can follow only �the direction of an arrow.

Examples of phats in an undirected graph.

Pn = {i0, i1, i2, …, in}

Pn = {(i0, i1), (i1 ,i2), (i2 ,i3), …, (in-1 ,in)}

19 of 58

Distance in a Graph

Directed graphs (DiGraphs)Each path needs to follow the direction of the arrows.

�Thus in a digraph the distance from node A to B �(on an AB path) is generally different from the distance from node B to A (on a BCA path).

Undirected graphsThe distance (shortest path, geodesic path) between two nodes is defined as the number of edges along the shortest path connecting them.

*If the two nodes are disconnected, the distance is infinity.

D

C

A

B

D

C

A

B

20 of 58

Number of paths between two nodes

Nij,number of paths between any two nodes i and j:

 

Length n=1: If there is a link between i and j, then Aij=1 and Aij=0 otherwise.

Length n=2: If there is a path of length two between i and j, then AikAkj=1, and AikAkj=0 otherwise.

The number of paths of length 2:

Length n:

In general, if there is a path of length n between i and j, then Aik…Alj=1 and Aik…Alj=0 otherwise.

The number of paths of length n between i and j is(*)

(*) Holds for both directed and undirected networks.

21 of 58

Finding Distances: BFS

Distance between node 0 �and node 4:

  • Start at 0.

1

1

1

1

2

2

2

2

2

3

3

3

3

3

3

3

3

4

4

4

4

4

4

4

4

0

22 of 58

Finding Distances: BFS

Distance between node 0 �and node 4:

  • Start at 0.
  • Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.

2

2

2

2

2

3

3

3

3

3

3

3

3

4

4

4

4

4

4

4

4

1

0

1

1

1

23 of 58

Finding Distances: BFS

Distance between node 0 �and node 4:

  • Start at 0.
  • Find the nodes adjacent to 1. Mark them as at distance 1. Put them in a queue.
  • Take the first node out of the queue. Find the unmarked nodes adjacent to it in the graph. Mark them with the label of 2. Put them in the queue.

3

3

3

3

3

4

4

4

4

3

3

3

4

4

4

4

1

1

1

1

2

2

2

2

2

0

24 of 58

Finding Distances: BFS

Distance between node 0 �and node 4:

Repeat until you find node 4 or there are no more nodes in the queue.

The distance between 0 and 4 is the label of 4 or, if 4 does not have a label, infinity.

1

1

1

1

2

2

2

2

2

3

3

3

3

3

3

3

3

4

4

4

4

4

4

4

4

0

25 of 58

Diameter and

Average distance

Diameter (dmax):

the maximum distance between any pair of nodes in the graph.

Average path length/distance, <d>, for a connected graph:

where dij is the distance from node i to node j

In an undirected graph dij =dji , so we only need to count them once:

26 of 58

Paths: a summary

Shortest Path

The path with the shortest length between two nodes (distance).

2

5

4

3

1

27 of 58

Paths: a summary

Diameter

The longest shortest path in a graph.

2

5

4

3

1

28 of 58

Paths: a summary

Average Path Length

The average of the shortest paths for all pairs of nodes.

2

5

4

3

1

29 of 58

Paths: a summary

Cycle

A path with the same start and end node.

Self-Avoiding Path

A path that does not intersect itself.

2

5

4

3

1

2

5

4

3

1

30 of 58

Paths: a summary

Eulerian Path/Cycle

A path that traverses each link exactly once.

Hamiltonian Path/Cycle

A path that visits each node exactly once.

2

5

4

3

1

2

5

4

3

1

31 of 58

Connectivity of undirected graphs

Connected (undirected) graph: any two vertices can be joined by a path.

A disconnected graph is made up by two or more connected components.

Bridge: �if we erase it, the graph becomes disconnected.�Example (A,F)

D

C

A

B

F

F

G

D

C

A

B

F

F

G

Largest Component: Giant Component�The rest: Isolates

32 of 58

Connectivity of undirected graphs

The adjacency matrix of a network with several components can be written in a block-diagonal form, so that nonzero elements are confined to squares, with all other elements being zero.

33 of 58

Connectivity of directed graphs

Strongly connected directed graph (SCC):has a path from each node to every other node and vice versa (e.g. AB path and BA path).

Weakly connected directed graph (WCC): it is connected if we disregard the edge directions.

In-component: nodes that can reach the scc,

Out-component: nodes that can be reached from the scc.

Strongly connected components can be identified, but not every node is part of a nontrivial strongly connected component.

D

C

A

B

F

G

E

E

C

A

B

G

F

D

34 of 58

Network Density

35 of 58

Complete Graph

A graph with degree

L=Lmax

is called a complete graph, and its average degree is

<k>=N-1

The maximum number of links an undirected network of N nodes can have is:

3

1

4

2

What about directed networks?

36 of 58

Network Density

Ratio of existing edges over possible ones.

Complete graph �d(G) = 1

Sparse graph �d(G) = 2/3

3

1

4

2

3

1

4

2

Examples

37 of 58

Most networks observed in real systems are sparse

L << Lmax

<k> << N-1

d(G) << 1

WWW (ND Sample): N=325,729; L=1.4 106 Lmax=1012 <k>=4.51

Protein (S. Cerevisiae): N= 1,870; L=4,470 Lmax=107 <k>=2.39

Coauthorship (Math): N= 70,975; L=2 105 Lmax=3 1010 <k>=3.9

Movie Actors: N=212,250; L=6 106 Lmax=1.8 1013 <k>=28.78

(Source: Albert, Barabasi, RMP2002)

Sparse �Adjacency matrix

38 of 58

Clustering Coefficient

39 of 58

Clustering Coefficient

How “clustered” is my network?

Global Clustering coefficient

    • Triangles and triplets
    • C in [0,1]

Watts & Strogatz, �Nature (1998)

40 of 58

Clustering Coefficient

What fraction of your neighbors are connected?

Local Clustering coefficient

    • Node i with degree ki
    • Ci in [0,1]

Watts & Strogatz, �Nature (1998)

41 of 58

Clustering Coefficient

What fraction of your neighbors are connected?

Local Clustering coefficient

    • Node i with degree ki
    • Ci in [0,1]

Watts & Strogatz, �Nature (1998)

42 of 58

Clustering Coefficient

What fraction of your neighbors are connected on average?

Average Clustering coefficient

    • Average of local clustering coefficients
    • C in [0,1]

Watts & Strogatz, �Nature (1998)

43 of 58

Bipartite Networks

44 of 58

Bipartite Graphs

Bipartite graph (or bigraph)

a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V.

Examples Hollywood actor network

Collaboration networks

Disease network (diseasome)

Projection

Two nodes of the same class are connected by a (weighted) edge if they share at least a common neighbor

45 of 58

Gene - DiseNetwork ase Network

Gene network�(projection)

GENOME

PHENOME

DISEASOME

Disease network�(projection)

Goh, Cusick, Valle, Childs, Vidal & Barabási, �PNAS (2007)

46 of 58

Ingredient-Flavor Bipartite Network

Y.-Y. Ahn, S. E. Ahnert, J. P. Bagrow, A.-L. Barabási �Flavor network and the principles of food pairing, �Scientific Reports 196, (2011).

47 of 58

Summarizing...

48 of 58

Central quantities in Network Science

Degree Distribution

P(k)

Path length

<d>

Clustering Coefficient

49 of 58

Type of graphs�Directedness

3

1

4

2

Actor network, protein-protein interactions

3

2

1

4

WWW, citation networks

Undirected graph

Directed graph

50 of 58

Type of graphsWeightedness

3

1

4

2

protein-protein interactions, WWW

Call Graph, metabolic networks

Unweighted graph

Weighted graph

3

2

1

4

51 of 58

Type of graphsLoops & Multigraphs

protein-protein interactions, WWW

Social Network, Collaboration Network

Self Interactions

Multigraph (undirected)

3

1

4

2

3

2

1

4

52 of 58

Real Networks can have multiple characteristics

Network

Directed

Weighted

Multigraph

Self-loops

WWW

yes

no

yes

yes

Protein interactions

no

no

no

yes

Collaboration network

no

yes

yes

no

Mobile phone calls

yes

yes

no

no

Facebook Friendship

no

no

no

no

53 of 58

Case Study:

Protein-Protein Interaction Network

54 of 58

Case study

Protein-protein interaction

Undirected network

N=2,018 proteins as nodes

L=2,930 binding interactions as links.

Average degree <k>=2.90.

Not connected: 185 components

Largest (giant component) 1,647 nodes

55 of 58

Case study

Protein-protein interaction

pk is the probability that a node has degree k

Nk = # nodes with degree k

pk = Nk / N

56 of 58

Case study

Protein-protein interaction

Path length distribution

dmax=14

<d>=5.61

57 of 58

Case study

Protein-protein interaction

Clustering coefficient vs. node degree

Average Clustering Coeff.

<C>=0.12

58 of 58

Chapter 2 Conclusion

Take Away Messages

  • Semantic shapes graph topology
  • Network properties can be measured
  • Degree distribution
  • Paths & Connectivity
  • Clustering Coefficient

What’s Next

Chapter 3:�Random Networks��

Suggested Readings

  • Chapter 2 of Barabasi's book
  • Chapter 2 of Kleinberg’s book

Notebook

Chapter 2: Basic Measures

https://github.com/sna-unipi/SNA_lectures_notebooks