1 of 67

Topological properties of graphs

Nov 4^th 2021

BMI 826-23 Computational Network Biology�Fall 2021

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

2 of 67

Overview of lecture topics

Biological problem

Mapping regulatory network structure
Dynamics and context specificity of networks
Modularity in biological networks
Comparison of biological networks
Identification of important genes
Integrating different types of molecular genomic data
Predicting protein interfaces

Computational approaches

Probabilistic graphical models
Graph structure learning
Multiple network learning
Graph clustering
Graph alignment
Diffusion on graphs
Deep learning

Course material is organized by the biological problem and computational approaches to address the problem

3 of 67

Topics in this section

Topological properties of networks (Today)

Modules and graph clustering (Today)

Algorithms for graph clustering (Nov 9^th, 11^th)

Dynamic module identification (Nov 11^th)

4 of 67

Goals for today

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

Algorithms to find modules on graphs

Clustering
Girvan Newman algorithm

5 of 67

Why should we care about network measures?

Are there specific properties that characterize real world networks?
How can we generate networks with such properties?

“Probably the most important discovery of network theory was the realization that despite the remarkable diversity of networks in nature, their architecture is governed by a few simple principles that are common to most networks of major scientific and technological interest” Barabasi and Oltvai, Nature Genetics Review 2004

6 of 67

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

7 of 67

Recall node degree

Undirected network

Degree, k: Number of neighbors of a node

Directed network

In degree, k_in: Number of incoming edges
Out degree, k_out: Number of outgoing edges

In degree of B is 1
Out degree of A is 2

8 of 67

Average degree

Consider an undirected network with N nodes
Let k_i denote the degree of node i
Average degree is

9 of 67

Degree distribution

P(k) the probability that a node has k edges
Different networks can have different degree distributions
A fundamental property that can be used to characterize a network

10 of 67

Different degree distributions

Poisson distribution

The mean is a good representation of k_i of all nodes
Networks that have a Poisson degree distribution are called Erdös Renyi or random networks

Power law distribution

Also called scale free
There is no “typical” node that captures the degree of nodes.

11 of 67

Poisson distribution

A discrete distribution

The Poisson is parameterized by which can be easily estimated by maximum likelihood

P(X=k)

12 of 67

Power law distribution

Used to capture the degree distribution of most real networks

Typical value of is between 2 and 3.

MLE exists but is more complicated

See Power-Law Distributions in Empirical Data. Clauset, Shalizi and Newman, 2009 for details

P(k)

Barabasi and Oltvai, Nature Genetics Review 2004

13 of 67

Erdös Renyi random graphs

Dates back to 1960 due to two mathematicians Paul Erdös and Alfred Renyi.
Provides a probabilistic model to generate a graph
Starts with N nodes and connects two nodes with probability p
Node degrees follow a Poisson distribution
Tail falls off exponentially, suggesting that nodes with degrees different from the mean are very rare

14 of 67

Scale free networks

Degree distribution is captured by a power law distribution
There is no “typical” node that describes the degree of all other nodes
Such networks are ubiquitous in nature

15 of 67

Poisson versus Scale free

Barabasi & Oltvai 2004, Nature Genetics Review

16 of 67

Yeast protein interaction network is believed to be scale free

“Whereas most proteins participate in only a few interactions, a few participate in dozens”

Such high degree nodes are called hubs

Barabasi & Oltvai 2004, Nature Genetics Review

17 of 67

Degree of a node is correlated to functional importance of a node

Hubs maybe associated with important biological function
Red nodes on deletion cause the organism to die
Red nodes also among the nodes with the highest degree

18 of 67

Yeast PPI network degree distribution

5,905 nodes

86,286 edges

19 of 67

Origin of scale free networks

Scale free networks are ubiquitous is nature
How do such networks form?
Such networks are the result of two processes

a growth process where new nodes join the network over an extended period of time

Think about how the internet has grown

preferential attachment: new nodes tend to connect to nodes with many neighbors

Rich get richer.

Barabasi & Oltvai 2004, Nature Genetics Review

20 of 67

Growth and preferential attachment in scale free networks

A new node (red) is more likely to connect to node 1

than 2

Barabasi & Oltvai 2004, Nature Genetics Review

21 of 67

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

22 of 67

Paths on a graph

Path: a set of connected edges from one node to another

Paths from B to D

There are two paths from B-D: B->A->D and B->G->H->A->D

23 of 67

Path lengths

The shortest path length between two nodes A and B:

The smallest number of edges that need to be traversed to get from A to B

Mean path length is the average of all shortest path lengths
Diameter of a graph is the longest of all shortest paths in the network

24 of 67

Scale-free networks tend to be ultra-small

Two nodes on the network are connected by a small number of edges
Average path length is proportional to log(log(N)), where N is the number of nodes in the network
In a random network (Erdös Renyi network) the average path length is proportional to log(N)

25 of 67

Yeast protein interaction network

Avg. path length: 2.6

Diameter: 6

Nodes: 5,905

Edges: 86,286

26 of 67

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

27 of 67

Network motifs

Network motifs:

small recurring subgraphs that occur much more than a randomized network

Often called “building blocks” of complex networks
Biological networks tend to exhibit certain types of network motifs
Some motifs are associated with specific network dynamics

Milo Science 2002

28 of 67

Network motifs of size 3 in a directed network

Additional possibilities if we consider the sign of an edge

Milo Science 2002

29 of 67

Different types of feed-forward motifs

Coherent: sign of the direct path is the same as the indirect path

Incoherent: sign of the direct path and indirect path are not the same

Alon 2007, Nature Review Genetics

These appear much more in transcriptional networks than other FF motifs

30 of 67

Network motifs are associated with dynamics

A feed-forward motif with an AND gate can encode a sign sensitive delay

Alon 2007, Nature Review Genetics

31 of 67

Network motifs found in many complex networks

The occurrence of the feed-forward loop in both networks suggests a fundamental similarity in the design on these networks

Milo et al., Science 2002

32 of 67

Structural common motifs seen in the yeast regulatory network

Lee et.al. 2002, Mangan & Alon, 2003

Auto-regulation

Multi-component

Feed-forward loop

Single Input

Multi Input

Regulatory Chain

Feed-forward loops involved in speeding up in response of target gene

33 of 67

How to find network motifs?

Given an input network G we need to address two problems

Subgraph enumeration: Find which subgraphs occur in G and how many times
Significance of the number of occurrences: Compare to the number of occurrences of subgraphs in randomized networks

Software to find network motifs

FANMOD: a tool for fast network motif detection http://bioinformatics.oxfordjournals.org/content/22/9/1152.full

Wernicke 2005: http://theinf1.informatik.uni-jena.de/publications/network-motifs-wabi05.pdf

34 of 67

Algorithm for enumerating subgraphs: Edge sampling

N(V’): set of neighbors of all vertices in V’

Kashtan et al., Bioinformatics 2004

35 of 67

Generating a randomized network

While an Erdös Renyi network is random, it does not have the same degree distribution as a given network
How to generate a randomized network with the same degree distribution?

36 of 67

Strategy to generate randomized networks (Switching method)

Select two edges connecting four vertices and swap the end points.

Repeat.

37 of 67

Assessing the significance of a network motif

The motif (red dashed edges) occurs much more frequently in the real network than

in any randomized network

38 of 67

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

39 of 67

Modularity in networks

Modularity “refers to a group of physically or functionally linked nodes that work together to achieve a distinct function” -- Barabasi & Oltvai
Modularity is an important principle of biological systems
Genes tend to interact with a select set of other genes exhibiting a clustering of interactions
Module detection can help to

Understand the organizational properties of the network
Can be used to predict function of genes based on their grouping behavior

40 of 67

A modular network

Module 1

Module 2

Module 3

M. E. J. Newman, Modularity and community structure in networks, PNAS 2006

41 of 67

Different types of network modules

Topological modules

Defined solely based on the graph connectivity of nodes

Functional modules

Based on graph connectivity & other node attributes
Can be further grouped into

Active modules
Integrative modules
Disease modules

Albert-László Barabási, Natali Gulbahce, and Joseph Loscalzo, Nature Review Genetics 2011, Mitra et al., Nature Review Genetics 2014

42 of 67

Studying modularity in biological systems

Given a network is it modular?

Clustering coefficient
Q Modularity: measures the relative density of edges between and within the groups

Given a network what are the modules in the network?

43 of 67

Clustering coefficient

Measure of transitivity in the network that asks

If A is connected to B, and B is connected to C, how often is A connected to C?

Clustering coefficient C_i for each node i is

k_i Degree of node i
n_iis the number of edges among neighbors of i
Average clustering coefficient gives a measure of “modularity” of the network

44 of 67

Clustering coefficient example

45 of 67

Q measure for modularity

Suppose the nodes in the graph belong to K groups (communities)
Modularity (Q) can be assessed as follows:

difference between within group (community) connections and expected connections within a group

This measure assess how good a particular grouping is

K: number of groups

e_ij: Fraction of total edges that link nodes in group i to group j

Detecting community structure in networks. M.E. J Newman

Fraction of edges within group i

Fraction of edges without regard to community structure

46 of 67

Summary of topological properties of networks

Given a network, its topology can be characterized using different measures

Degree distribution
Average path length
Clustering coefficient (also used to measure modularity of a network)

Degree distribution can be

Poisson
Power law or scale free

Network motifs

Building blocks of complex networks
Over-represented subgraphs of specific types

Network modularity

Clustering coefficient
Q modularity

47 of 67

Goals for today

Commonly measured network properties

Degree distribution
Average shortest path length
Network motifs
Modularity of biological networks

Algorithms to find modules on graphs

Clustering
Girvan Newman algorithm

48 of 67

Given a graph, what are the modules

Given a graph find the modules

Modules are represented by densely connected subgraphs

The graph can be partitioned into modules using “Graph clustering” or “Graph partitioning” or “Community Detection”
Clusters, modules are also called “communities”

49 of 67

Problem definition of (topological) module detection

Given:

A graph
A measure of cluster quality

Partition the vertices into groups such that they form densely connected subgraphs (e.g. as measured by the cluster quality)

50 of 67

Common graph clustering algorithms

Hierarchical or flat clustering using a notion of similarity between nodes
Girvan-Newman algorithm
Hierarchical Agglomerative clustering
Spectral clustering
Stochastic block models
Markov clustering algorithm
Affinity propagation

51 of 67

Clustering

Types of clustering

Flat clustering

K-means
Gaussian mixture models

Hierarchical clustering

Clustering algorithms differ in the distance measure used to group objects together

52 of 67

Task definition: clustering objects

Given: attributes for a set of objects we wish to cluster
Do: organize objects into groups such that

Objects in the same cluster are highly similar to each other
Objects from different clusters have low similarity to each other

53 of 67

Flat clustering

Cluster objects into K clusters
K : number of clusters is a user defined argument
Two example algorithms

K-means
Gaussian mixture model-based clustering

54 of 67

Hierarchical clustering

Hierarchical clustering is a widely used clustering technique
Instead of the number of clusters, it requires us to specify how much dissimilarity we will tolerate between groups
The hierarchical clustering is represented by a tree structure called a dendrogram

Slides adapted from Prof. Mark Craven; BMI 576

55 of 67

Types of hierarchical clustering strategies

Agglomerative (bottom-up)

Start from the individual objects
Group objects or clusters of objects

Divisive (or top-down)

Start from all objects in a single cluster
Break down each cluster into smaller clusters

56 of 67

Hierarchical clustering

leaves represent objects to be clustered (e.g. genes or samples)

height of bar indicates

degree of distance

within cluster

distance scale

Slides from Prof. Mark Craven

57 of 67

Flat clustering from a hierarchical clustering

We can always generate a flat clustering from a hierarchical clustering by “cutting” the tree at some distance threshold

cutting here results

in 2 clusters

cutting here results

in 4 clusters

Slides from Prof. Mark Craven

58 of 67

Hierarchical clustering to find modules on graphs

What is a good measure of similarity to cluster nodes on a graph?
One approach is to use local topological overlap t_ij

N₁(i): the set of immediate neighbors of i

a_ij:corresponding entry in the adjacency matrix of the graph

Yip and Horvath, BMC Bioinformatics 2007; WGCNA: an R package for weighted correlation network analysis. Peter Langfelder and Steve Horvath

59 of 67

Community structure in a graph

A graph that has a grouping (community) structure is going to have few intercommunity edges.
Community structure can be revealed by removing such intercommunity edges

Detecting community structure in networks, M. E. J. Newman

Communities

Intercommunity edges

60 of 67

Girvan-Newman algorithm

General idea: “If two communities are joined by only a few inter-community edges, then all paths through the network from vertices in one community to vertices in the other must pass along one of those few edges.”
Community structure can be revealed by removing edges that with high betweenness
Algorithm is based on a divisive clustering idea

M. E. J. Newman and M. Girvan. Finding and evaluating community structure

61 of 67

Betweenness of an edge

Betweenness of an edge e is defined as the number of shortest paths that include e
Edges that lie between communities tend to have high betweenness

62 of 67

Girvan-Newman algorithm

Initialize

Compute betweenness for all edges

Repeat until convergence criteria

Remove the edge with the highest betweenness
Recompute betweenness of remaining edges

Convergence criteria can be

No more edges
Desired modularity

M. E. J. Newman and M. Girvan. Finding and evaluating community structure

63 of 67

Girvan-Newman algorithm as a hierarchical clustering algorithm

One can view this algorithm as a top-down (divisive) hierarchical clustering algorithm
The root of the dendrogram groups all nodes into one community
Each branch of the tree represents the order of splitting the network as edges are removed

Graph vertices

64 of 67

Zachary’s karate club study

A well-known example of a social network with community structure
Dataset collected by Wayne Zachary over 2 years who observed social interactions among members of a karate club
Due to a dispute the club split into two factions
Can a graph clustering/module detection algorithm predict the factions?

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

65 of 67

Girvan Newman on Zachary’s karate club dataset

Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.

Node grouping based on betweenness

Modularity

66 of 67

Take away points

Biological networks are modular
Modules can be topological or functional
Modularity can be measured using

Clustering coefficient
Q measure

We have seen one example of topological clustering algorithms

Girvan-Newman algorithm

based on edge-betweenness
Can be viewed as top-down/divisive clustering algorithm

67 of 67

References

A.-L. Barabasi and Z. N. Oltvai, "Network biology: understanding the cell's functional organization," Nature Reviews Genetics, vol. 5, no. 2, pp. 101-113, Feb. 2004. [Online]. Available: http://dx.doi.org/10.1038/nrg1272
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: Simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824-827, Oct. 2002. [Online]. Available: http://dx.doi.org/10.1126/science.298.5594.824
U. Alon, "Network motifs: theory and experimental approaches," Nature Reviews Genetics, vol. 8, no. 6, pp. 450-461, Jun. 2007. [Online]. Available: http://dx.doi.org/10.1038/nrg2102
S. Wernicke and F. Rasche, "FANMOD: a tool for fast network motif detection," Bioinformatics, vol. 22, no. 9, pp. 1152-1153, May 2006. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/btl038
M. E. J. Newman and M. Girvan. “Finding and evaluating community structure in networks”, 2003, 10.1103/PhysRevE.69.026113
M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577-8582, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1073/pnas.0601602103
A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-law distributions in empirical data," SIAM Review, vol. 51, no. 4, pp. 661-703, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1137/070710111