Topological properties of graphs
Nov 4th 2021
BMI 826-23 Computational Network Biology�Fall 2021
Sushmita Roy
Overview of lecture topics
Biological problem
Computational approaches
Course material is organized by the biological problem and computational approaches to address the problem
Topics in this section
Goals for today
Why should we care about network measures?
“Probably the most important discovery of network theory was the realization that despite the remarkable diversity of networks in nature, their architecture is governed by a few simple principles that are common to most networks of major scientific and technological interest” Barabasi and Oltvai, Nature Genetics Review 2004
Commonly measured network properties
Recall node degree
C
B
A
D
G
H
E
F
Average degree
Degree distribution
Different degree distributions
Poisson distribution
k
P(X=k)
Power law distribution
P(k)
k
Barabasi and Oltvai, Nature Genetics Review 2004
Erdös Renyi random graphs
Scale free networks
Poisson versus Scale free
Barabasi & Oltvai 2004, Nature Genetics Review
Yeast protein interaction network is believed to be scale free
Barabasi & Oltvai 2004, Nature Genetics Review
Degree of a node is correlated to functional importance of a node
Yeast PPI network degree distribution
5,905 nodes
86,286 edges
Origin of scale free networks
Barabasi & Oltvai 2004, Nature Genetics Review
Growth and preferential attachment in scale free networks
A new node (red) is more likely to connect to node 1
than 2
Barabasi & Oltvai 2004, Nature Genetics Review
Commonly measured network properties
Paths on a graph
C
B
A
D
G
H
E
Path: a set of connected edges from one node to another
Paths from B to D
F
There are two paths from B-D: B->A->D and B->G->H->A->D
Path lengths
Scale-free networks tend to be ultra-small
Yeast protein interaction network
Avg. path length: 2.6
Diameter: 6
Nodes: 5,905
Edges: 86,286
Commonly measured network properties
Network motifs
Milo Science 2002
Network motifs of size 3 in a directed network
Additional possibilities if we consider the sign of an edge
Milo Science 2002
Different types of feed-forward motifs
Coherent: sign of the direct path is the same as the indirect path
Incoherent: sign of the direct path and indirect path are not the same
Alon 2007, Nature Review Genetics
These appear much more in transcriptional networks than other FF motifs
Network motifs are associated with dynamics
A feed-forward motif with an AND gate can encode a sign sensitive delay
Alon 2007, Nature Review Genetics
Network motifs found in many complex networks
The occurrence of the feed-forward loop in both networks suggests a fundamental similarity in the design on these networks
Milo et al., Science 2002
Structural common motifs seen in the yeast regulatory network
Lee et.al. 2002, Mangan & Alon, 2003
Auto-regulation
Multi-component
Feed-forward loop
Single Input
Multi Input
Regulatory Chain
Feed-forward loops involved in speeding up in response of target gene
How to find network motifs?
Wernicke 2005: http://theinf1.informatik.uni-jena.de/publications/network-motifs-wabi05.pdf
Algorithm for enumerating subgraphs: Edge sampling
N(V’): set of neighbors of all vertices in V’
Kashtan et al., Bioinformatics 2004
Generating a randomized network
Strategy to generate randomized networks (Switching method)
1
2
3
4
5
1
2
3
4
5
Select two edges connecting four vertices and swap the end points.
Repeat.
Assessing the significance of a network motif
The motif (red dashed edges) occurs much more frequently in the real network than
in any randomized network
Commonly measured network properties
Modularity in networks
A modular network
Module 1
Module 2
Module 3
M. E. J. Newman, Modularity and community structure in networks, PNAS 2006
Different types of network modules
Albert-László Barabási, Natali Gulbahce, and Joseph Loscalzo, Nature Review Genetics 2011, Mitra et al., Nature Review Genetics 2014
Studying modularity in biological systems
Clustering coefficient
A
B
C
?
Clustering coefficient example
A
C
B
G
D
Q measure for modularity
K: number of groups
eij: Fraction of total edges that link nodes in group i to group j
Detecting community structure in networks. M.E. J Newman
Fraction of edges within group i
Fraction of edges without regard to community structure
Summary of topological properties of networks
Goals for today
Given a graph, what are the modules
Problem definition of (topological) module detection
Common graph clustering algorithms
Clustering
Task definition: clustering objects
Flat clustering
Hierarchical clustering
Slides adapted from Prof. Mark Craven; BMI 576
Types of hierarchical clustering strategies
Hierarchical clustering
leaves represent objects to be clustered (e.g. genes or samples)
height of bar indicates
degree of distance
within cluster
distance scale
0
Slides from Prof. Mark Craven
Flat clustering from a hierarchical clustering
cutting here results
in 2 clusters
cutting here results
in 4 clusters
Slides from Prof. Mark Craven
Hierarchical clustering to find modules on graphs
N1(i): the set of immediate neighbors of i
aij: corresponding entry in the adjacency matrix of the graph
Yip and Horvath, BMC Bioinformatics 2007; WGCNA: an R package for weighted correlation network analysis. Peter Langfelder and Steve Horvath
Community structure in a graph
Detecting community structure in networks, M. E. J. Newman
Communities
Intercommunity edges
Girvan-Newman algorithm
M. E. J. Newman and M. Girvan. Finding and evaluating community structure
Betweenness of an edge
Girvan-Newman algorithm
M. E. J. Newman and M. Girvan. Finding and evaluating community structure
Girvan-Newman algorithm as a hierarchical clustering algorithm
Graph vertices
Zachary’s karate club study
Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.
Girvan Newman on Zachary’s karate club dataset
Each node is an individual and edges represent social interactions among individuals. The shape and colors represent different groups.
Node grouping based on betweenness
Modularity
Take away points
References