1 of 52

Topological properties of graphs

Oct 16th 2025 (Updated on Oct 21st 2025)

BMI/CS 775 Computational Network Biology�Fall 2025

Sushmita Roy

https://bmi775.sites.wisc.edu

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter

2 of 52

Plan for the semester

When

What

Week 2-Week 4

Representing and learning networks from data

Week 5-Week 7

Deep learning in network biology

Week 7-Week 8

Graph analysis, topology, and modules

Week 9-Week 10

Graph alignment and comparison

Week 10-Week 13

Network-based integration and interpretation

Week 14-Week 15

Project presentations

3 of 52

Plan for this section

  • Topological properties of networks (Oct 16th, )

  • Graph clustering (Oct 21st)

  • Representation learning on graphs (Oct 23rd, Oct 28th)

  • Graph alignment and applications (Oct 30th, Nov 4th)

4 of 52

Goals for today

  • Commonly measured network properties
    • Degree distribution
    • Average shortest path length
    • Network motifs
    • Modularity of biological networks

  • Algorithms to find modules on graphs
    • Clustering
    • Girvan Newman algorithm

5 of 52

Why should we care about network properties?

  • Are there specific properties that characterize real world networks?
  • How can we generate networks with such properties?

Probably the most important discovery of network theory was the realization that despite the remarkable diversity of networks in nature, their architecture is governed by a few simple principles that are common to most networks of major scientific and technological interestBarabasi and Oltvai, Nature Genetics Review 2004

6 of 52

Commonly measured network properties

  • Degree distribution
  • Average shortest path length
  • Network motifs
  • Modularity of biological networks

7 of 52

Recall node degree

  • Undirected network
    • Degree, k: Number of neighbors of a node
  • Directed network
    • In degree, kin: Number of incoming edges
    • Out degree, kout: Number of outgoing edges

C

B

A

D

G

H

E

F

  • In degree of B is 1
  • Out degree of A is 2

8 of 52

Average degree

  • Consider an undirected network with N nodes
  • Let ki denote the degree of node i
  • Average degree is

9 of 52

Degree distribution

  • P(k) the probability that a node has k edges
  • Different networks can have different degree distributions
  • A fundamental property that can be used to characterize a network

10 of 52

Different degree distributions

  • Poisson distribution
    • The mean is a good representation of ki of all nodes
    • Networks that have a Poisson degree distribution are called Erdös Renyi or random networks

  • Power law distribution
    • Also called scale free
    • There is no “typical” node that captures the degree of nodes.

11 of 52

Poisson distribution

  • A discrete distribution

  • The Poisson is parameterized by which can be easily estimated by maximum likelihood

k

P(X=k)

12 of 52

Power law distribution

  • Used to capture the degree distribution of most real networks

  • A value of between 2 and 3 is observed in many real-world networks.

  • MLE exists but is more complicated
    • See Power-Law Distributions in Empirical Data. Clauset, Shalizi and Newman, 2009 for details

P(k)

k

Barabasi and Oltvai, Nature Genetics Review 2004

13 of 52

Erdös Renyi random graphs

  • Dates back to 1960 due to two mathematicians Paul Erdös and Alfred Renyi.
  • Provides a probabilistic model to generate a graph
  • Starts with N nodes and connects two nodes with probability p
  • Node degrees follow a Poisson distribution
  • Tail falls off exponentially, suggesting that nodes with degrees different from the mean are very rare

14 of 52

Scale free networks

  • Degree distribution is captured by a power law distribution
  • There is no “typical” node that describes the degree of all other nodes
  • Such networks are thought to be ubiquitous in nature

15 of 52

Poisson versus Scale free

Barabasi & Oltvai 2004, Nature Genetics Review

16 of 52

Yeast protein interaction network is believed to be scale free

  • “Whereas most proteins participate in only a few interactions, a few participate in dozens”

  • Such high degree nodes are called hubs

Barabasi & Oltvai 2004, Nature Genetics Review

17 of 52

Degree of a node is correlated to functional importance of a node

  • Hubs maybe associated with important biological function
  • Red nodes on deletion cause the organism to die
  • Red nodes also among the nodes with the highest degree

18 of 52

Yeast PPI network degree distribution

5,905 nodes

86,286 edges

19 of 52

Origin of scale free networks

  • Scale free networks are thought to be ubiquitous in nature
  • How do such networks arise?
  • Such networks are the result of two processes
    • a growth process where new nodes join the network over an extended period of time
      • Think about how the internet has grown
    • preferential attachment: new nodes tend to connect to nodes with many neighbors
      • Rich get richer.

Barabasi & Oltvai 2004, Nature Genetics Review

20 of 52

Growth and preferential attachment in scale free networks

A new node (red) is more likely to connect to node 1

than 2

Barabasi & Oltvai 2004, Nature Genetics Review

21 of 52

Commonly measured network properties

  • Degree distribution
  • Average shortest path length
  • Network motifs
  • Modularity of biological networks

22 of 52

Paths on a graph

C

B

A

D

G

H

E

Path: a set of connected edges from one node to another

Paths from B to D

F

There are two paths from B-D: B->A->D and B->G->H->A->D

23 of 52

Path lengths

  • The shortest path length between two nodes A and B:
    • The smallest number of edges that need to be traversed to get from A to B
  • Mean path length is the average of all shortest path lengths
  • Diameter of a graph is the longest of all shortest paths in the network

24 of 52

Scale-free networks tend to be ultra-small

  • Two nodes on the network are connected by a small number of edges
  • Average path length is proportional to log(log(N)), where N is the number of nodes in the network
  • In a random network (Erdös Renyi network) the average path length is proportional to log(N)

25 of 52

Yeast protein interaction network

Avg. path length: 2.6

Diameter: 6

Nodes: 5,905

Edges: 86,286

26 of 52

Are scale free networks ubiquitous?

  • Large diverse corpus of 928 networks
  • Classified networks into more categories of scale-freeness
  • Examined other distributions such as Log-normal and Exponential
  • Found many networks exhibit a log-normal distribution

Broido, A. D. & Clauset, A. Scale-free networks are rare. Nat Commun 10, 1017 (2019).

27 of 52

Top Hat Question

28 of 52

Commonly measured network properties

  • Degree distribution
  • Average shortest path length
  • Network motifs
  • Modularity of biological networks

29 of 52

Network motifs

  • Network motifs:
    • small recurring subgraphs that occur much more than a randomized network
  • Often called “building blocks” of complex networks
  • Biological networks tend to exhibit certain types of network motifs
  • Some motifs are associated with specific network dynamics

Milo Science 2002

30 of 52

Network motifs of size 3 in a directed network

Additional possibilities if we consider the sign of an edge

Milo Science 2002

31 of 52

Different types of feed-forward motifs

Coherent: sign of the direct path is the same as the indirect path

Incoherent: sign of the direct path and indirect path are not the same

Alon 2007, Nature Review Genetics

These appear much more in transcriptional networks than other FF motifs

32 of 52

Network motifs are associated with dynamics

A feed-forward motif with an AND gate can encode a sign sensitive delay

Alon 2007, Nature Review Genetics

33 of 52

Network motifs found in many complex networks

The occurrence of the feed-forward loop in both networks suggests a fundamental similarity in the design on these networks

Milo et al., Science 2002

34 of 52

Common structural motifs seen in the yeast regulatory network

Lee et.al. 2002, Mangan & Alon, 2003

Auto-regulation

Multi-component

Feed-forward loop

Single Input

Multi Input

Regulatory Chain

Feed-forward loops involved in speeding up in response of target gene

35 of 52

How to find network motifs?

  • Given an input network G we need to address two problems
    • Subgraph enumeration: Find which subgraphs occur in G and how many times
    • Significance of the number of occurrences: Compare to the number of occurrences of subgraphs in randomized networks
      • Requires checking for subgraph isomorphism
  • Software to find network motifs

36 of 52

Subgraph isomorphism

  •  

https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem

37 of 52

Algorithm for enumerating subgraphs: Edge sampling

N(V’): set of neighbors of all vertices in V’

Kashtan et al., Bioinformatics 2004

38 of 52

Generating a randomized network

  • While an Erdös Renyi network is random, it does not have the same degree distribution as a given network
  • How to generate a randomized network with the same degree distribution?

39 of 52

Strategy to generate randomized networks (Switching method)

1

2

3

4

5

1

2

3

4

5

Select two edges connecting four vertices and swap the end points.

Repeat.

Milo, R., Kashtan, N., Itzkovitz, S., Newman, M. E. J. & Alon, U. On the uniform generation of random graphs with prescribed degree sequences. at http://arxiv.org/abs/cond-mat/0312028 (2004).

40 of 52

Strategy to generate randomized network (Matching method)

A

B

C

D

Adapted from Jure Leskovec slides on “Machine learning with graphs”

A

B

C

D

A

B

C

D

Nodes with spokes

Randomly pair up spokes

Resulting graph

41 of 52

Assessing the significance of a network motif

The motif (red dashed edges) occurs much more frequently in the real network than

in any randomized network

42 of 52

Commonly measured network properties

  • Degree distribution
  • Average shortest path length
  • Network motifs
  • Modularity of biological networks

Resuming on Oct 21st, 2025

43 of 52

Modularity in networks

  • Modularity “refers to a group of physically or functionally linked nodes that work together to achieve a distinct function” -- Barabasi & Oltvai
  • Modularity is an important principle of biological systems
  • Genes tend to interact with a select set of other genes exhibiting a clustering of interactions
  • Module detection can help to
    • Understand the organizational properties of the network
    • Can be used to predict function of genes based on their grouping behavior

44 of 52

A modular network

Module 1

Module 2

Module 3

M. E. J. Newman, Modularity and community structure in networks, PNAS 2006

45 of 52

Different types of network modules

  • Topological modules
    • Defined solely based on the graph connectivity of nodes
  • Functional modules
    • Based on graph connectivity & other node attributes
    • Can be further grouped into
      • Active modules
      • Integrative modules
      • Disease modules

Albert-László Barabási, Natali Gulbahce, and Joseph Loscalzo, Nature Review Genetics 2011, Mitra et al., Nature Review Genetics 2014

46 of 52

Studying modularity in biological systems

  • Given a network is it modular?
    • Clustering coefficient
    • Q Modularity: measures the relative density of edges between and within the groups
  • Given a network what are the modules in the network?

47 of 52

Clustering coefficient

  • Measure of transitivity in the network that asks
    • If A is connected to B, and B is connected to C, how often is A connected to C?
  • Clustering coefficient Ci for each node i is

  • ki Degree of node i
  • ni is the number of edges among neighbors of i
  • Average clustering coefficient gives a measure of “modularity” of the network

A

B

C

?

48 of 52

Clustering coefficient example

A

C

B

G

D

49 of 52

Q measure for modularity

  • Suppose the nodes in the graph belong to K groups (communities)
  • Modularity (Q) can be assessed as follows:
    • difference between within group (community) connections and expected connections within a group
  • This measure assess how good a particular grouping is

K: number of groups

eij: Fraction of total edges that link nodes in group i to group j

Detecting community structure in networks. M.E. J Newman

Fraction of edges within group i

Fraction of edges without regard to community structure

50 of 52

Summary of topological properties of networks

  • Given a network, its topology can be characterized using different measures
    • Degree distribution
    • Average path length
    • Clustering coefficient (also used to measure modularity of a network)
  • Degree distribution can be
    • Poisson
    • Power law
  • Network motifs
    • Building blocks of complex networks
    • Over-represented subgraphs of specific types
  • Network modularity
    • Clustering coefficient
    • Q modularity

51 of 52

References

  • A.-L. Barabasi and Z. N. Oltvai, "Network biology: understanding the cell's functional organization," Nature Reviews Genetics, vol. 5, no. 2, pp. 101-113, Feb. 2004. [Online]. Available: http://dx.doi.org/10.1038/nrg1272
  • R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: Simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824-827, Oct. 2002. [Online]. Available: http://dx.doi.org/10.1126/science.298.5594.824
  • U. Alon, "Network motifs: theory and experimental approaches," Nature Reviews Genetics, vol. 8, no. 6, pp. 450-461, Jun. 2007. [Online]. Available: http://dx.doi.org/10.1038/nrg2102
  • S. Wernicke and F. Rasche, "FANMOD: a tool for fast network motif detection," Bioinformatics, vol. 22, no. 9, pp. 1152-1153, May 2006. [Online]. Available: http://dx.doi.org/10.1093/bioinformatics/btl038
  • M. E. J. Newman and M. Girvan. “Finding and evaluating community structure in networks”, 2003, 10.1103/PhysRevE.69.026113
  • M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences, vol. 103, no. 23, pp. 8577-8582, Jun. 2006. [Online]. Available: http://dx.doi.org/10.1073/pnas.0601602103
  • A. Clauset, C. R. Shalizi, and M. E. J. Newman, "Power-law distributions in empirical data," SIAM Review, vol. 51, no. 4, pp. 661-703, Feb. 2009. [Online]. Available: http://dx.doi.org/10.1137/070710111
  • Broido, A.D., Clauset, A., 2019. Scale-free networks are rare. Nat Commun 10, 1017. https://doi.org/10.1038/s41467-019-08746-5

52 of 52

Neural Subgraph Matching

Rex et al. Neural Subgraph Matching. at http://arxiv.org/abs/2007.03092 (2020).

  • GNN based framework for the subgraph isomorphism problem
  • Based on embedding node neighborhoods as subgraphs
  • Could be used for motif counting as an additional application