1 of 36

Diffusion influence interpretation of sequence variants

Nov 11th, 2025

BMI/CS 775 Computational Network Biology�Fall 2025

Anthony Gitter

https://bmi775.sites.wisc.edu

Original slides created by Prof. Sushmita Roy

These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Sushmita Roy and Anthony Gitter

2 of 36

Topics in this section

  • Graph-based approaches for gene prioritization
  • Graph diffusion to interpret sequence variants
  • Combinatorial graph algorithms for subnetwork selection
  • Multi-omic data integration
  • Network integration with graph neural networks

3 of 36

Perturbations in networks

  • Understanding genetic perturbations is important in biology
  • Genetic perturbations are useful to identify the function of genes
    • What happens if we knock down gene A?
      • Measure a morphological phenotype like growth rate or cell size
      • Measure global expression signatures
  • Perturbations can be artificial or natural
    • Artificial perturbations
      • Deletion strains
    • Natural perturbations
      • Single nucleotide polymorphisms
      • Natural genetic variation
  • Perturbations in a network can affect
    • Nodes
    • Edges (e.g., mutations on binding sites)

4 of 36

Identification of subnetworks perturbed in diseases

Cho D-Y, Kim Y-A, Przytycka TM (2012) Chapter 5: Network Biology Approach to Complex Diseases. PLoS Comput Biol 8(12): e1002820. doi:10.1371/journal.pcbi.1002820

http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002820

Perturbed

genes

Genes

Chromosomes

Looks like the GeneWander setting so far

5 of 36

Motivation of HOTNET

  • Somatic mutations play a major role in cancer
  • Mutation profiles of cancers are very heterogeneous
    • ~80 somatic mutations per tumor
    • Two tumors rarely have the same mutations
  • Thousands of genes can be mutated in cancer
    • Difficult to identify “driver” vs “passenger” mutations
    • Mutations at the pathway level (group of genes) are likely responsible for a particular type of cancer
  • Can we identify subnetworks (representing new pathways) that are significantly mutated?

F. Vandin, E. Upfal, and B. J. Raphael, "Algorithms for detecting significantly mutated pathways in cancer." Journal of Computational Biology 2011

6 of 36

Goal of HOTNET

Image from Cowen et al., Nature Review Genetics 2017

All mutations have very low frequencies

Recognize they are all part of the same well-connected subnetwork

Cancer patient samples

7 of 36

HOTNET problem setup

  • Given
    • Protein-protein interaction network
    • A set of patient tumor mutation profiles
  • Do
    • Find “significantly” mutated subgraphs
      • A subgraph that best connects these genetic alterations
      • Best means a subgraph that includes many tumor samples with few genes
  • How?
    • Find the global influence of mutations on a particular gene
    • Search for a subnetwork in this global influence graph that is significantly mutated

8 of 36

HOTNET’s approach vs ActiveModules

  • HOTNET finds a subnetwork that has significantly many more mutations than random subnetworks
  • Resembles the ActiveModules approach that finds subnetworks with significantly up or down-regulated gene expression
  • The key differences in HOTNET:
    • We do not have measurements for all genes: gene expression vs DNA sequencing
    • Only a small number of genes may be mutated
    • Assess statistical significance of subnetwork

9 of 36

Key steps of HOTNET algorithm

  • Build an influence graph, which specifies the influence of one node over another
    • Graph diffusion
    • Build a network with the tested genes as well as their neighborhood
  • Find significantly mutated subnetworks (two options)
    • Set cover to include genes mutated in many samples
    • Enhanced influence model where the influence edges are weighted by the number of mutations
  • Test for the significance of the number of subnetworks of a particular size

10 of 36

HOTNET algorithm overview

Leiserson et al . 2014, Nature Genetics

Figure is from HOTNET2 but HOTNET has the same major steps

11 of 36

Diffusion kernel used in HOTNET

  • HOTNET uses a specific type of kernel called the heat diffusion kernel
  • L=D-A denotes the graph Laplacian, where A is the adjacency matrix and D is the degree matrix
  • Let γ denote the constant rate at which heat is lost at any node
    • E.g. this could be proportional to the mean degree
  • Lγ is L+γI
  • Let s be a source node
  • The influence of s on all n nodes at time t is denoted as

Influence of s on node 1

12 of 36

Diffusion kernel used in HOTNET

  • Rate at which diffusion occurs from a source node on the graph is given by

  • where bs be a unit vector which is 1 for node s and 0 otherwise
  • The influence on all nodes at steady-state is given by

  • Diffusion process is similar to the diffusion kernel used for ranking genes in GeneWanderer

13 of 36

Diffusion kernel used in HOTNET

2.5

-1

0

-1

0

-1

2.5

-1

0

0

0

-1

3.5

-1

-1

-1

0

-1

2.5

0

0

0

-1

0

1.5

a

b

c

d

e

a

b

c

d

e

a

b

c

d

e

 

 

 

1

0

0

0

0

a

b

c

d

e

 

0.8

0.7

0.6

0.55

0.2

a

b

c

d

e

0.25

-0.35

-0.65

0.025

0.3

a

b

c

d

e

 

14 of 36

Graph diffusion to downplay hub intermediate nodes

Mutations in a linear chain are more “interesting” than in a star graph

15 of 36

Computing the influence between vertex pairs

  • Assume we have two vertices u and v
  • Let i(u,v) be the influence from u to v
  • w(u,v) is the influence between u and v and is min[i(u,v),i(v,u)]
  • The influence graph is thus an n X n symmetric weighted graph, where n is the number of tested genes
  • Further prune by removing edges with weight w(u,v)<δ

16 of 36

Top Hat question

17 of 36

Key steps of HOTNET algorithm

  • Build an influence graph which specifies the influence of one node over another
    • Graph diffusion
    • Builds a network with the tested genes as well as their neighborhood
  • Find significantly mutated subnetworks (two ways)
    • Set cover to include genes mutated in many samples
    • Enhanced influence model where the influence edges are weighted by the number of mutations
  • Test for the significance of the number of subnetworks of a particular size

18 of 36

HOTNET’s maximal connected cover approach

  • Given
    • A weighted influence network G=(V,E)
    • T is a subset of V, comprising genes with a mutation
    • : the set of samples with mutation in gene vi
    • A subset size k
  • Do:
    • Find a subnetwork over gene subset C that covers a maximal number of mutated samples
    • Formally, we want to find a C={v1,.. vk} such the following is maximized:

19 of 36

Heuristic algorithm to find a maximal connected cover

  • Finding the maximal connected cover set is computationally difficult
  • A heuristic algorithm is used
  • Add a vertex v such it is connected to the current vertex set via a node u and maximizes the ratio of the number of new samples covered to the number of nodes between u and v

20 of 36

Heuristic algorithm to find a maximal connected cover

Exploration

Keep adding a neighbor that has the maximal coverage with fewest additional vertices

21 of 36

Key steps of HOTNET algorithm

  • Build an influence graph which specifies the influence of one node over another
    • Graph diffusion
    • Builds a network with the tested genes as well as their neighborhood
  • Find significantly mutated subnetworks (two ways)
    • Set cover to include genes mutated in many samples
    • Enhanced influence model where the influence edges are weighted by the number of mutations
  • Test for the significance of the number of subnetworks of a particular size

22 of 36

Enhanced influence model

  • The enhanced influence model was a more computationally efficient approach
  • Enhance the influence measure between genes by the number of mutations observed in each of these genes
  • Specifically, let vj and vk be two genes with a mutation

  • Remove edges with influence
  • Decompose the associated enhanced influence graph into connected components

Enhanced influence

Set of samples with mutation in vj

23 of 36

Top Hat question

24 of 36

Statistical analysis for determining significance of subnetworks

Null distribution of subnetworks

  • Assign mutations uniformly at random
  • Shuffle gene labels in mutation data (preserve mutation frequencies)
  • Assess significance of the total number of subnetworks with s or more genes

25 of 36

Application of HOTNET to cancer dataset

  • 453 mutations in 601 genes in 91 Glioblastoma (GBM) samples
  • 1,013 mutations in 623 genes in 189 samples of lung adenocarcinoma
  • Protein-protein interaction network with 18,796 genes and 37,107 edges

26 of 36

HOTNET recovers pathways relevant to cancer

27 of 36

Application of HOTNET of pan-cancer mutation analysis

  • Updated version HOTNET2 was applied to mutation profiles of samples from 12 different cancers
  • Dataset
    • 3,110 samples with mutations in 11,565 genes
    • Genes’ mutation frequency varied a lot: 1-1,291 samples

M. D. M. Leiserson, F. Vandin, H.-T. Wu, J. R. Dobson, J. V. Eldridge, J. L. Thomas, A. Papoutsaki, Y. Kim, B. Niu, M. McLellan, M. S. Lawrence, A. Gonzalez-Perez, D. Tamborero, Y. Cheng, G. A. Ryslik, N. Lopez-Bigas, G. Getz, L. Ding, and B. J. Raphael, "Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes," Nature Genetics, vol. 47, no. 2, pp. 106-114, Dec. 2014.

28 of 36

HOTNET versus HOTNET2 kernel

  • HotNet kernel

  • HotNet2 kernel

The HotNet2 kernel was specifically designed to further avoid “star” subnetworks

Rate of diffusing out

Fraction of heat that stays on a node

29 of 36

HOTNET versus HOTNET2 influence

Directed diffusion avoids star graphs

Require strongly connected components in directed graph

Leiserson et al . 2014, Nature Genetics

30 of 36

HOTNET2 for pan-cancer mutation analysis

Leiserson et al . 2014, Nature Genetics

Very hot genes

31 of 36

HOTNET2 for pan-cancer mutation analysis

Leiserson et al . 2014, Nature Genetics

HOTNET2 subnetworks include genes with a wide range of mutation frequencies

32 of 36

Overview of HOTNET2 results

Which cancer types?

Mutations were found in expected and new pathways:�PI3K, TP53, cohesin

33 of 36

SWI/SNF complex pathways identified by HOTNET2

Number of samples

Sixth most mutated HotNet2 subnetwork.

34 of 36

HOTNET summary

  • An algorithm to find significantly mutated subnetworks
  • Based on creating an “influence graph”, followed by identification of “interesting” subnetworks
  • Non-local, less sensitive to network hubs
  • Subgraph detection component could be also addressed using module detection algorithms

35 of 36

Network-based stratification of patient samples

Hofree et al. 2013, Nature Methods

Input: Patient tumor mutation profiles, skeleton network

Output: Patient groups

How: (1) Smooth mutation profile using network smoothing; (2) Use Non-negative Matrix Factorization to cluster samples

36 of 36

Network-based stratification of patient tumor samples

Uterine cancer

Ovarian cancer

NBS subtypes associated with different histological types

NBS subtypes associated with survival

Hofree et al., Nature Methods 2013