1 of 77

Introduction to graph theory and molecular networks

BMI 826-23�Fall 2021

Sushmita Roy

https://compnetbiocourse.discovery.wisc.edu

Sep 14th 2021

Some of these materials are from Introduction to Bioinformatics, BMI/CS 576.

2 of 77

Goals for today

  • Introductory Graph theory
  • Molecules of life
  • Different types of molecular networks

3 of 77

Networks are ubiquitous

Internet

Image credit: Wikipedia, Wikimedia, The cellmap, Euroscientist, https://extensionaus.com.au/extension-practice/social-network-analysis/

Yeast genetic interaction network

Social network

Networks are powerful representations of complex systems

4 of 77

What is a network?

  • Describes connectivity patterns between the parts of a system
    • Vertex/Node: part, component
    • Edge/Interaction/link: relationship
  • Edges can have signs, directions, and/or weight
  • A network is represented as a graph
    • Node and vertex are used interchangeably
    • Edge, link, and interaction are used interchangeably

Vertex/Node

Edge

C

B

A

D

G

H

E

F

5 of 77

Notation

  • u, v, vi, s, t: A vertex of a graph
  • G: A graph defined by a tuple (V, E)
  • V: set of vertices, {v1,.., vN} where N is the number of nodes
  • E: set of edges, each edge is defined by a pair (vi, vj) representing a link between these two vertices
  • For an edge (vi, vj), vj is said to be adjacent to vi

6 of 77

A few graph-theoretic concepts

  • Representing a graph
    • Directed/Undirected/Weighted graph
  • Connectivity on a graph
  • Subnetworks/subgraphs
  • Traversal on a graph

7 of 77

Representing a graph

  • Adjacency matrix

  • Adjacency list

8 of 77

Undirected graphs

Adapted from Barabasi & Oltvai, Nature Reviews Genetics 2004

C

B

A

D

G

H

E

F

E= {(A,B),(A,C),(A,D),(A,E),(A,H),

(B,C),(B,G),

(E,F),

(F,G),

(G,H)}

V={A, B, C, D, E, F, G, H}

9 of 77

Adjacency matrix

C

Adapted from Barabasi & Oltvai, Nature Reviews Genetics 2004

B

A

D

G

H

E

0

1

1

1

1

0

0

1

1

0

1

0

0

0

1

0

1

1

0

0

0

0

0

0

1

0

0

0

0

0

0

0

1

0

0

0

0

1

0

0

0

0

0

0

1

0

1

0

0

1

0

0

0

1

0

1

1

0

0

0

0

0

1

0

F

A

B

C

D

E

F

G

H

A

B

C

D

E

F

G

H

V={A, B, C, D, E, F, G, H}

E= {(A,B),(A,C),(A,D),(A,E),(A,H),(B,C),(B,G),(E,A),(E,F),(F,G),(G,H)}

10 of 77

Adjacency list

C

B

A

D

G

H

E

F

A

B

C

D

E

H

B

A

C

G

C

A

B

D

A

E

A

F

F

G

E

G

B

F

H

H

A

G

11 of 77

Adjacency list versus matrix

Adjacency list

  • Space efficient
  • Asking if there is an edge can be slow
  • Preferred when the graph is sparse

Adjacency matrix

  • Very fast to ask if there is an edge
  • Storage is N2, where N is the number of nodes in the graph
  • Preferred when the graph is dense

12 of 77

Directed graphs

C

B

A

D

G

H

E

F

  • Edges have directionality on them
  • Adjacency matrix is no longer symmetric

0

0

0

1

1

0

0

0

1

0

0

0

0

0

1

0

1

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

1

0

1

0

0

0

0

0

0

0

0

1

1

0

0

0

0

0

0

0

A

B

C

D

E

F

G

H

A

B

C

D

E

F

G

H

13 of 77

Weighted graphs

C

B

A

D

H

E

0

0

0

0.9

1.4

0

1.6

0

0

0

0

0

0.2

0.6

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

2

3.1

0

0

0

0

0

A

B

C

D

E

H

A

B

C

D

E

H

  • Edges have weights on them.
  • We can have directed and undirected weighted graphs.
  • The example shown is that of a directed weighted graph

0.6

1.6

0.2

3.1

1.4

0.9

2

14 of 77

Node degree

  • Undirected network
    • Degree, k: Number of neighbors of a node
  • Directed network
    • In degree, kin: Number of incoming edges
    • Out degree, kout: Number of outgoing edges

15 of 77

Node degree

  • Undirected network
    • Degree, k: Number of neighbors of a node
  • Directed network
    • In degree, kin: Number of incoming edges
    • Out degree, kout: Number of outgoing edges

C

B

A

D

G

H

E

F

  • In degree of B is 1
  • Out degree of A is 2

What is the out degree of E?

16 of 77

Paths and cycles

  • Path:
    • a path from vertex s to t in G is a sequence of vertices (v0,…,vk) such that s=v0 and t=vk and (vi,vi+1) are edges in E.
    • A path is simple if there are no repetitions of a vertex.
  • Reachable: A vertex t is reachable from vertex s if there is a path from s to t
  • Path length: The total number of edges in a path
  • Shortest path: The path between two vertices with the shortest path length
  • Cycle: A path where v0 and vk are the same

17 of 77

Paths and cycles

C

B

A

D

G

H

E

  • There are two paths from B to D
  • Which is the shortest path?

Paths from B to D

F

C

B

A

D

G

H

E

A cycle

F

cycle

Paths

18 of 77

Connected components

  • Connected components: The set of vertices that are reachable from one node to another
  • Strongly connected components: The set of vertices that are reachable from one vertex to another in a directed graph.
  • Connected graph: An undirected graph is connected if every pair of vertices is connected by a path
  • Strongly connected graph: A directed graph where all vertices are reachable from each other

19 of 77

Connected components

H

C

B

A

D

G

H

E

F

Two connected components

A

B

C

D

E

F

G

Four strongly connected components

Connected components in an undirected graph

Connected components in a directed graph

20 of 77

Special types of graphs

  • Complete graph: an undirected graph where all vertices are neighbors of each other
  • Bipartite graph: a graph G=(V,E) whose vertex set is divided into to sets, V1 and V2 such that for every edge (u,v) in E, u is in V1 and v is in V2
  • Directed acyclic graph: A directed graph that has no cycles
  • Tree: A graph where every pair of vertices is connected by a unique simple path

21 of 77

Subgraph

  • A graph G’=(V,’E’) is a subgraph of a graph G=(V,E) if V’V (V’ is a subset of V) and E’E.
  • Given a subset V’V, G’=(V’,E’) is a subgraph induced by V’ if E’={(u,v)∈ E; u, vV’}.
  • We will use subgraph and subnetwork interchangeably
  • Subgraphs we just saw:
    • Cycle, path, connected component

22 of 77

Common graph traversal algorithms

  • Breadth-first search
  • Depth-first search

23 of 77

Breadth-first search (BFS)

  • Given a graph G=(V,E) and a source vertex s, BFS explores G to
    • find every vertex that is reachable from s
    • Computes the shortest path length from s to all reachable vertices
  • BFS explores all vertices at a particular distance before the next
    • So it uniformly accesses all vertices across the “breadth” of the frontier

24 of 77

Breadth first search algorithm sketch

  • We will need three types of data structures
    • Adjacency list to store the graph
    • Three arrays: color color, distance d, predecessor π
    • A queue, Q, used for doing the traversal of nodes in a first in first out order
  • Color: A node is white, black or gray
    • White: as yet undiscovered
    • Black: all neighbors have been discovered
    • Gray: some neighbors may not have been discovered
  • Distance keeps track of the shortest path length
  • Predecessor is used to produce the path

25 of 77

Breadth first search algorithm

26 of 77

Breadth first search example

Adapted from Introduction to Algorithms, 2nd Edition, Cormen, Leiserson, Rivest, Stein

Q={s}

0

s

r

t

u

v

w

x

y

Before while loop

27 of 77

Breadth first search example

Adapted from Introduction to Algorithms, 2nd Edition, Cormen, Leiserson, Rivest, Stein

1

Q={w,r}

1

0

s

r

t

u

v

w

x

y

Iteration 1

28 of 77

Breadth first search continued

Iteration 2

1

1

2

2

0

s

r

t

u

v

w

x

y

Q={r,t,x}

29 of 77

Breadth first search continued

Q={t,x,v}

Iteration 3

1

2

1

2

2

0

s

r

t

u

v

w

x

y

30 of 77

Breadth first search continued

Q={x,v,u}

Iteration 4

1

2

1

2

2

3

0

s

r

t

u

v

w

x

y

31 of 77

Breadth first search continued

Q={v,u,y}

Iteration 5

1

2

1

2

2

3

3

0

s

r

t

u

v

w

x

y

32 of 77

Breadth first search continued

Q={u,y}

Iteration 6

1

2

1

2

2

3

3

0

s

r

t

u

v

w

x

y

33 of 77

Breadth first search continued

Q={y}

Iteration 7

1

2

1

2

2

3

3

0

s

r

t

u

v

w

x

y

34 of 77

Breadth first search continued

Q={}

Iteration 8

1

2

1

2

2

3

3

0

s

r

t

u

v

w

x

y

35 of 77

Depth first search

  • Searches deeper in the graph whenever possible
  • Edges are explored from the most recently discovered vertex with unexplored edges leaving it
  • DFS is used for ”topological sort” and to find “strongly connected components”
  • Like BFS, we need color, predecessor (𝜋)
  • Additionally stores start (d) and end time (f) of a node’s discovery

36 of 77

Depth first search algorithm

37 of 77

Depth first search example

v

u

w

Iteration 1

x

y

z

1/

Start time

Node

𝜋

u

NIL

v

NIL

w

NIL

x

NIL

y

NIL

z

NIL

Time=1

DFS-VISIT (u)

38 of 77

Depth first search example

x

y

z

1/

2/

v

u

w

Iteration 2

Node

𝜋

u

NIL

v

u

w

NIL

x

NIL

y

NIL

z

NIL

Time=2

DFS-VISIT (v)

39 of 77

Depth first search example

x

y

z

1/

2/

v

u

w

3/

Iteration 3

Node

𝜋

u

NIL

v

u

w

NIL

x

NIL

y

v

z

NIL

Time=3

DFS-VISIT (y)

40 of 77

Depth first search example

x

y

z

1/

2/

v

u

w

3/

Iteration 4

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

NIL

Time=4

4/

DFS-VISIT (x)

41 of 77

Depth first search example

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

NIL

Time=5

x

y

z

1/

2/

v

u

w

3/

Iteration 5

4/5

DFS-VISIT (x)

Start time

End time

42 of 77

Depth first search example

x

y

z

1/

2/

v

u

w

3/6

Iteration 6

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

NIL

Time=6

4/5

DFS-VISIT (y)

43 of 77

Depth first search example

x

y

z

1/

2/7

v

u

w

3/6

Iteration 7

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

NIL

Time=7

4/5

DFS-VISIT (v)

44 of 77

Depth first search example

x

y

z

1/8

2/7

v

u

w

3/6

Iteration 8

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

NIL

Time=8

4/5

DFS-VISIT (u)

45 of 77

Depth first search example

x

y

z

1/8

2/7

v

u

w

3/6

Iteration 9

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

w

Time=9

4/5

DFS-VISIT (w)

9/

46 of 77

Depth first search example

x

y

z

1/8

2/7

v

u

w

3/6

Iteration 10

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

w

Time=10

4/5

DFS-VISIT (z)

9/

10/

47 of 77

Depth first search example

x

y

z

1/8

2/7

v

u

w

3/6

Iteration 11

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

w

Time=11

4/5

DFS-VISIT (z)

9/

10/11

48 of 77

Depth first search example

x

y

z

1/8

2/7

v

u

w

3/6

Iteration 12

Node

𝜋

u

NIL

v

u

w

NIL

x

y

y

v

z

w

Time=12

4/5

DFS-VISIT (w)

9/12

10/11

49 of 77

Take away points

  • Adjacency lists and matrices are used to represent and analyze graphs
  • Definitions of paths, cycles, connected components
  • Graph traversal
    • Breadth-first search
    • Depth-first search

50 of 77

Goals for today

  • Introductory Graph theory
  • Molecules of life
  • Different types of molecular networks

51 of 77

Organization of biological information

Organism

Tissue

Gene

Chromosome

Cell

http://publications.nigms.nih.gov/thenewgenetics/chapter1.html;�Images: J Merkin et al. Science 2012;338:1593-1599, Wikipedia, Google, Shutterstock

Organs

52 of 77

Molecules of life

  • Deoxyribonucleic acid (DNA)
  • Ribonucleic acid (RNA)
    • Messenger RNA (mRNA)
      • Makes proteins
    • Non-coding RNA (ncRNA)
  • Proteins
  • Metabolites
  • While DNA is mostly static, RNA, proteins, metabolites change between cell types, tissues, environmental conditions

53 of 77

Deoxyribonucleic acid (DNA)

image from the DOE Human Genome Program

http://www.ornl.gov/hgmis

54 of 77

DNA is a double helical molecule

  • In 1953, James Watson and Francis Crick discovered DNA molecule has two strands arranged in a double helix
  • This was possible through the X-ray diffraction data from Maurice Wilkins and Rosalind Franklin

http://www.chemheritage.org/discover/online-resources/chemistry-in-history/themes/biomolecules/dna/watson-crick-wilkins-franklin.aspx

Watson and Crick

Maurice

Wilkins

Rosalind Franklin

55 of 77

Nucleotides

  • DNA is a polymer
  • Composed of repeating chemical units called nucleotides
  • Nucleotide
    • Nitrogen containing base
    • 5 carbon sugar: deoxyribose
    • Phosphate group
    • Phosphate-hydroxy bonds connect the�nucleotides
  • Four nucleotides make DNA
    • adenine (A), cytosine (C), guanine (G) and thymine (T)

Phosphate

Base

Sugar

Hydroxy

56 of 77

DNA stores the blue print of an organism

  • The heredity molecule
  • Has the information needed to make an organism
  • Double strandedness of the DNA molecule provides stability, prevents errors in copying
    • one strand has all the information

57 of 77

Chromosomes

  • All the DNA of an organism is divided up into individual chromosomes
  • Each chromosome is really a DNA molecule
  • Different organisms have different numbers of chromosomes

Image from www.genome.gov

58 of 77

Genes

  • A gene is a sequence of nucleotides which specifies a protein or RNA molecule
  • The human genome has ~ 25,000 protein-coding genes (still being revised)
  • One gene can have many functions
  • One function can require many genes

…GTATGTCTAAGCCTGAATTCAGAACGGCTTC…

59 of 77

RNA: Ribonucleic acid

  • RNA
    • Made up of repeating nucleotides
    • The sugar is ribose
    • U is used in place of T
  • A strand of RNA can be thought of as a string composed of the four letters: A, C, G, U
  • RNA is single stranded
    • More flexible than DNA
    • Can double back and form loops
    • Such structures can be more stable
  • Different types of RNA molecules
    • Transfer RNA (tRNA)
    • Ribosomal RNA (rRNA)
    • Messenger RNA (mRNA)
    • Micro RNA (miRNA)…

60 of 77

Proteins

  • Proteins are polymers too
  • The repeating units are amino acids
  • There are 20 different amino acids known
  • DNA sequence of a gene codes for a protein
  • Some types of proteins are transcription factors, metabolic enzymes, and signaling proteins

61 of 77

Amino Acids

62 of 77

The central dogma of Molecular biology

DNA

RNA

Proteins

Transcription

Translation

63 of 77

The genetic code: specifies how mRNA is translated into protein

Genetic code is degenerate

64 of 77

Transcription

  • In eukaryotes: happens inside the nucleus
  • RNA polymerase (RNA Pol) is an enzyme that builds an RNA strand from a gene
  • RNA Pol is recruited at specific parts of the genome in a condition-specific way.
  • Transcription factor proteins are assigned the task of RNA Pol recruitment.
  • RNA that is transcribed from a protein coding region is called messenger RNA (mRNA)

65 of 77

Translation

  • Process of turning mRNA into proteins.

  • Happens outside of the nucleus inside the cytoplasm in ribosomes

  • ribosomes are the machines that synthesize proteins from mRNA

66 of 77

A video about transcription and translation

67 of 77

Metabolites

  • Small molecules that are essential to living systems
    • Water, sugars, fat
  • Product (output) or substrate (input) of a metabolic reaction

68 of 77

Goals for today

  • Introductory Graph theory
  • Molecules of life
  • Different types of molecular networks

69 of 77

Graphs for representing molecular networks

  • Nodes are biological molecules
      • Genes, proteins, metabolites, etc
  • Edges represent interaction between molecules
  • Many different types of molecular networks exist
  • They vary based upon the node and edge semantics

70 of 77

Transcriptional regulatory networks

Regulatory network of E. coli.

153 TFs (green & light red), 1319 targets

Vargas and Santillan, 2008

A

B

Gene C

Transcription factors (TF)

C

A

B

  • Directed, signed, weighted graph
  • Nodes: TFs and Target genes
  • Edges: A regulates C’s expression level

DNA

71 of 77

Protein-protein interaction networks

Barabasi et al. 2003

Yeast protein interaction network

X

Y

X

Y

  • Undirected, may or may not be weighted graph
  • Nodes: Proteins
  • Edges: Protein X physically interacts with protein Y

72 of 77

Signaling networks

Sachs et al., 2005, Science

Receptors

P

Q

A

TF

A

P

Q

  • Directed graph
  • Nodes: Enzymes and other proteins
  • Edges: Enzyme P modifies protein Q

73 of 77

Reactions associated with Galactose metabolism

Metabolic networks

Metabolites

N

M

a

b

c

O

Enzymes

d

O

M

N

KEGG

  • Unweighted graph
  • Nodes: Metabolic enzyme
  • Edges: Enzymes M and N share a compound

74 of 77

Genetic interaction networks

Dixon et al., 2009, Annu. Rev. Genet

Q

G

Genetic interaction: If the phenotype of a double mutant is significantly different than each mutant alone

  • Undirected graph
  • Nodes: Genes
  • Edges: Genetic interaction between query gene Q and gene G

75 of 77

Categorization of molecular networks

  • Physical networks
    • Transcriptional regulatory networks: interactions between regulatory proteins (transcription factors) and genes
    • Protein-protein: interactions among proteins
    • Signaling networks: protein-protein and protein-small molecule interactions to relay signals from outside the cell to the nucleus
  • Functional networks
    • Metabolic: reactions through which enzymes convert substrates to products
    • Genetic: interactions among genes which when perturbed together produce a significant phenotype than when individually perturbed

76 of 77

“Omic” tools measure cellular molecular components in a high-throughput manner

Uwe Sauer, Matthias Heinemann, Nicola Zamboni, Science 2007

77 of 77

References

  • Cormen, Thomas H., Clifford Stein, Ronald L. Rivest, and Charles E. Leiserson. Introduction to Algorithms. 2nd edition. McGraw-Hill Higher Education, 2001.
  • Introductory lecture from Introduction to Bioinformatics, BMI/CS 576