1 of 42

Lecture 15:

Data Structures (Graphs)

Sookyung Kim

2 of 42

Graphs

2

3 of 42

Definitions

  • Graph G = (V, E), where
    • V is a set of vertices (nodes), and
    • E is the set of edges.
  • Subgraph: a subset of a graph’s vertices and edges.
  • Two vertices are adjacent if they are joined by an edge.

A

B

D

C

E

A

D

C

3

4 of 42

Graph Examples

4

5 of 42

Graph Examples

5

6 of 42

Graph Examples

6

7 of 42

Definitions

  • Path: A sequence of connected edges
  • Cycle: A path whose starting vertex and ending vertex are the same
  • Simple path: A path that contains no cycle
  • Simple cycle: A cycle that contains no cycle in it

A

B

D

C

E

A

B

D

C

E

Simple path

Simple cycle

7

8 of 42

Definitions

  • Connected graph: Each pair of distinct vertices has a path between them
  • Complete graph: Each pair of distinct vertices has an edge between them

Connected graph

Disconnected graph

Complete graph

8

9 of 42

Definitions

  • Weighted graph vs. unweighted graph: whether edges have weights
  • Directed graph vs. undirected graph: whether edges have direction

Weighted graph

Directed graph

9

10 of 42

Definitions

  • These are not allowed in (regular) graphs:
    • Multigraph: more than one edges allowed for a same pair of nodes.
    • Self edge: an edge starts from and arrives at the same node.

Multigraph

Self edge

10

11 of 42

Tree vs. Graph?

  • Tree is a special case of graph, where
    • all nodes are connected, and
    • there is no cycle.

A

B

D

C

E

A

B

D

C

E

Any node can be the root.

11

12 of 42

Exercise

  • Tree vs. Graph?�
  • Connected vs. disconnected?�
  • Cyclic vs. acyclic?�
  • Directed vs. undirected?

Graph

Connected

Cyclic

Directed

12

13 of 42

Graph Representation

13

14 of 42

Adjacency Matrix

  • Adjacency matrix
    • A graph is represented by an N × N matrix (2D array), where matrix[i][j] is 1 if there is an edge between vertex i and vertex j, and 0 otherwise.
    • In case of a directed graph, matrix[i][j] is 1 if there is an edge from vertex i to vertex j.
    • In case of a weighted graph, matrix[i][j] has the weight value, instead of 1.�
  • This is analogous to the array-based implementation of other data structures.
    • We use fixed size of memory (corresponding to N × N), regardless of how many edges we have.
    • (+) random access is done at O(1).
    • (-) inefficient use of memory if the graph is sparsely connected.

14

15 of 42

Adjacency Matrix: Example

Directed graph

15

16 of 42

Adjacency Matrix: Example

Weighted undirected graph

16

17 of 42

Adjacency List

  • Adjacency list
    • A graph is represented by N linked lists where list[i] is the list of vertices that is adjacent to vertex i.
    • In case of weighted graph, the list also contains the weight values.�
  • This is analogous to the reference-based implementation of other data structures.
    • For each node, we use variable size of memory, depending on the number of edges connected from/to it.
    • (+) efficient use of memory if the graph is sparsely connected.
    • (-) need linear search for an edge. (Not a big problem if the graph is sparse enough.)

17

18 of 42

Adjacency List: Example

Directed graph

18

19 of 42

Adjacency List: Example

Weighted undirected graph

19

20 of 42

Adjacency List: Implementation

  • Can you modify this to
    • directed graph?
    • weighted graph?

class undirected_graph():

def __init__(self, nodes, edges):

self.v = nodes[:]

self.e = {}

for node in nodes:

self.e[node] = []

for (u, v) in edges:

self.e[v].append(u)

self.e[u].append(v)

v = ['a', 'b', 'c']

e = [('a', 'b'), ('b', 'c')]

graph = undirected_graph(v, e)

print(graph.e)

{'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}

20

21 of 42

Google Interview Question

  • Design a Graph class.
    • What is the graph for? What data to be stored?
    • How big will be the graph?
    • Is the graph expected to be sparse or dense?
    • What operations are we going to use most frequently?
    • Is the node & edge likely static or dynamic?

21

22 of 42

Graph Traversal

22

23 of 42

Graph Traversal

  • Goal: Visiting all nodes once in the graph.
    • E.g., for searching a specific node
    • Caution: graphs may have cycles and may be disconnected. We still should visit all nodes, but should not visit the same node more than once, and finish once we visit all of them.

j

k

23

24 of 42

Depth First Search (DFS)

  • Start from an arbitrary node.
  • Visit any connected unvisited node from there.
  • If no more node to visit, backtrack to the previous one, and keep going.

j

k

This slide is best seen with animations.

24

25 of 42

Depth First Search (DFS)

  • To backtrack, we need to trace our footprints!
  • What data structure would work best?
    • We backtrack to the most recently visited previous node!
    • Last-in, First-out: Stack would work nicely here!

a

b

a b

c

a b c

Node visited

Stack (bottom to top)

d

a b c d

g

a b c d g

e

a b c d g e

(backtrack)

a b c d g

f

a b c d g f

(backtrack)

a b c d g

(backtrack)

a b c d

a

h

a b c d h

(backtrack)

a b c d

(backtrack)

a b c

(backtrack)

a b

(backtrack)

a

i

a i

(backtrack)

a

(backtrack)

(empty)

25

26 of 42

Depth First Search (DFS)

def dfs(self):

unvisited = self.v.copy()

stack = Stack()

while not unvisited.is_empty():

visit(unvisited[0]) # visit origin

stack.push(unvisited[0])

del unvisited[0]

while not stack.is_empty():

curr = stack.peek()

if there remains an unvisited city from curr:

next = select an unvisited city from curr

visit(next)

stack.push(next)

delete next from unvisited

else:

stack.pop() # backtracking

Do something when we visit the node.

(e.g., print, compare, …)

For each connected node from curr node (accessed by internal adjacency matrix or list), check if it is in the unvisited.

We need this loop to take care of disconnected nodes!

06 Graph & Search (BFS, DFS)

27 of 42

Time Complexity of DFS

  • At each step,
    • We visit a node x.
    • For the node x, we search the list of adjacent nodes.
    • If there is an unvisited adjacent node, move to there.
    • If not, backtrack to the previously visited node.�
  • How many such steps?�
  • In total:

O(1)

O(|V|)

O(1)

O(1)

With adj. matrix

O(|V|)

O(|V|2)

With adj. list

Total number of search�= number of edges (|E|)

O(|V| |E|)

27

28 of 42

Depth First Search (DFS)

28

29 of 42

Depth First Search (DFS)

j

k

Node visited

Stack (bottom to top)

29

30 of 42

Breadth First Search (BFS)

  • Start from an arbitrary node.
  • Visit all connected unvisited node from there.
  • In the next step, repeat the same thing on those nodes. Keep going!

This slide is best seen with animations.

j

k

1

2

3

4

5

6

7

8

1

30

31 of 42

Breadth First Search (BFS)

  • We keep track of the order to process: ①, ②, ③, …
  • What data structure would work best?
    • We process the waiting nodes in the order of our visit.
    • Fast-in, First-out: Queue would work nicely here!

a

a

b f i

b

f i c e

Node visited

Queue (front to back)

f

i c e g

i

c e g

c

e g d

e

g d

g

d

d

h

h

(empty)

31

32 of 42

Breadth First Search (BFS)

def bfs(self):

unvisited = self.v.copy()

queue = Queue()

while not unvisited.is_empty():

queue.enqueue(some unvisited node)

while not queue.is_empty():

curr = queue.dequeue()

visit(curr)

delete curr from unvisited�

for city in all unvisited cities connected from curr:

queue.enqueue(city)

Do something when we visit the node.

(e.g., print, compare, …)

For each connected node from curr node (accessed by internal adjacency matrix or list), check if it is in the unvisited.

We need this loop to take care of disconnected nodes!

06 Graph & Search (BFS, DFS)

33 of 42

Breadth First Search (BFS)

33

34 of 42

Breadth First Search (BFS)

j

k

Node visited

Queue (front to back)

34

35 of 42

DFS vs. BFS

  • DFS and BFS visit the nodes in different order.
    • BFS visits nodes closer to the origin first. With an unweighted graph, the path discovered by BFS is a shortest path from the origin to that node.
  • Following the arrows, we build a tree.
    • Recall that the tree is a special graph, connected without a cycle.

DFS

BFS

origin

35

36 of 42

Time Complexity of BFS

  • At each step,
    • Dequeue a node from the queue and visit it (call it x).
    • From the node x, enque all unvisited adjacent nodes.�
  • How many such steps?�
  • In total:

O(1)

O(|V|)

With adj. matrix

O(|V|)

O(|V|2)

With adj. list

Total number of search�= number of edges (|E|)

O(|V| |E|)

36

37 of 42

Homework

37

38 of 42

HW1: Implement DFS

38

39 of 42

HW2: Implement BFS

39

40 of 42

HW3: Number of islands

40

41 of 42

HW4: Word Ladder

41

42 of 42

HW5: Maximum length of Tree

42