1 of 59

Lecture 17: Disjoint Sets

CSE 373: Data Structures and Algorithms

CSE 373 20 SP – CHAMPION


2 of 59

Announcements

We have a special guest today!

Kevin Lin will talk about TAing!

  • P3 due Wednesday
  • P4 comes out Wednesday – due in 2 weeks, during the last week of class
    • last project!
    • implement Dijkstra’s (7 pts) + DP (8 pts; the DP portion is completely extra credit)
  • EX4 due Friday – EX5 out Friday
    • only 2 exercises left…

3 of 59

Warmup


KruskalMST(Graph G)
    initialize each vertex to be an independent component
    sort the edges by weight
    foreach(edge (u, v) in sorted order){
        if(u and v are in different components){
            add (u,v) to the MST
            update u and v to be in the same component
        }
    }

Run Kruskal’s algorithm on the graph below to find its MST (minimum spanning tree), using the provided pseudocode (above) for Kruskal’s algorithm to choose the edges.

No participation poll today.

4 of 59

Minimum Spanning Trees

CSE 373 20 SP – CHAMPION & CHUN


5 of 59

Review Minimum Spanning Trees (MSTs)

  • A Minimum Spanning Tree for a graph is a set of that graph’s edges that connect all of that graph’s vertices (spanning) while minimizing the total weight of the set (minimum)
    • Note: does NOT necessarily minimize the path from each vertex to every other vertex
    • Any tree with V vertices will have V-1 edges
    • A separate entity from the graph itself! More of an “annotation” applied to the graph, just like a Shortest Paths Tree (SPT)

[Figure: example graph with vertices A–F and edge weights 1–11; the highlighted edges form the Minimum Spanning Tree]

6 of 59

Review Why do MST Algorithms Work?

  • Two useful properties for MST edges. We can think about them from either perspective:
    • Cycle Property: The heaviest edge along a cycle is NEVER part of an MST.
    • Cut Property: Split the vertices of the graph into any two sets A and B. The lightest edge between A and B is ALWAYS part of an MST. (Prim’s thinks this way)
  • Whenever you add an edge to a tree you create exactly one cycle. Removing any edge from that cycle gives another tree!
  • This observation, combined with the cycle and cut properties, forms the basis of all of the greedy algorithms for MSTs.
    • greedy algorithm: chooses best known option at each point and commits, rather than waiting for a global view of the graph before deciding

7 of 59

Review Adapting Dijkstra’s: Prim’s Algorithm

dijkstraShortestPath(G graph, V start)
    Map edgeTo, distTo; Set known;
    initialize distTo with all nodes mapped to ∞, except start to 0
    PriorityQueue<V> perimeter; perimeter.add(start);
    while (!perimeter.isEmpty()):
        u = perimeter.removeMin()
        known.add(u)
        for each edge (u,v) to unknown v with weight w:
            oldDist = distTo.get(v)      // previous smallest edge to v
            newDist = distTo.get(u) + w  // is this a smaller edge to v?
            if (newDist < oldDist):
                distTo.put(v, newDist)
                edgeTo.put(v, u)
                if (perimeter.contains(v)):
                    perimeter.changePriority(v, newDist)
                else:
                    perimeter.add(v, newDist)

Prim’s change: newDist = w, dropping the “distTo.get(u) +” term, since only the weight of the connecting edge matters.

  • Normally, Dijkstra’s checks for a shorter path from the start.
  • But MSTs don’t care about individual paths, only the overall weight!
  • New condition: “would this be a smaller edge to connect the current known set to the rest of the graph?”

[Figure: a KNOWN set with candidate edges of weights 1, 3, and 4 to unknown vertices A, B, and C; Prim’s picks the smallest edge connecting a new vertex]
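For contrast with the pseudocode above, here is a minimal Java sketch of Prim’s algorithm under some simplifying assumptions of my own (vertices are ints 0..n-1, a small Edge record, and lazy deletion instead of changePriority, since java.util.PriorityQueue has no such method). It is an illustration of the idea, not the course’s reference implementation.

import java.util.*;

public class PrimSketch {
    record Edge(int to, double weight) {}   // hypothetical edge type for this sketch

    /** Returns edgeTo, where edgeTo[v] is the vertex v was attached to (-1 for start/unreached). */
    static int[] primMST(List<List<Edge>> adj, int start) {
        int n = adj.size();
        double[] distTo = new double[n];     // weight of the cheapest known edge into v
        int[] edgeTo = new int[n];
        boolean[] known = new boolean[n];
        Arrays.fill(distTo, Double.POSITIVE_INFINITY);
        Arrays.fill(edgeTo, -1);
        distTo[start] = 0.0;

        // Entries are {vertex, priority}; stale entries are skipped when popped.
        PriorityQueue<double[]> perimeter = new PriorityQueue<>(Comparator.comparingDouble(a -> a[1]));
        perimeter.add(new double[]{start, 0.0});
        while (!perimeter.isEmpty()) {
            int u = (int) perimeter.poll()[0];
            if (known[u]) continue;          // already pulled into the MST
            known[u] = true;
            for (Edge e : adj.get(u)) {
                int v = e.to();
                // Prim's condition: is this single edge a cheaper way to attach v to the known set?
                // (Dijkstra's would compare distTo[u] + e.weight() instead.)
                if (!known[v] && e.weight() < distTo[v]) {
                    distTo[v] = e.weight();
                    edgeTo[v] = u;
                    perimeter.add(new double[]{v, e.weight()});
                }
            }
        }
        return edgeTo;
    }
}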

8 of 59

A Different Approach

  • Suppose the MST on the right was produced by Prim’s
  • Observation: We basically choose all the smallest edges in the entire graph (1, 2, 3, 4, 6)
    • The only exception was 5. Why shouldn’t we add edge 5?
    • Because adding 5 would create a cycle, and to connect A, C, & D we’d rather choose 1 & 4 than 1 & 5 or 4 & 5.
  • Prim’s thinks “vertex by vertex”, but what if you think “edge by edge” instead?
    • Start with the smallest edge in the entire graph and work your way up
    • Add the edge to the MST as long as it connects two new groups (meaning don’t add any edges that would create a cycle)

[Figure: the same A–F graph; the MST chooses edges 1, 2, 3, 4, and 6, and skips edge 5]

Building an MST “edge by edge” in this graph:

  • Add edge 1
  • Add edge 2
  • Add edge 3
  • Add edge 4
  • Skip edge 5 (would create a cycle)
  • Add edge 6
  • Finished: all vertices in the MST!

9 of 59

Kruskal’s Algorithm

  • This “edge by edge” approach is how Kruskal’s Algorithm works!

kruskalMST(G graph)
    Set(?) msts; Set finalMST;
    initialize msts with each vertex as single-element MST
    sort all edges by weight (smallest to largest)
    for each edge (u,v) in ascending order:
        uMST = msts.find(u)
        vMST = msts.find(v)
        if (uMST != vMST):
            finalMST.add(edge (u, v))
            msts.union(uMST, vMST)

  • Key Intuition: Kruskal’s keeps track of isolated “islands” of vertices (each is a sub-MST)
    • Start with each vertex as its own “island”
    • If an edge connects two vertices within the same “island”, it forms a cycle! Discard it.
    • If an edge connects two vertices in different “islands”, add it to the MST! Now those “islands” need to be combined.

[Figure: the A–F graph with each current “island” (sub-MST) circled]
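To make the pseudocode above concrete, here is a minimal Java sketch of the same loop. The Edge record and the tiny DisjointSets interface are assumptions made just for this sketch (the real ADT is introduced later in the lecture), not the project’s API.

import java.util.*;

public class KruskalSketch {
    record Edge(String u, String v, double weight) {}   // hypothetical edge type

    // Assumed union-find interface; concrete implementations appear later in these slides.
    interface DisjointSets<T> {
        void makeSet(T value);
        int find(T value);       // id of the set's representative
        void union(T a, T b);
    }

    static List<Edge> kruskalMST(Collection<String> vertices, List<Edge> edges, DisjointSets<String> sets) {
        for (String v : vertices) sets.makeSet(v);               // each vertex starts as its own "island"
        edges.sort(Comparator.comparingDouble(Edge::weight));    // smallest to largest
        List<Edge> mst = new ArrayList<>();
        for (Edge e : edges) {
            if (sets.find(e.u()) != sets.find(e.v())) {          // different islands: adding e cannot form a cycle
                mst.add(e);
                sets.union(e.u(), e.v());                        // merge the two islands
            }
        }
        return mst;
    }
}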

(Slides 10–14 repeat the same pseudocode and intuition, animating the “islands” merging as edges 1, 2, 3, and 4 are added, edge 5 is skipped, and edge 6 completes the MST.)

15 of 59

Prim’s Demos and Visualizations

Dijkstra’s Algorithm

Dijkstra’s proceeds radially from its source, because it chooses edges by path length from source

Prim’s Algorithm

Prim’s jumps around the graph (the perimeter), because it chooses edges by edge weight (there’s no source)

16 of 59

Kruskal’s Demos and Visualizations

Kruskal’s Algorithm

Kruskal’s jumps around the entire graph, because it chooses from all edges purely by edge weight (while preventing cycles)

Prim’s Algorithm

Prim’s jumps around the graph (the perimeter), because it chooses edges by edge weight (there’s no source)

17 of 59

Disjoint Sets

CSE 373 20 SP – CHAMPION & CHUN


18 of 59

Selecting an ADT

  • Kruskal’s needs to find what MST a vertex belongs to, and union those MSTs together
    • Our existing ADTs don’t lend themselves well to “unioning” two sets…
    • Let’s define a new one!

[Figure: the A–F weighted graph from before]

kruskalMST(G graph)
    Set(?) msts; Set finalMST;
    initialize msts with each vertex as single-element MST
    sort all edges by weight (smallest to largest)
    for each edge (u,v) in ascending order:
        uMST = msts.find(u)
        vMST = msts.find(v)
        if (uMST != vMST):
            finalMST.add(edge (u, v))
            msts.union(uMST, vMST)

19 of 59

Disjoint Sets ADT (aka “Union-Find”)

  • Kruskal’s will use a Disjoint Sets ADT under the hood
    • Conceptually, a single instance of this ADT contains a “family” of sets that are disjoint (no element belongs to multiple sets)

DISJOINT SETS ADT

State

Family of Sets

  • disjoint: no shared elements
  • each set has a representative (either a member or a unique ID)

Behavior

makeSet(value) – new set with value as only member (and representative)

find(value) – return representative of the set containing value

union(x, y) – combine sets containing x and y into one set with all elements, choose single new representative
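Rendered as a Java interface, the ADT above might look like the following. This is one possible translation; the generic parameter and the choice to expose the representative as an int id are my own assumptions, not a spec from the course.

/** Disjoint Sets ("Union-Find") ADT: a family of sets with no shared elements. */
public interface DisjointSets<T> {
    /** Creates a new set whose only member (and representative) is value. */
    void makeSet(T value);

    /** Returns the id of the representative of the set containing value. */
    int find(T value);

    /** Combines the sets containing x and y into one set with a single representative. */
    void union(T x, T y);
}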

kruskalMST(G graph)
    DisjointSets<V> msts; Set finalMST;
    initialize msts with each vertex as single-element MST
    sort all edges by weight (smallest to largest)
    for each edge (u,v) in ascending order:
        uMST = msts.find(u)
        vMST = msts.find(v)
        if (uMST != vMST):
            finalMST.add(edge (u, v))
            msts.union(uMST, vMST);

[Figure: the A–F weighted graph from before]

20 of 59

Disjoint Sets in mathematics

  • “In mathematics, two sets are said to be disjoint sets if they have no element in common.” – Wikipedia
  • disjoint = not overlapping


[Figure: Set #1 and Set #2 are two sets of names with no members in common, so they are disjoint sets; Set #3 and Set #4 both contain Santino, so they are not disjoint sets]

21 of 59

Disjoint Sets in computer science

  • In computer science, disjoint sets can refer to this ADT/data structure that keeps track of the multiple “mini” sets that are disjoint (confusing naming, I know)


[Figure: a grey blob labeled “new ADT!” containing two mini-sets of names, Set #1 and Set #2]

This overall grey blob is the actual disjoint sets structure, and it keeps track of any number of mini-sets, which are all disjoint (the mini-sets have no overlapping values).

Note: this might feel really different from the ADTs we’ve run into before. The ADTs we’ve seen before (dictionaries, lists, sets, etc.) just store values directly. But the Disjoint Sets ADT is particularly interested in letting you group your values into sets and keeping track of which particular set your values are in.

22 of 59

DisjointSets ADT methods

  • Just 3 methods (and makeSet is pretty simple!)

  • findSet(value)
  • union(valueA, valueB)
  • makeSet(value)


23 of 59

findSet(value)

  • findSet(value) returns some ID for which particular set the value is in. For Disjoint Sets, we often call this the representative (as it’s a value that represents the whole set).

  • Examples:
  • findSet(Brian)
  • findSet(Sherdil)
  • findSet(Velocity)
  • findSet(Kevin) == findSet(Aileen)


[Figure: four mini-sets of names; each findSet call returns the id of that value’s set, and findSet(Kevin) == findSet(Aileen) is true because they are in the same set]

24 of 59

union(valueA, valueB)

  • union(valueA, valueB) merges the set that A is in with the set that B is in (basically, adds the two sets together into one)
  • Example: union(Blarry, Brian)


[Figure: before and after the call, the set containing Blarry and the set containing Brian are merged into a single set; the other sets are unchanged]

25 of 59

makeSet(value)

  • makeSet(value) makes a new mini set that just has the value parameter in it.

  • Examples:
  • makeSet(Elena)
  • makeSet(Anish)


[Figure: after makeSet(Elena) and makeSet(Anish), two new single-element sets (Set #5 and Set #6) appear alongside the existing sets]

26 of 59

Disjoint Sets ADT Summary


Disjoint-Sets ADT

state
  Set of Sets
  • Mini-sets are disjoint: elements must be unique across mini-sets
  • No required order
  • Each set has an id/representative

behavior
  makeSet(value) – creates a new set within the disjoint sets where the only member is the value; picks an id/representative for the set
  findSet(value) – looks up the set containing the value, returns the id/representative of that set
  union(x, y) – looks up the set containing x and the set containing y, combines the two sets into one; all of the values of one set are added to the other, and the now-empty set goes away

27 of 59

New ADT


Set ADT

state
  Set of elements
  • Elements must be unique!
  • No required order
  Count of Elements

behavior
  create(x) – creates a new set with a single member, x
  add(x) – adds x into the set if it is unique, otherwise the add is ignored
  remove(x) – removes x from the set
  size() – returns the current number of elements in the set

Disjoint-Set ADT

state
  Set of Sets
  • Disjoint: elements must be unique across sets
  • No required order
  • Each set has a representative
  Count of Sets

behavior
  makeSet(x) – creates a new set within the disjoint sets where the only member is x; picks a representative for the set
  findSet(x) – looks up the set containing element x, returns the representative of that set
  union(x, y) – looks up the set containing x and the set containing y, combines the two sets into one; picks a new representative for the resulting set


28 of 59

Example

  • new()
  • makeSet(a)
  • makeSet(b)
  • makeSet(c)
  • makeSet(d)
  • makeSet(e)
  • findSet(a)
  • findSet(d)
  • union(a, c)


[Figure: five single-element sets after the makeSet calls: a (rep 0), b (rep 1), c (rep 2), d (rep 3), e (rep 4)]

29 of 59

Example


[Figure: after union(a, c), a and c share one set (rep 0); b (rep 1), d (rep 3), and e (rep 4) remain singletons]

  • new()
  • makeSet(a)
  • makeSet(b)
  • makeSet(c)
  • makeSet(d)
  • makeSet(e)
  • findSet(a)
  • findSet(d)
  • union(a, c)
  • union(b, d)

30 of 59

Example


[Figure: after union(b, d), the sets are {a, c} (rep 0), {b, d} (rep 1), and {e} (rep 4); findSet(a) == findSet(c) is true, findSet(a) == findSet(d) is false]

  • new()
  • makeSet(a)
  • makeSet(b)
  • makeSet(c)
  • makeSet(d)
  • makeSet(e)
  • findSet(a)
  • findSet(d)
  • union(a, c)
  • union(b, d)
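As a runnable version of this trace, here is a minimal sketch that stores each value’s set id in a map (the simple “QuickFind” idea the later runtime tables refer to). It is only meant to illustrate the ADT’s behavior, not the course’s implementation; the printed representatives (0 and 3) and the true/false results mirror the figures above.

import java.util.*;

public class DisjointSetsTrace {
    public static void main(String[] args) {
        Map<String, Integer> id = new HashMap<>();   // value -> set id (its representative)
        int nextId = 0;

        // new() + makeSet(a) ... makeSet(e): each value starts in its own set
        for (String x : List.of("a", "b", "c", "d", "e")) {
            id.put(x, nextId++);
        }
        System.out.println(id.get("a"));                        // findSet(a) -> 0
        System.out.println(id.get("d"));                        // findSet(d) -> 3

        union(id, "a", "c");                                    // {a, c} merge
        union(id, "b", "d");                                    // {b, d} merge
        System.out.println(id.get("a").equals(id.get("c")));    // findSet(a) == findSet(c): true
        System.out.println(id.get("a").equals(id.get("d")));    // findSet(a) == findSet(d): false
    }

    // QuickFind-style union: relabel every member of b's set with a's id (a Θ(n) scan).
    static void union(Map<String, Integer> id, String a, String b) {
        int idA = id.get(a), idB = id.get(b);
        if (idA == idB) return;
        for (Map.Entry<String, Integer> e : id.entrySet()) {
            if (e.getValue() == idB) e.setValue(idA);
        }
    }
}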

31 of 59

Implementation


Disjoint-Set ADT (as before): state – a set of disjoint sets, each with a representative; behavior – makeSet(x), findSet(x), union(x, y).

TreeDisjointSet<E>

state
  Collection<TreeSet> forest
  Dictionary<NodeValues, NodeLocations> nodeInventory

behavior
  makeSet(x) – create a new tree of size 1 and add it to our forest
  findSet(x) – locate the node with x and move up the tree to find the root
  union(x, y) – append the tree with y as a child of the tree with x

TreeSet<E>

state
  SetNode overallRoot

behavior
  TreeSet(x)
  add(x)
  remove(x, y)
  getRep() – returns the data of overallRoot

SetNode<E>

state
  E data
  Collection<SetNode> children

behavior
  SetNode(x)
  addChild(x)
  removeChild(x, y)

32 of 59

Implement makeSet(x)

  • Worst case runtime?
  • O(1)


TreeDisjointSet<E> (as before): makeSet(x), findSet(x), union(x, y); state: Collection<TreeSet> forest, Dictionary<NodeValues, NodeLocations> nodeInventory

[Figure: the forest after the calls below, six single-node trees labeled 0–5, each tracked in nodeInventory]

  • makeSet(0)
  • makeSet(1)
  • makeSet(2)
  • makeSet(3)
  • makeSet(4)
  • makeSet(5)

33 of 59

QuickUnion Data Structure

  • Fundamental idea:
    • QuickFind tracks each element’s ID
    • QuickUnion tracks each element’s parent. Only the root has an ID!
      • Each set becomes tree-like, but something slightly different called an up-tree: store pointers from children to parents!

[Figure: the abstract disjoint sets {Joyce, Sam, Ken, Alex}, {Aileen, Santino}, and {Paul} = the same sets represented as QuickUnion up-trees, rooted at Joyce (ID 2), Aileen (ID 1), and Paul (ID 3)]

34 of 59

QuickUnion: Find

  • Key idea: can travel upward from any node to find its representative ID
  • How do we jump to a node quickly?
    • Also store a map from value to its node (Omitted in future slides)

find(Santino) -> 1

find(Ken) -> 2

find(Santino) != find(Ken)

find(Santino) == find(Aileen)

find(Ken):
    jump to Ken node
    travel upward until root
    return ID

[Figure: the same up-trees; find(Ken) walks up from Ken to Joyce and returns ID 2, while find(Santino) walks up to Aileen and returns ID 1]

35 of 59

QuickUnion: Union

  • Key idea: easy to simply rearrange pointers to union entire trees together!
  • Which of these implementations would you prefer?

Option A – union(Ken, Santino):
    rootS = find(Santino)
    set Ken to point to rootS

Option B – union(Ken, Santino):
    rootK = find(Ken)
    rootS = find(Santino)
    set rootK to point to rootS

[Figure: the up-trees before the union, and the RESULT of each option]

36 of 59

QuickUnion: Union

(Same two union implementations as on the previous slide.)

[Figure: RESULT of the root-to-root version: Joyce’s whole tree (Joyce, Ken, Sam, Alex) now hangs under Aileen (ID 1)]

  • We prefer the implementation on the right: by changing just the root, we effectively pull the entire tree into the new set!
    • If we changed the first node instead, we would have to do more work for the rest of the old tree
    • A rare example where constant-time work ends up moving on the order of n elements
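Here is one minimal way QuickUnion could be coded in Java: parent indices stored in a list (roots point to themselves), plus the value-to-node map mentioned on the find() slide. The class name, fields, and int representative ids are illustrative assumptions, not the project’s API.

import java.util.*;

public class QuickUnion<T> {
    private final List<Integer> parent = new ArrayList<>();  // parent.get(i) = i's parent; roots point to themselves
    private final Map<T, Integer> indexOf = new HashMap<>(); // lets find() jump straight to a value's node

    public void makeSet(T value) {
        int i = parent.size();
        parent.add(i);                 // a new root: its own parent
        indexOf.put(value, i);
    }

    /** Travels upward to the root; the root's index serves as the representative ID. */
    public int find(T value) {
        int i = indexOf.get(value);
        while (parent.get(i) != i) {
            i = parent.get(i);
        }
        return i;
    }

    public void union(T a, T b) {
        int rootA = find(a);
        int rootB = find(b);
        if (rootA != rootB) {
            parent.set(rootA, rootB);  // point one ROOT at the other, pulling its whole tree along
        }
    }
}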

37 of 59

QuickUnion: Why bother with the second root?

  • Key idea: keeping the height of the tree short minimizes the runtime of future find() calls!
    • Pointing directly to the second element would make the tree taller

union(Ken, Santino):
    rootK = find(Ken)
    rootS = find(Santino)
    set rootK to point to rootS

Why not just use:

union(Ken, Santino):
    rootK = find(Ken)
    set rootK to point to Santino

[Figure: pointing rootK directly at Santino (a non-root) produces a taller tree than pointing it at Santino’s root, Aileen]

38 of 59

QuickUnion: Checking in on those runtimes

                   Maps to Sets   QuickFind   QuickUnion
  makeSet(value)   𝚯(1)           𝚯(1)        𝚯(1)
  findSet(value)   𝚯(n)           𝚯(1)        𝚯(n)
  union(x, y)      𝚯(n)           𝚯(n)        𝚯(1)*

  * assuming we already have the two roots, as in the pseudocode below; otherwise union also pays for its find()s

union(A, B):
    rootA = find(A)
    rootB = find(B)
    set rootA to point to rootB

(kruskalMST pseudocode shown again for reference, as on the earlier slides)

39 of 59

QuickUnion: Let’s Build a Worst Case

union(A, B):
    rootA = find(A)
    rootB = find(B)
    set rootA to point to rootB

find(A):
    jump to A node
    travel upward until root
    return ID

Even with the “use-the-roots” implementation of union, try to come up with a series of calls to union that would create a worst-case runtime for find on these Disjoint Sets:

[Figure: four singleton up-trees A, B, C, D]

40 of 59

QuickUnion: Let’s Build a Worst Case

(Same pseudocode and prompt as on the previous slide.) One sequence that builds a worst case:

union(A, B)

union(B, C)

union(C, D)

find(A)

[Figure: the resulting degenerate chain A → B → C → D; find(A) must traverse the entire height of the tree]

41 of 59

Analyzing the QuickUnion Worst Case

  • How did we get a degenerate tree?
    • Even though pointing a root to a root usually helps with this, we can still get a degenerate tree if we put the root of a large tree under the root of a small tree.
    • In QuickUnion, rootA always goes under rootB
      • But what if we could ensure the smaller tree went under the larger tree?

[Figure: union(C, D) on the chain. What currently happens: the taller tree’s root goes under the single-node tree D, growing the height. What would help avoid a degenerate tree: putting the smaller tree under the root of the larger tree instead.]

42 of 59

WeightedQuickUnion

  • Goal: Always pick the smaller tree to go under the larger tree
  • Implementation: Store the number of nodes (or “weight”) of each tree in the root
    • Constant-time lookup instead of having to traverse the entire tree to count

union(A, B):
    rootA = find(A)
    rootB = find(B)
    put lighter root under heavier root

Now what happens with the same sequence?
    union(A, B)
    union(B, C)
    union(C, D)
    find(A)

[Figure: with weighted union the tree stays flat: A, C, and D all end up directly under B. Perfect! Best runtime we can get.]
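Building on the QuickUnion sketch from earlier, a weighted version only needs to track each root’s tree size and change union(). Again, this is an illustrative sketch under the same assumptions, not the official implementation.

import java.util.*;

public class WeightedQuickUnion<T> {
    private final List<Integer> parent = new ArrayList<>();  // roots point to themselves
    private final List<Integer> size = new ArrayList<>();    // size.get(r) = #nodes in r's tree (valid for roots)
    private final Map<T, Integer> indexOf = new HashMap<>();

    public void makeSet(T value) {
        int i = parent.size();
        parent.add(i);
        size.add(1);                   // a brand-new tree has one node
        indexOf.put(value, i);
    }

    public int find(T value) {
        int i = indexOf.get(value);
        while (parent.get(i) != i) i = parent.get(i);
        return i;
    }

    public void union(T a, T b) {
        int rootA = find(a), rootB = find(b);
        if (rootA == rootB) return;                  // already in the same set
        if (size.get(rootA) < size.get(rootB)) {     // put the lighter root under the heavier root
            parent.set(rootA, rootB);
            size.set(rootB, size.get(rootA) + size.get(rootB));
        } else {
            parent.set(rootB, rootA);
            size.set(rootA, size.get(rootA) + size.get(rootB));
        }
    }
}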

43 of 59

WeightedQuickUnion: Performance

  • union()’s runtime is still dependent on find()’s runtime, which is a function of the tree’s height
  • What’s the worst-case height for WeightedQuickUnion?

union(A, B):
    rootA = find(A)
    rootB = find(B)
    put lighter root under heavier root

44 of 59

WeightedQuickUnion: Performance

  • Consider the worst case where the tree height grows as fast as possible

[Figure: start with a single node: N = 1, H = 0. Slides 45–49 repeat the doubling process (union two equal-size trees, then union the results) and build up the N vs. H table shown on the next slide.]
50 of 59

WeightedQuickUnion: Performance

  N (number of nodes)   H (height)
  1                     0
  2                     1
  4                     2
  8                     3
  16                    4

[Figure: the worst-case WeightedQuickUnion tree on 16 nodes (labeled 0–15), built by repeatedly unioning two equal-size trees]
  • Consider the worst case where the tree height grows as fast as possible
  • Worst case tree height is Θ(log N)
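A short justification for that bound, written out as a standard induction (my own phrasing, not taken from the slides): with union-by-weight, a tree’s height only grows when it is placed under a tree that is at least as large, so its size at least doubles each time the height increases.

\[
\textbf{Claim: } \text{a WeightedQuickUnion tree of height } H \text{ contains } N \ge 2^H \text{ nodes.}
\]
\[
H = 0: \quad N = 1 = 2^0 \quad \text{(a single node)}
\]
\[
H \rightarrow H + 1: \quad N_{\text{new}} \;\ge\; 2^H + 2^H \;=\; 2^{H+1}
\quad \text{(a height-}H\text{ tree goes under a tree at least as large)}
\]
\[
\text{So } N \ge 2^H, \text{ i.e. } H \le \log_2 N, \text{ and the worst-case } \texttt{find} \text{ is } \Theta(\log N).
\]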

51 of 59

Why Weights Instead of Heights?

  • We used the number of items in a tree to decide upon the root
  • Why not use the height of the tree?
    • HeightedQuickUnion’s runtime is asymptotically the same: Θ(log(n))
    • It’s easier to track weights than heights, even though WeightedQuickUnion can lead to some suboptimal structures like this one:

[Figure: unioning two example trees by weight can produce a slightly suboptimal structure, but the Θ(log n) height bound still holds]

52 of 59

WeightedQuickUnion Runtime

  • This is pretty good! But there’s one final optimization we can make: path compression

                          Maps to Sets   QuickFind   QuickUnion   WeightedQuickUnion
  makeSet(value)          𝚯(1)           𝚯(1)        𝚯(1)         𝚯(1)
  find(value)             𝚯(n)           𝚯(1)        𝚯(n)         𝚯(log n)
  union(x, y),
   assuming root args     𝚯(n)           𝚯(n)        𝚯(1)         𝚯(1)
  union(x, y)             𝚯(n)           𝚯(n)        𝚯(n)         𝚯(log n)

[Figure: example weighted up-trees with the size stored at each root]

53 of 59

Modifying Data Structures for Future Gains

  • Thus far, the modifications we’ve studied are designed to preserve invariants
    • E.g. Performing rotations to preserve the AVL invariant
    • We rely on those invariants always being true so every call is fast

  • Path compression is entirely different: we are modifying the tree structure to improve future performance
    • Not adhering to a specific invariant
    • The first call may be slow, but will optimize so future calls can be fast

54 of 59

Path Compression: Idea

  • This is the worst-case topology if we use WeightedQuickUnion
  • Idea: When we do find(15), move all visited nodes under the root
    • Additional cost is insignificant (we already have to visit those nodes, just constant time work to point to root too)

[Figure: the worst-case 16-node tree (nodes 0–15); find(15) must walk the full Θ(log N) height]

55 of 59

Path Compression: Idea


[Figure: after find(15), every node visited on the path from 15 to the root now points directly at the root]

  • Perform Path Compression on every find(), so future calls to find() are faster!
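Putting the pieces together, here is an illustrative sketch of WeightedQuickUnion with path compression in Java. It is the weighted sketch from earlier with only find() changed to do a second, compressing pass; the names and structure remain my own assumptions rather than the course’s reference code.

import java.util.*;

public class WeightedQuickUnionPC<T> {
    private final List<Integer> parent = new ArrayList<>();  // roots point to themselves
    private final List<Integer> size = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();

    public void makeSet(T value) {
        int i = parent.size();
        parent.add(i);
        size.add(1);
        indexOf.put(value, i);
    }

    /** find() with path compression: every node visited gets pointed directly at the root. */
    public int find(T value) {
        int i = indexOf.get(value);
        int root = i;
        while (parent.get(root) != root) root = parent.get(root);  // first pass: locate the root
        while (i != root) {                                        // second pass: compress the path
            int next = parent.get(i);
            parent.set(i, root);
            i = next;
        }
        return root;
    }

    public void union(T a, T b) {
        int rootA = find(a), rootB = find(b);    // these find()s also compress their paths
        if (rootA == rootB) return;
        if (size.get(rootA) < size.get(rootB)) {
            parent.set(rootA, rootB);
            size.set(rootB, size.get(rootA) + size.get(rootB));
        } else {
            parent.set(rootB, rootA);
            size.set(rootA, size.get(rootA) + size.get(rootB));
        }
    }
}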

56 of 59

Path Compression: Details and Runtime

  • Run path compression on every find()!
    • Including the find()s that are invoked as part of a union()

  • Understanding the performance of M>1 operations requires amortized analysis
    • Effectively averaging out rare events over many common ones
    • Typically used for “In-Practice” case
      • E.g. when we assume an array doesn’t resize “in practice”, we can do that because the rare resizing calls are amortized over many faster calls
    • In 373 we don’t go in-depth on amortized analysis


57 of 59

Path Compression: Runtime

  • M find()s on WeightedQuickUnion take Θ(M log N)

  • … but M find()s on WeightedQuickUnionWithPathCompression take O(M log* N)!
    • log* n is the “iterated log”: the number of times you need to apply log to n before the result is ≤ 1
    • Note: log* is a loose bound
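For reference, the iterated logarithm can be defined recursively (standard definition, consistent with the table on the next slide):

\[
\log^* n \;=\;
\begin{cases}
0 & \text{if } n \le 1,\\
1 + \log^*(\log_2 n) & \text{if } n > 1.
\end{cases}
\]

For example, \(\log^* 65536 = 4\), since \(65536 \to 16 \to 4 \to 2 \to 1\) takes four applications of \(\log_2\).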


58 of 59

Path Compression: Runtime

  • Path compression results in find()s and union()s that are very, very close to (amortized) constant time
    • log* is less than 5 for any realistic input
    • If M find()s/union()s on N nodes take O(M log* N) and log* N ≈ 5, then each find()/union() amortizes to O(1)! 🤯

  N         log* N
  1         0
  2         1
  4         2
  16        3
  65536     4
  2^65536   5

(Note: 65536 = 2^16, and the number of atoms in the known universe is about 2^256.)

59 of 59

Appendix