Lecture 17: Disjoint Sets
CSE 373: Data Structures and Algorithms
CSE 373 2` SP – CHAMPION
1
Announcements
We have a special guest today!
Kevin Lin! Will talk about TAing!
Warmup
CSE 373 SP 18 - KASEY CHAMPION
3
KruskalMST(Graph G)
initialize each vertex to be an independent component
sort the edges by weight
foreach(edge (u, v) in sorted order){
if(u and v are in different components){
add (u,v) to the MST
update u and v to be in the same component
}
}
Run Kruskal’s algorithm on the following graph to find the MST (minimum spanning tree) of the graph below.
Below is the provided pseudocode for Kruksal’s algorithm to choose all the edges.
No participation Poll Today.
Minimum Spanning Trees
CSE 373 20 SP – CHAMPION & CHUN
4
Review Minimum Spanning Trees (MSTs)
A
B
D
E
C
3
6
11
1
4
5
8
9
10
7
2
F
Minimum Spanning Tree
Review Why do MST Algorithms Work?
Review Adapting Dijkstra’s: Prim’s Algorithm
dijkstraShortestPath(G graph, V start)
Map edgeTo, distTo;
initialize distTo with all nodes mapped to ∞, except start to 0
PriorityQueue<V> perimeter; perimeter.add(start);
while (!perimeter.isEmpty()):
u = perimeter.removeMin()
known.add(u)
for each edge (u,v) to unknown v with weight w:
oldDist = distTo.get(v) // previous smallest edge to v
newDist = distTo.get(u) + w // is this a smaller edge to v?
if (newDist < oldDist):
distTo.put(u, newDist)
edgeTo.put(u, v)
if (perimeter.contains(v)):
perimeter.changePriority(v, newDist)
else:
perimeter.add(v, newDist)
�
distTo.get(u) +
prims
X
KNOWN
3??
3
1
A
1
C
1??
B
4
A Different Approach
A
B
D
E
C
3
6
11
1
4
5
8
9
10
7
2
F
Building an MST “edge by edge” in this graph:
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
“islands”
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
“islands”
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
“islands”
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
“islands”
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
2
11
1
3
5
8
9
10
7
6
F
“islands”
4
Kruskal’s Algorithm
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
“islands”
Prim’s Demos and Visualizations
Dijkstra’s Algorithm
Dijkstra’s proceeds radially from its source, because it chooses edges by path length from source
Prim’s Algorithm
Prim’s jumps around the graph (the perimeter), because it chooses edges by edge weight (there’s no source)
Kruskal’ Demos and Visualizations
Kruskal’s Algorithm
Kruskal’s jumps around the entire graph, because it chooses from all edges purely by edge weight (while preventing cycles)
Prim’s Algorithm
Prim’s jumps around the graph (the perimeter), because it chooses edges by edge weight (there’s no source)
Disjoint Sets
CSE 373 20 SP – CHAMPION & CHUN
17
Selecting an ADT
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
Disjoint Sets ADT (aka “Union-Find”)
DISJOINT SETS ADT
State
Family of Sets
Behavior
makeSet(value) – new set with value as only member (and representative)
find(value) – return representative of the set containing value
union(x, y) – combine sets containing x and y into one set with all elements, choose single new representative
kruskalMST(G graph)
DisjointSets<V> msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST);
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
Disjoint Sets in mathematics
CSE 373 SP 18 - KASEY CHAMPION
20
Kevin
Aileen
Keanu
Sherdil
Leona
These two sets are disjoint sets
Nishu
Santino
Brian
These two sets are not disjoint sets
Santino
Set #1
Set #2
Set #3
Set #4
Disjoint Sets in computer science
CSE 373 SP 18 - KASEY CHAMPION
21
Kevin
Aileen
Keanu
Sherdil
Leona
Set #1
Set #2
This overall grey blob thing is the actual
disjoint sets, and it’s keeping track of any number of mini-sets, which are all disjoint (the mini sets have no overlapping values).
Note: this might feel really different than ADTs we’ve run into before. The ADTs we’ve seen before (dictionaries, lists, sets, etc.) just store values directly.
But the Disjoint Set ADT is particularly interested in
letting you group your values into sets and
keep track of which particular set your values are in.
new ADT!
DisjointSets ADT methods
CSE 373 SP 18 - KASEY CHAMPION
22
findSet(value)
CSE 373 SP 18 - KASEY CHAMPION
23
Kevin
Aileen
Keanu
Sherdil
Velocity
Set #1
Set #2
Brian
Set #3
Set #4
Keanu
Kasey
3
3
2
2
true
union(valueA, valueB)
CSE 373 SP 18 - KASEY CHAMPION
24
Set #1
Set #3
Kevin
Vivian
Blarry
Sherdil
Velocity
Set #2
Brian
Set #4
Keanu
Kasey
Set #1
Kevin
Vivian
Blarry
Sherdil
Velocity
Set #2
Set #4
Kasey
Brian
Keanu
makeSet(value)
CSE 373 SP 18 - KASEY CHAMPION
25
Kevin
Vivian
Blarry
Sherdil
Velocity
Set #1
Set #2
Brian
Set #3
Set #4
Keanu
Kasey
Elena
Set #5
Anish
Set #6
Disjoint Sets ADT Summary
CSE 373 SP 18 - KASEY CHAMPION
26
Disjoint-Sets ADT
makeSet(value) – creates a new set within the disjoint set where the only member is the value. Picks id/representative for set
state
behavior
Set of Sets
findSet(value) – looks up the set containing the value, returns id/representative/ of that set
union(x, y) – looks up set containing x and set containing y, combines two sets into one. All of the values of one set are added to the other, and the now empty set goes away.
New ADT
CSE 373 SP 18 - KASEY CHAMPION
27
Set ADT
create(x) - creates a new set with a single member, x
Count of Elements
state
behavior
Set of elements
add(x) - adds x into set if it is unique, otherwise add is ignored
remove(x) – removes x from set
size() – returns current number of elements in set
Disjoint-Set ADT
makeSet(x) – creates a new set within the disjoint set where the only member is x. Picks representative for set
Count of Sets
state
behavior
Set of Sets
findSet(x) – looks up the set containing element x, returns representative of that set
union(x, y) – looks up set containing x and set containing y, combines two sets into one. Picks new representative for resulting set
D
B
C
A
D
C
F
B
A
G
H
Example
CSE 373 WI 18 – MICHAEL LEE
28
c
Rep: 2
e
Rep: 4
b
Rep: 1
d
Rep: 3
a
Rep: 0
Example
CSE 373 WI 18 – MICHAEL LEE
29
c
e
Rep: 4
b
Rep: 1
d
Rep: 3
a
Rep: 0
Example
CSE 373 WI 18 – MICHAEL LEE
30
c
e
Rep: 4
b
Rep: 1
d
a
Rep: 0
findSet(a) == findSet(c)
findSet(a) == findSet(d)
Implementation
CSE 373 SP 18 - KASEY CHAMPION
31
TreeDisjointSet<E>
makeSet(x)-create a new tree of size 1 and add to our forest
state
behavior
Collection<TreeSet> forest
findSet(x)-locates node with x and moves up tree to find root
union(x, y)-append tree with y as a child of tree with x
Disjoint-Set ADT
makeSet(x) – creates a new set within the disjoint set where the only member is x. Picks representative for set
Count of Sets
state
behavior
Set of Sets
findSet(x) – looks up the set containing element x, returns representative of that set
union(x, y) – looks up set containing x and set containing y, combines two sets into one. Picks new representative for resulting set
Dictionary<NodeValues, NodeLocations> nodeInventory
TreeSet<E>
TreeSet(x)
state
behavior
SetNode overallRoot
add(x)
remove(x, y)
getRep()-returns data of overallRoot
SetNode<E>
SetNode(x)
state
behavior
E data
addChild(x)
removeChild(x, y)
Collection<SetNode> children
Implement makeSet(x)
CSE 373 SP 18 - KASEY CHAMPION
32
TreeDisjointSet<E>
makeSet(x)-create a new tree of size 1 and add to our forest
state
behavior
Collection<TreeSet> forest
findSet(x)-locates node with x and moves up tree to find root
union(x, y)-append tree with y as a child of tree with x
Dictionary<NodeValues, NodeLocations> nodeInventory
0
1
2
3
4
5
forest
0 | 1 | 2 | 3 | 4 | 5 |
| | | | | |
QuickUnion Data Structure
Joyce, Sam, Ken, Alex
Aileen, Santino
Paul
Aileen (1)
Santino
Paul (3)
Joyce (2)
Ken
Sam
Alex
Abstract Idea of “Disjoint Sets”
Implementation using QuickUnion
=
QuickUnion: Find
find(Santino) -> 1
find(Ken) -> 2
find(Santino) != find(Ken)
find(Santino) == find(Aileen)
find(Ken):
jump to Ken node
travel upward until root
return ID
Aileen (1)
Santino
Paul (3)
Joyce (2)
Ken
Sam
Alex
Sam
Alex
Paul
…
QuickUnion: Union
union(Ken, Santino):
rootS = find(Santino)
set Ken to point to rootS
union(Ken, Santino):
rootK = find(Ken)
rootS = find(Santino)
set rootK to point to rootS
Aileen (1)
Santino
Paul (3)
Joyce (2)
Ken
Sam
Alex
Aileen (1)
Santino
Paul (3)
Joyce
Ken
Sam
Alex
RESULT:
Aileen (1)
Santino
Paul (3)
Joyce (2)
Ken
Sam
Alex
QuickUnion: Union
union(Ken, Santino):
rootS = find(Santino)
set Ken to point to rootS
union(Ken, Santino):
rootK = find(Ken)
rootS = find(Santino)
set rootK to point to rootS
Aileen (1)
Santino
Paul (3)
Joyce (2)
Ken
Sam
Alex
Aileen (1)
Santino
Paul (3)
Joyce
Ken
Sam
Alex
RESULT:
QuickUnion: Why bother with the second root?
union(Ken, Santino):
rootK = find(Ken)
rootS = find(Santino)
set rootK to point to rootS
Aileen (1)
Santino
Paul (3)
Joyce
Ken
Sam
Alex
union(Ken, Santino):
rootK = find(Ken)
set rootK to point to Santino
Aileen (1)
Santino
Paul (3)
Joyce
Ken
Sam
Alex
Why not just use:
QuickUnion: Checking in on those runtimes
| Maps to Sets | QuickFind | QuickUnion |
makeSet(value) | 𝚯(1) | 𝚯(1) | 𝚯(1) |
findSet(value) | 𝚯(n) | 𝚯(1) | 𝚯(n) |
union(x, y) | 𝚯(n) | 𝚯(n) | 𝚯(1) |
union(A, B):
rootA = find(A)
rootB = find(B)
set rootA to point to rootB
kruskalMST(G graph)
DisjointSets<V> msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST);
*
*
QuickUnion: Let’s Build a Worst Case
union(A, B):
rootA = find(A)
rootB = find(B)
set rootA to point to rootB
find(A):
jump to A node
travel upward until root
return ID
Even with the ”use-the-roots” implementation of union, try to come up with a series of calls to union that would create a worst-case runtime for find on these Disjoint Sets:
A
B
C
D
QuickUnion: Let’s Build a Worst Case
union(A, B):
rootA = find(A)
rootB = find(B)
set rootA to point to rootB
find(A):
jump to A node
travel upward until root
return ID
Even with the ”use-the-roots” implementation of union, try to come up with a series of calls to union that would create a worst-case runtime for find on these Disjoint Sets:
A
B
C
D
union(A, B)
union(B, C)
union(C, D)
find(A)
B
A
C
D
Analyzing the QuickUnion Worst Case
B
A
C
D
union(C, D)
B
A
C
D
What currently happens
What would help avoid degenerate tree
WeightedQuickUnion
union(A, B):
rootA = find(A)
rootB = find(B)
put lighter root under heavier root
union(A, B)
union(B, C)
union(C, D)
find(A)
A
B
C
D
Now what happens?
B
A
C
D
Perfect! Best runtime we can get.
WeightedQuickUnion: Performance
union(A, B):
rootA = find(A)
rootB = find(B)
put lighter root under heavier root
WeightedQuickUnion: Performance
0
N | H |
1 | 0 |
WeightedQuickUnion: Performance
0
1
N | H |
1 | 0 |
2 | 1 |
WeightedQuickUnion: Performance
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | ? |
WeightedQuickUnion: Performance
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
WeightedQuickUnion: Performance
0
1
2
3
4
5
6
7
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | ? |
WeightedQuickUnion: Performance
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
4
5
6
7
WeightedQuickUnion: Performance
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
16 | 4 |
4
5
6
7
8
9
10
11
12
13
14
15
Why Weights Instead of Heights?
1
2
0
4
6
5
3
8
9
7
+
1
2
0
4
6
5
3
8
9
7
WeightedQuickUnion Runtime
| Maps to Sets | QuickFind | QuickUnion | WeightedQuickUnion |
makeSet(value) | 𝚯(1) | 𝚯(1) | 𝚯(1) | 𝚯(1) |
find(value) | 𝚯(n) | 𝚯(1) | 𝚯(n) | 𝚯(log n) |
union(x, y) assuming root args | 𝚯(n) | 𝚯(n) | 𝚯(1) | 𝚯(1) |
union(x, y) | 𝚯(n) | 𝚯(n) | 𝚯(n) | 𝚯(log n) |
Aileen
Joyce
Santino
Sam
Ken
1
2
2
2
1
B
A
C
D
1
2
3
4
5
6
Modifying Data Structures for Future Gains
Path Compression: Idea
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Path Compression: Idea
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Path Compression: Details and Runtime
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Path Compression: Runtime
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Path Compression: Runtime
N | log* N |
1 | 0 |
2 | 1 |
4 | 2 |
16 | 3 |
65536 | 4 |
265536 | 5 |
216
Number of atoms in the known universe is 2256ish
Appendix