Lecture 19: Disjoint Sets
CSE 373: Data Structures and Algorithms
1
CSE 373 23SU
CSE 373 23SU
1
Warmup
KruskalMST(Graph G)
initialize each vertex to be an independent component
sort the edges by weight
foreach(edge (u, v) in sorted order){
if(u and v are in different components){
add (u,v) to the MST
update u and v to be in the same component
}
}
Run Kruskal’s algorithm on the following graph to find the MST (minimum spanning tree) of the graph below.
Below is the provided pseudocode for Kruksal’s algorithm to choose all the edges.
CSE 373 23SU
2
Announcements
CSE 373 23SU
3
Disjoint Sets
CSE 373 23SU
4
Selecting an ADT
Kruskal’s needs to find what MST a vertex belongs to, and union those MSTs together
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
kruskalMST(G graph)
Set(?) msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
CSE 373 23SU
5
Disjoint Sets ADT (aka “Union-Find”)
Kruskal’s will use a Disjoint Sets ADT under the hood
DISJOINT SETS ADT
State
Family of Sets
Behavior
makeSet(value) – new set with value as only member (and representative)
find(value) – return representative of the set containing value
union(x, y) – combine sets containing x and y into one set with all elements, choose single new representative
kruskalMST(G graph)
DisjointSets<V> msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST);
A
B
D
E
C
4
2
11
1
3
5
8
9
10
7
6
F
CSE 373 23SU
6
Disjoint Sets in mathematics
“In mathematics, two sets are said to be disjoint sets if they have no element in common.” - Wikipedia
disjoint = not overlapping
Kevin
Aileen
Keanu
Sherdil
Leona
These two sets are disjoint sets
Nishu
Santino
Brian
These two sets are not disjoint sets
Santino
Set #1
Set #2
Set #3
Set #4
CSE 373 23SU
7
Disjoint Sets in computer science
In computer science, disjointsets can refer to this ADT/data structure that keeps track of the multiple “mini” sets that are disjoint
This overall grey blob thing is the actual
disjoint sets, and it’s keeping track of any number of mini-sets, which are all disjoint (the mini sets have no overlapping values).
Note: this might feel really different than ADTs we’ve run into before. The ADTs we’ve seen before (dictionaries, lists, sets, etc.) just store values directly.
But the Disjoint Set ADT is particularly interested in
letting you group your values into sets and
keep track of which particular set your values are in.
new ADT!
Kevin
Aileen
Keanu
Sherdil
Leona
Set #1
Set #2
CSE 373 23SU
8
New ADT
Set ADT
create(x) - creates a new set with a single member, x
Count of Elements
state
behavior
Set of elements
add(x) - adds x into set if it is unique, otherwise add is ignored
remove(x) – removes x from set
size() – returns current number of elements in set
Disjoint-Set ADT
makeSet(x) – creates a new set within the disjoint set where the only member is x. Picks representative for set
Count of Sets
state
behavior
Set of Sets
findSet(x) – looks up the set containing element x, returns representative of that set
union(x, y) – looks up set containing x and set containing y, combines two sets into one. Picks new representative for resulting set
D
B
C
A
D
C
F
B
A
G
H
CSE 373 23SU
9
findSet(value)
findSet(value) returns some ID for which particular set the value is in. For Disjoint Sets, we often call this the representative (as it’s a value that represents the whole set).
Examples:
findSet(Brian)
findSet(Sherdil)
findSet(Santino)
findSet(Kevin) == findSet(Aileen)
4
2
3
true
Kevin
Aileen
Keanu
Set #1
Sherdil
Leona
Set #2
Nishu
Santino
Set #3
Brian
Rahul
Set #4
CSE 373 23SU
10
union(valueA, valueB)
union(valueA, valueB) merges the set that A is in with the set that B is in. (basically add the two sets together into one)
Example: union(Kevin, Nishu)
Kevin
Aileen
Keanu
Set #1
Sherdil
Leona
Set #2
Nishu
Santino
Set #3
Brian
Rahul
Set #4
Kevin
Aileen
Keanu
Set #1
Sherdil
Leona
Set #2
Nishu
Santino
Brian
Rahul
Set #4
CSE 373 23SU
11
makeSet(value)
makeSet(value) makes a new mini set that just has the value parameter in it.
Examples:
makeSet(Elena)
makeSet(Anish)
Kevin
Aileen
Keanu
Set #1
Sherdil
Leona
Set #2
Nishu
Santino
Set #3
Brian
Rahul
Set #4
Set #5
Set #6
Elena
Anisha
CSE 373 23SU
12
Example
b
Rep: 1
a
Rep: 0
e
Rep: 4
c
Rep: 2
d
Rep: 3
new()
makeSet(a)
makeSet(b)
makeSet(c)
makeSet(d)
makeSet(e)
findSet(a)
findSet(d)
union(a, c)
CSE 373 23SU
13
Example
b
Rep: 1
e
Rep: 4
d
Rep: 3
c
a
Rep: 0
new()
makeSet(a)
makeSet(b)
makeSet(c)
makeSet(d)
makeSet(e)
findSet(a)
findSet(d)
union(a, c)
union(b, d)
CSE 373 23SU
14
Example
findSet(a) == findSet(c)
findSet(a) == findSet(d)
e
Rep: 4
c
a
Rep: 0
b
Rep: 1
d
new()
makeSet(a)
makeSet(b)
makeSet(c)
makeSet(d)
makeSet(e)
findSet(a)
findSet(d)
union(a, c)
union(b, d)
true
false
CSE 373 23SU
15
Questions?
CSE 373 23SU
16
Disjoint Set Implementation
Weighted Union
Path Compression
Array Implementation
CSE 373 23SU
17
Implementation
TreeDisjointSet<E>
makeSet(x)-create a new tree of size 1 and add to our forest
state
behavior
Set<TreeSet> forest
findSet(x)-locates node with x and moves up tree to find root
union(x, y)-append tree with y as a child of tree with x
Disjoint-Set ADT
makeSet(x) – creates a new set within the disjoint set where the only member is x. Picks representative for set
Count of Sets
state
behavior
Set of Sets
findSet(x) – looks up the set containing element x, returns representative of that set
union(x, y) – looks up set containing x and set containing y, combines two sets into one. Picks new representative for resulting set
Map<NodeValues, NodeLocations> nodeInventory
TreeSet<E>
TreeSet(x)
state
behavior
SetNode overallRoot
add(x)
remove(x, y)
getRep()- returns data of overallRoot
SetNode<E>
SetNode(x)
state
behavior
E data
updateParent(x)
SetNode<E> parent
CSE 373 23SU
18
Implementation
Nishu
Santino
Brian
Rahul
TreeDisjointSet<E>
TreeSet<E>
SetNode<E>
1
4
Kevin
Aileen
Keanu
Sherdil
Leona
9
Disjoint Sets are built as a collection of three objects
The TreeDisjointSet<E> is the top level object with two fields:
Each TreeSet<E> is a collection of SetNodes<E>
Each SetNode<E> can have an unlimited number of children, but only one parent. Nodes store parents instead of children.
Set<TreeSet> forest =
Map<E, SetNode<E>> nodeInventory =
CSE 373 23SU
19
Implement makeSet(x)
Worst case runtime?
O(1)
TreeDisjointSet<E>
makeSet(x)-create a new tree of size 1 and add to our forest
state
behavior
Collection<TreeSet> forest
findSet(x)-locates node with x and moves up tree to find root
union(x, y)-append tree with y as a child of tree with x
Dictionary<NodeValues, NodeLocations> nodeInventory
0
1
2
3
4
5
forest
0 | 1 | 2 | 3 | 4 | 5 |
| | | | | |
CSE 373 23SU
20
Implement find(X)
Key Idea: Jump to the node given. Travel upward to parent until parent field is null, nodes with null parents are roots and their data will act as the representative for the set
How do we jump to a node quickly?
Runtime
find(Santino) -> Aileen
find(Ken) -> Joyce
find(Santino) != find(Ken)
find(Santino) == find(Aileen)
find(Ken):
jump to Ken node
travel upward until root Joyce
return root “Joyce”
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
Sam
Alex
Paul
…
Map<E, SetNode<E>> nodeInventory =
CSE 373 23SU
21
Implement union(x, y)
Key idea: easy to simply rearrange pointers to union entire trees together!
union(Ken, Santino):
rootK = find(Ken) //Joyce
rootS = find(Santino) //Aileen
set rootK to point to rootS
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
RESULT:
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
CSE 373 23SU
22
Union: Why bother with the second root?
Key idea: keeping the height of each tree short will help minimize runtime for future find(x)
union(Ken, Santino):
rootK = find(Ken)
rootS = find(Santino)
set rootK to point to rootS
union(Ken, Santino):
rootK = find(Ken)
set rootK to point to Santino
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
Why not just use:
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
Height = 3
Height = 4
CSE 373 23SU
23
Union runtime
union(A, B):
rootA = find(A)
rootB = find(B)
set rootA to point to rootB
find(A):
jump to A node
travel upward until root
return ID
A series of calls to union that would create a worst-case runtime for find on these Disjoint Sets:
A
B
C
D
union(A, B)
B
A
C
D
union(B, C)
union(C, D)
find(A)
n runtime :(
CSE 373 23SU
24
Analyzing the union worst case
B
A
C
D
union(C, D)
B
A
C
D
What currently happens
What would help avoid degenerate tree
CSE 373 23SU
25
Practice
Given the following disjoint-set what would be the result of the following calls on union if we always add the smaller tree (fewer nodes) into the larger tree (more nodes). Draw the forest at each stage with corresponding ranks for each tree.
6
4
5
0
3
1
2
8
10
12
9
11
7
13
CSE 373 23SU
26
Practice
Given the following disjoint-set what would be the result of the following calls on union if we add the “union-by-weight” optimization. Draw the forest at each stage with corresponding ranks for each tree.
6
4
5
0
3
1
2
8
10
12
9
11
7
13
CSE 373 23SU
27
Practice
Given the following disjoint-set what would be the result of the following calls on union if we add the “union-by-weight” optimization. Draw the forest at each stage with corresponding ranks for each tree.
6
4
5
0
3
1
2
8
10
12
9
11
7
13
CSE 373 23SU
28
Practice
Given the following disjoint-set what would be the result of the following calls on union if we add the “union-by-weight” optimization. Draw the forest at each stage with corresponding ranks for each tree.
6
4
5
0
3
1
2
8
10
12
9
11
7
13
CSE 373 23SU
29
Practice
Given the following disjoint-set what would be the result of the following calls on union if we add the “union-by-weight” optimization. Draw the forest at each stage with corresponding ranks for each tree.
8
10
12
9
11
6
4
5
0
3
1
2
7
13
CSE 373 23SU
30
Disjoint Set Implementation
Weighted Union
Path Compression
Array Implementation
CSE 373 23SU
31
WeightedUnion
Goal: Always pick the smaller tree to go under the larger tree
Implementation: Store the number of nodes (or “weight”) of each tree in the root
union(A, B):
rootA = find(A)
rootB = find(B)
put lighter root under heavier root
A
union(A, B)
union(B, C)
union(C, D)
find(A)
weight: 1
B
weight: 1
C
weight: 1
D
weight: 1
2
3
4
O(1) runtime :)
CSE 373 23SU
32
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
N | H |
1 | 0 |
CSE 373 23SU
33
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
1
N | H |
1 | 0 |
2 | 1 |
CSE 373 23SU
34
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | ? |
CSE 373 23SU
35
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
CSE 373 23SU
36
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
1
2
3
4
5
6
7
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | ? |
CSE 373 23SU
37
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
4
5
6
7
CSE 373 23SU
38
WeightedUnion: Performance
Consider the worst case where the tree height grows as fast as possible
Worst case tree height is O(log N)
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
16 | 4 |
4
5
6
7
8
9
10
11
12
13
14
15
CSE 373 23SU
39
Runtime so far…
This is pretty good! But there’s one final optimization we can make:
path compression����
| Worst Case Runtime |
makeSet(value) | O(1) |
find(value) | O(log n) |
union(x, y) | O(log n) |
kruskalMST(G graph)
DisjointSets<V> msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST);
O(ElogV)
CSE 373 23SU
40
Disjoint Set Implementation
Weighted Union
Path Compression
Array Implementation
CSE 373 23SU
41
Modifying Data Structures for Future Gains
CSE 373 23SU
42
Path Compression: Idea
This is the worst-case topology if we use WeightedUnion�������
Key Idea: When we do find(15), move all visited nodes under the root
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CSE 373 23SU
43
Path Compression: Idea
This is the worst-case topology if we use WeightedUnion�������
Key Idea: When we do find(15), move all visited nodes under the root
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Perform Path Compression on every find(), so future calls to find() are faster!
CSE 373 23SU
44
Path Compression: Details and Runtime
Run path compression on every find()!
Understanding the performance of M>1 operations requires amortized analysis
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CSE 373 23SU
45
Path Compression: Runtime
M find()s on WeightedUnion requires takes O(M log N)����
… but M find()s using the WeightedUnion and PathCompression optimizations takes O(M log*N)!
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
CSE 373 23SU
46
Path Compression: Runtime
Path compression results in find()s and union()s that are very very close to (amortized) constant time
N | log* N |
1 | 0 |
2 | 1 |
4 | 2 |
16 | 3 |
65536 | 4 |
265536 | 5 |
216
Number of atoms in the known universe is 2256ish
CSE 373 23SU
47
Kruskal’s Runtime
Find and union are log|V| in worst case, but amortized constant “in practice”
Either way, dominated by time to sort the edges ☹
kruskalMST(G graph)
DisjointSets<V> msts; Set finalMST;
initialize msts with each vertex as single-element MST
sort all edges by weight (smallest to largest)
for each edge (u,v) in ascending order:
uMST = msts.find(u)
vMST = msts.find(v)
if (uMST != vMST):
finalMST.add(edge (u, v))
msts.union(uMST, vMST)
CSE 373 23SU
48
Disjoint Set Implementation
Weighted Union
Path Compression
Array Implementation
CSE 373 23SU
49
Using Arrays for Up-Trees
Since every node can have at most one parent, what if we use an array to store the parent relationships?
Proposal: each node corresponds to an index, where we store the index of the parent (or –1 for roots). Use the root index as the representative ID!
Just like with heaps, tree picture still conceptually correct, but exists in our minds!
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
0 | 1 | 2 | 3 | 4 | 5 | 6 |
-1 | 0 | -1 | 6 | -1 | 2 | 0 |
Joyce | Sam | Aileen | Alex | Paul | Santino | Ken |
CSE 373 23SU
50
Using Arrays: Find
Initial jump to element still done with extra Map
But traversing up the tree can be done purely within the array!
0 | 1 | 2 | 3 | 4 | 5 | 6 |
-1 | 0 | -1 | 6 | -1 | 2 | 0 |
Joyce | Sam | Aileen | Alex | Paul | Santino | Ken |
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
Alex
Aileen
Sam
…
find(A):
index = jump to A node’s index
while array[index] > 0:
index = array[index]
path compression
return index
1
2
find(Alex)
1
2
= 0
Can still do path compression by setting all indices along the way to the root index!
0
3
3
CSE 373 23SU
51
Using Arrays: Union
For WeightedUnion, we need to store the number of nodes in each tree (the weight)
Instead of just storing -1 to indicate a root, we can store -1 * weight!
0 | 1 | 2 | 3 | 4 | 5 | 6 |
-4 | 0 | -2 | 6 | -1 | 2 | 0 |
Joyce | Sam | Aileen | Alex | Paul | Santino | Ken |
Aileen
Santino
Joyce
Ken
Sam
Alex
union(A, B):
rootA = find(A)
rootB = find(B)
use -1 * array[rootA] and -1 *
array[rootB] to determine weights
put lighter root under heavier root
weight 4
weight 2
union(Ken, Santino)
Paul
weight 1
Aileen
Santino
Paul
Joyce
Ken
Sam
Alex
CSE 373 23SU
52
Using Arrays: Union
For WeightedUnion, we need to store the number of nodes in each tree (the weight)
Instead of just storing -1 to indicate a root, we can store -1 * weight!
0 | 1 | 2 | 3 | 4 | 5 | 6 |
-4 | 0 | -2 | 6 | -1 | 2 | 0 |
Joyce | Sam | Aileen | Alex | Paul | Santino | Ken |
union(A, B):
rootA = find(A)
rootB = find(B)
use -1 * array[rootA] and -1 *
array[rootB] to determine weights
put lighter root under heavier root
-6
0
Aileen (2)
Santino
Joyce
Ken
Sam
Alex
weight 6
Paul
weight 1
Aileen
union(Ken, Santino)
CSE 373 23SU
53
Array Implementation Practice
1
6
3
weight = 1
4
2
10
5
7
0
9
8
11
15
13
weight = 8
14
12
17
16
18
weight = 10
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
| | | | | | | | | | | | | | | | | | |
Fill in the array representing this DisJoint set. Remember to Store (weight * -1) - 1 as the “parent” of the root nodes
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 |
-1 | -10 | 1 | 2 | 2 | 2 | 1 | 6 | 7 | 7 | 6 | -8 | 11 | 12 | 12 | 11 | 15 | 15 | 17 |
Each “node” now only takes 4 bytes of memory instead of 32
CSE 373 23SU
54
Array Implementation Practice
55
3
0
weight = 1
4
11
1
5
2
13
12
weight = 8
10
9
14
15
8
weight = 6
weight = 2
6
7
16
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
3 | 0 | 0 | -6 | 3 | -1 | -2 | 6 | 12 | 13 | 13 | 0 | 13 | -8 | 12 | 12 | 12 |
Update the Array with the correct values after a call of union(14, 11) using WeightedUnion and PathCompression
CSE 373 23SU
55
Array Implementation Practice
56
3
0
weight = 1
4
11
1
5
2
13
12
weight = 14
10
9
14
15
8
weight = 2
6
7
16
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
3 | 0 | 0 | 13 | 3 | -1 | -2 | 6 | 12 | 13 | 13 | 13 | 13 | -14 | 13 | 12 | 12 |
Update the Array with the correct values after a call of union(14, 11) using WeightedUnion and PathCompression
CSE 373 23SU
56
That’s all!
CSE 373 23SU
57