Disjoint Sets
1
Lecture 14
CS61B, Spring 2023 @ UC Berkeley
Josh Hug and Justin Yokota
Disjoint Sets API
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
Meta-goals of the Coming Lectures: Data Structure Refinement
Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how sets, maps, and priority queues are implemented.
Today: Deriving the “Disjoint Sets” data structure for solving the “Dynamic Connectivity” problem. We will see:
The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
Example:
Hyperloop
The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
Useful for many purposes, e.g.:
Hyperloop
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
6
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
ds.connect(3, 6)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
ds.connect(3, 6)
ds.isConnected(3, 0): true
The Disjoint Sets Interface
Goal: Design an efficient DisjointSets implementation.
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
getBack()
get(int i)
deleteBack()
connect(int p, int q)
isConnected(int p, int q)
Tracking Connected Components
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
The Naive Approach
Naive approach:
0
4
1
2
3
5
6
A Better Approach: Connected Components
Rather than manually writing out every single connecting line, only record the sets that each item belongs to.�
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
isConnected(3, 0): true
{0}, {1}, {2}, {3}, {4}, {5}, {6}
{0, 1}, {2}, {3}, {4}, {5}, {6}
{0, 1, 2}, {3}, {4}, {5}, {6}
{0, 1, 2, 4}, {3}, {5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4, 6}, {3, 5}
{0, 1, 2, 3, 4, 5, 6}
A Better Approach: Connected Components
For each item, its connected component is the set of all items that are connected to that item.
{ 0, 1, 2, 4 }, {3, 5}, {6}
0
4
1
2
3
5
6
Up next: We’ll consider how to do track set membership in Java.
Challenge: Pick Data Structures to Support Tracking of Sets
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Assume elements are numbered from 0 to N-1.
Before connect(2, 3) operation: After connect(2, 3) operation:
List of Sets
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Assume elements are numbered from 0 to N-1.
Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Idea #1: List of sets of integers, e.g. [{0, 1, 2, 4}, {3, 5}, {6}]
Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)
If nothing is connected:
0
4
1
2
3
5
6
Idea #1: List of sets of integers, e.g. [{0}, {1}, {2}, {3}, {4}, {5}, {6}]
QuickFind
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
Performance Summary
ListOfSetsDS is complicated and slow.
Implementation | constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
Worst case is Θ(N), but other cases may be better. We’ll say O(N) since O means “less than or equal”.
Constructor’s runtime has order of growth N no matter what, so Θ(N).
Next Approach: Array of Integers
4 | 4 | 4 | 5 | 4 | 5 | 6 |
0 1 2 3 4 5 6
5 | 5 | 5 | 5 | 5 | 5 | 6 |
0 1 2 3 4 5 6
Idea #2: list of integers where ith entry gives set number (a.k.a. “id”) of item i.
int[] id
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
int[] id
QuickFindDS
public class QuickFindDS implements DisjointSets {
private int[] id;
public boolean isConnected(int p, int q) {
return id[p] == id[q];
}
public void connect(int p, int q) {
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++) {
if (id[i] == pid) {
id[i] = qid;
}
}...
Very fast: Two array accesses: Θ(1)
Relatively slow: N+2 to 2N+2 array accesses: Θ(N)
public QuickFindDS(int N) {
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = -1;
}
}
Performance Summary
QuickFindDS is too slow for practical use: Connecting two items takes N time.
Implementation | constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
Quick Union
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
Improving the Connect Operation
Approach zero: Represent everything as boxes and lines. Overly complicated.
ListOfSets: Represent everything as connected components. Represented connected components as list of sets of integers.
QuickFind: Represent everything as connected components. Represented connected components as a list of integers, where value = id.
0
4
1
2
3
5
6
??? in Java instance variables.
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4 }, {3, 5}, {6}
[{0, 1, 2, 4}, {3, 5}, {6}]
[2, 2, 2, 3, 2, 3, 6]
List<Set<Integer>>
int[]
Improving the Connect Operation
QuickFind: Represent everything as connected components. Represented connected components as a list of integers where value = id.
Next approach (QuickUnion): We will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.
{ 0, 1, 2, 4 }, {3, 5}, {6}
[2, 2, 2, 3, 2, 3, 6]
int[]
Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
id
id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Improving the Connect Operation (Your Answer)
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
id
id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Improving the Connect Operation (Your Answer)
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
| | | | | | |
0 1 2 3 4 5 6
| | | | | | |
0 1 2 3 4 5 6
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
id
id
ZERO
THREE
SIX
THREE
THREE
SIX
Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
parent�
5
3
Note: The optional textbook has an item’s parent as itself instead of -1 for root items.
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation (Your Answer)
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | 0 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Set Union Using Rooted-Tree Representation
connect(5, 2)
What are the potential performance issues with this approach?
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “root” of this set.
parent
-1 | 0 | 1 | 0 | 0 | 3 | -1 |
0 1 2 3 4 5 6
The Worst Case
If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:
For N items, what’s the worst case runtime…
3
2
1
0
4
The Worst Case
If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:
For N items, what’s the worst case runtime…
3
2
1
0
4
QuickUnionDS
public class QuickUnionDS implements DisjointSets {
private int[] parent;
public QuickUnionDS(int N) {
parent = new int[N];
for (int i = 0; i < N; i++)
{ parent[i] = -1; }
}
private int find(int p) {
int r = p;
while (parent[r] >= 0)
{ r = parent[r]; }
return r;
}
For N items, this means a worst case runtime of Θ(N).
public boolean isConnected(int p, int q) {
return find(p) == find(q);
}
{
int i = find(p);
int j = find(q);
parent[i] = j;
}
Here the find operation is the same as the “root(x)” idea we had in earlier slides.
Performance Summary
QuickFindDS defect: QuickFindDS is too slow: Connecting takes Θ(N) time.
QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
Using O because runtime can be between constant and linear.
Weighted Quick Union
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
A Choice of Two Roots: yellkey.com/offer
Suppose we are trying to connect(2, 5). We have two choices:
Which is the better choice?
2
1
4
0
5
3
2
1
4
0
5
3
2
1
4
0
5
3
+
A Choice of Two Roots
Suppose we are trying to connect(2, 5). We have two choices:
Which is the better choice?
2
1
4
0
5
3
2
1
4
0
5
3
2
1
4
0
5
3
+
Height: 2
Height: 3
Possible Approach
One possible approach is to keep track of the height of every tree.
Unfortunately, tracking tree height is kind of annoying.
Interestingly, tracking the tree’s “size”, a.k.a. “weight” works just as well asymptotically.
Weighted QuickUnion: http://yellkey.com/class
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
1
2
0
4
6
5
3
8
9
7
-1 | 0 | 0 | 0 | 0 | 0 | -1 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
Improvement #1: Weighted QuickUnion
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
-1 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
1
2
0
4
6
5
3
8
9
7
Note: The rule I picked is based on weight, not height. We’ll talk about why soon.
Implementing WeightedQuickUnion
Minimal changes needed:
-2 | -1 | -1 | -1 | -1 | 0 | -4 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
2 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 2 | 1 |
0 1 2 3 4 5 6 7 8 9
size
An older version of this slide had incorrect values in the two arrays.
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
N | H |
1 | 0 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
N | H |
1 | 0 |
2 | 1 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
16 | 4 |
4
5
6
7
8
9
10
11
12
13
14
15
Performance Summary
QuickUnion’s runtimes are O(H), and WeightedQuickUnionDS height is given by H = O(log N). Therefore connect and isConnected are both O(log N).
By tweaking QuickUnionDS we’ve achieved logarithmic time performance.
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
Why Weights Instead of Heights?
We used the number of items in a tree to decide upon the root.
1
2
0
4
6
5
3
8
9
7
+
1
2
0
4
6
5
3
8
9
7
WQU with Path Compression
Lecture 14, CS61B, Spring 2023
Introduction to Disjoint Sets
Implementations
What We’ve Achieved
Performing M operations on a DisjointSet object with N elements:
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
O(N) cost for constructor.
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Intuition
By compressing the tree with each union and isConnected call, we keep the tree nice and short.
Intuition
By compressing the tree with each union and isConnected call, we keep the tree nice and short.
Note: The tree we started with in the exercise you completed is impossible to generate if we’re using path compression!
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
CS 170 Analysis of Disjoint Sets
In CS170, you’ll show that each isConnected or connect operation takes on average lg* N time because the tree is kept so compressed.
15
11
5
12
1
2
9
10
3
0
4
N | lg* N |
1 | 0 |
2 | 1 |
4 | 2 |
16 | 3 |
65536 | 4 |
265536 | 5 |
216
Even More Careful Analysis
You can provide an even tighter bound, showing that each operation takes on average α(N) time.
N | α(N) |
1 | 0 |
... | 1 |
... | 2 |
... | 3 |
... | 4 |
| 5 |
15
11
5
12
1
2
9
10
3
0
4
A Summary of Our Iterative Design Process
And we’re done! The end result of our iterative design process is the standard way disjoint sets are implemented today - quick union and path compression.
The ideas that made our implementation efficient:
Performance Summary
Implementation | Runtime |
ListOfSetsDS | O(NM) |
QuickFindDS | Θ(NM) |
QuickUnionDS | O(NM) |
WeightedQuickUnionDS | O(M log N) |
WeightedQuickUnionDSWithPathCompression | O(M α(N)) |
Runtimes are given assuming:
Citations
Nazca Lines: http://redicecreations.com/ul_img/24592nazca_bird.jpg
The proof of the inverse ackermann runtime for disjoint sets is given here:
http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendigungen/ws07/KAuD/effi.pdf
as originally proved by Tarjan here at UC Berkeley in 1975.