CS61B: 2019
Lecture 14: Disjoint Sets
Meta-goals of the Coming Lectures: Data Structure Refinement
Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how sets, maps, and priority queues are implemented.
Today: Deriving the “Disjoint Sets” data structure for solving the “Dynamic Connectivity” problem. We will see:
The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
Example:
Hyperloop
The Disjoint Sets Data Structure
The Disjoint Sets data structure has two operations:
Useful for many purposes, e.g.:
Hyperloop
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
6
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
ds.connect(3, 6)
Disjoint Sets on Integers
To keep things simple, we’re going to:
0
4
3
1
2
5
6
ds = DisjointSets(7)
ds.connect(0, 1)
ds.connect(1, 2)
ds.connect(0, 4)
ds.connect(3, 5)
ds.isConnected(2, 4): true
ds.isConnected(3, 0): false
ds.connect(4, 2)
ds.connect(4, 6)
ds.connect(3, 6)
ds.isConnected(3, 0): true
The Disjoint Sets Interface
Goal: Design an efficient DisjointSets implementation.
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
getBack()
get(int i)
deleteBack()
connect(int p, int q)
isConnected(int p, int q)
The Naive Approach
Naive approach:
0
4
1
2
3
5
6
A Better Approach: Connected Components
Rather than manually writing out every single connecting line, only record the sets that each item belongs to.�
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
isConnected(3, 0): true
{0}, {1}, {2}, {3}, {4}, {5}, {6}
{0, 1}, {2}, {3}, {4}, {5}, {6}
{0, 1, 2}, {3}, {4}, {5}, {6}
{0, 1, 2, 4}, {3}, {5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4}, {3, 5}, {6}
{0, 1, 2, 4, 6}, {3, 5}
{0, 1, 2, 3, 4, 5, 6}
A Better Approach: Connected Components
For each item, its connected component is the set of all items that are connected to that item.
{ 0, 1, 2, 4 }, {3, 5}, {6}
0
4
1
2
3
5
6
Up next: We’ll consider how to do track set membership in Java.
Quick Find
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Assume elements are numbered from 0 to N-1.
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Assume elements are numbered from 0 to N-1.
Map<Integer, Integer> -- first number represents set and second represents item
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Assume elements are numbered from 0 to N-1.
Map<Integer, Integer> -- first number represents the item, and the second is the set number.
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Idea #1: List of sets of integers, e.g. [{0, 1, 2, 4}, {3, 5}, {6}]
Challenge: Pick Data Structures to Support Tracking of Sets
If nothing is connected:
0
4
1
2
3
5
6
Idea #1: List of sets of integers, e.g. [{0}, {1}, {2}, {3}, {4}, {5}, {6}]
Performance Summary
ListOfSetsDS is complicated and slow.
Implementation | constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
Worst case is Θ(N), but other cases may be better. We’ll say O(N) since O means “less than or equal”.
Constructor’s runtime has order of growth N no matter what, so Θ(N).
My Approach: Just Use a Array of Integers
4 | 4 | 4 | 5 | 4 | 5 | 6 |
0 1 2 3 4 5 6
5 | 5 | 5 | 5 | 5 | 5 | 6 |
0 1 2 3 4 5 6
Idea #2: list of integers where ith entry gives set number (a.k.a. “id”) of item i.
int[] id
Before connect(2, 3) operation: After connect(2, 3) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
int[] id
QuickFindDS
public class QuickFindDS implements DisjointSets {
private int[] id;
public boolean isConnected(int p, int q) {
return id[p] == id[q];
}
public void connect(int p, int q) {
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++) {
if (id[i] == pid) {
id[i] = qid;
}
}...
Very fast: Two array accesses: Θ(1)
Relatively slow: N+2 to 2N+2 array accesses: Θ(N)
public QuickFindDS(int N) {
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}
}
Performance Summary
QuickFindDS is too slow for practical use: Connecting two items takes N time.
Implementation | constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
Quick Union
Improving the Connect Operation
Approach zero: Represent everything as boxes and lines. Overly complicated.
ListOfSets: Represent everything as connected components. Represented connected components as list of sets of integers.
QuickFind: Represent everything as connected components. Represented connected components as a list of integers, where value = id.
0
4
1
2
3
5
6
??? in Java instance variables.
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4 }, {3, 5}, {6}
[{0, 1, 2, 4}, {3, 5}, {6}]
[2, 2, 2, 3, 2, 3, 6]
List<Set<Integer>>
int[]
Improving the Connect Operation
QuickFind: Represent everything as connected components. Represented connected components as a list of integers where value = id.
Next approach (QuickUnion): We will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.
{ 0, 1, 2, 4 }, {3, 5}, {6}
[2, 2, 2, 3, 2, 3, 6]
int[]
Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
id
id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Improving the Connect Operation (Your Answer)
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
| | | | | | |
0 1 2 3 4 5 6
| | | | | | |
0 1 2 3 4 5 6
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
id
id
ZERO
THREE
SIX
THREE
THREE
SIX
Improving the Connect Operation
Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
parent�
5
3
Note: The optional textbook has an item’s parent as itself instead of -1 for root items.
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation (Your Answer)
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | 2 | 0 | 2 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | -1 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “root” of this set.
5
3
parent
-1 | 0 | 1 | 0 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Set Union Using Rooted-Tree Representation
connect(5, 2)
What are the potential performance issues with this approach?
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “root” of this set.
parent
-1 | 0 | 1 | 0 | 0 | 3 | -1 |
0 1 2 3 4 5 6
Set Union Using Rooted-Tree Representation
connect(5, 2)
What are the potential performance issues with this approach?
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “root” of this set.
parent
-1 | 0 | 1 | 0 | 0 | 3 | -1 |
0 1 2 3 4 5 6
The Worst Case
If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:
For N items, what’s the worst case runtime…
3
2
1
0
4
The Worst Case
If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:
For N items, what’s the worst case runtime…
3
2
1
0
4
QuickUnionDS
public class QuickUnionDS implements DisjointSets {
private int[] parent;
public QuickUnionDS(int N) {
parent = new int[N];
for (int i = 0; i < N; i++)
{ parent[i] = -1; }
}
private int find(int p) {
int r = p;
while (parent[r] >= 0)
{ r = parent[r]; }
return r;
}
For N items, this means a worst case runtime of Θ(N).
public boolean isConnected(int p, int q) {
return find(p) == find(q);
}
{
int i = find(p);
int j = find(q);
parent[i] = j;
}
Here the find operation is the same as the “root(x)” idea we had in earlier slides.
Performance Summary
QuickFindDS defect: QuickFindDS is too slow: Connecting takes Θ(N) time.
QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
Using O because runtime can be between constant and linear.
Weighted Quick Union
A Choice of Two Roots: http://yellkey.com/reveal
Suppose we are trying to connect(2, 5). We have two choices:
Which is the better choice?
2
1
4
0
5
3
2
1
4
0
5
3
2
1
4
0
5
3
+
A Choice of Two Roots
Suppose we are trying to connect(2, 5). We have two choices:
Which is the better choice?
2
1
4
0
5
3
2
1
4
0
5
3
2
1
4
0
5
3
+
Height: 2
Height: 3
A Choice of Two Roots
Suppose we are trying to connect(2, 5). We have two choices:
Which is the better choice?
2
1
5
4
0
3
2
1
4
0
5
3
2
1
4
0
5
3
+
Height: 2
Height: 3
Weighted QuickUnion: http://yellkey.com/society
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
1
2
0
4
6
5
3
8
9
7
-1 | 0 | 0 | 0 | 0 | 0 | -1 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
Improvement #1: Weighted QuickUnion
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
-1 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
1
2
0
4
6
5
3
8
9
7
Note: The rule I picked is based on weight, not height. We’ll talk about why soon.
Implementing WeightedQuickUnion
Minimal changes needed:
-6 | 0 | 0 | 0 | 0 | 0 | -4 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
parent
10 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 2 | 1 |
0 1 2 3 4 5 6 7 8 9
size
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
N | H |
1 | 0 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
N | H |
1 | 0 |
2 | 1 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
4
5
6
7
Weighted Quick Union Performance
Let’s consider the worst case where the tree height grows as fast as possible.
0
1
2
3
N | H |
1 | 0 |
2 | 1 |
4 | 2 |
8 | 3 |
16 | 4 |
4
5
6
7
8
9
10
11
12
13
14
15
Performance Summary
QuickUnion’s runtimes are O(H), and WeightedQuickUnionDS height is given by H = O(log N). Therefore connect and isConnected are both O(log N).
By tweaking QuickUnionDS we’ve achieved logarithmic time performance.
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
Why Weights Instead of Heights?
We used the number of items in a tree to decide upon the root.
1
2
0
4
6
5
3
8
9
7
+
1
2
0
4
6
5
3
8
9
7
Path Compression
(CS170 Spoiler)
What We’ve Achieved
Performing M operations on a DisjointSet object with N elements:
Implementation | Constructor | connect | isConnected |
ListOfSetsDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
Suppose we have a ListOfSetsDS implementation of Disjoint Sets.
Suppose that it has 1000 items, i.e. N = 1000.
Suppose we perform a total of 150 connect operations and 212 isConnected operations.
So when we say O(NM), we’re saying it’ll take no more than 1000 * 362 units of time (in some arbitrary unit of time).
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Theoretical Performance (Bonus)
Path compression results in a union/connected operations that are very very close to amortized constant time.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
N | α(N) |
1 | 0 |
... | 1 |
... | 2 |
... | 3 |
... | 4 |
| 5 |
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
Path Compression: Another Clever Idea
Draw the tree after we call isConnected(14, 13).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Path compression results in a union/connected operations that are very very close to amortized constant time (amortized constant means “constant on average”).
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
N | lg* N |
1 | 0 |
2 | 1 |
4 | 2 |
16 | 3 |
65536 | 4 |
265536 | 5 |
216
Path Compression: Theoretical Performance (Bonus)
Path compression results in a union/connected operations that are very very close to amortized constant time.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
N | α(N) |
1 | 0 |
... | 1 |
... | 2 |
... | 3 |
... | 4 |
| 5 |
A Summary of Our Iterative Design Process
And we’re done! The end result of our iterative design process is the standard way disjoint sets are implemented today - quick union and path compression.
The ideas that made our implementation efficient:
Performance Summary
Implementation | Runtime |
ListOfSetsDS | O(NM) |
QuickFindDS | Θ(NM) |
QuickUnionDS | O(NM) |
WeightedQuickUnionDS | O(N + M log N) |
WeightedQuickUnionDSWithPathCompression | O(N + M α(N)) |
Runtimes are given assuming:
Citations
Nazca Lines: http://redicecreations.com/ul_img/24592nazca_bird.jpg
The proof of the inverse ackermann runtime for disjoint sets is given here:
http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendigungen/ws07/KAuD/effi.pdf
as originally proved by Tarjan here at UC Berkeley in 1975.