Preannouncements
Leverage content from this class + 170 + 176 for life saving purposes.
Announcements
Project 2 due tonight at 11:59 PM.
CS61B
Lecture 20: Disjoint Sets
Meta-goals of the Coming Lectures: Data Structure Refinement
Project 2: A chance to see how a design evolves.
Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how set, map, and priority queue ADTs are implemented.
Today’s Goal: Dynamic Connectivity
Today: A case study in ADT implementation.
Goal: Given a series of pairwise connectedness declarations, determine if two items are connected transitively. Only care about yes vs. no.
Solvable using a simple ADT with cool implementation details.
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
0
4
3
1
2
5
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
6
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
0
4
3
1
2
5
6
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
0
4
3
1
2
5
6
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
0
4
3
1
2
5
6
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
isConnected(3, 0): true
0
4
3
1
2
5
6
The Disjoint Sets ADT
Goal: Design an efficient DisjointSets implementation.
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);
/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}
getBack()
get(int i)
deleteBack()
connect(int p, int q)
isConnected(int p, int q)
The Naive Approach
Naive approach:
0
4
1
2
3
5
6
A Better Approach: Connected Components
Rather than manually writing out every single connecting line, record the sets that something belongs to.�
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
isConnected(3, 0): true
{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}
{0, 1}, {2}, {3}, {4}, {5}, {6}, {7}
{0, 1, 2}, {3}, {4}, {5}, {6}, {7}
{0, 1, 2, 4}, {3}, {5}, {6}, {7}
{0, 1, 2, 4}, {3, 5}, {6}, {7}
{0, 1, 2, 4}, {3, 5}, {6}, {7}
{0, 1, 2, 4, 6}, {3, 5}, {7}
{0, 1, 2, 3, 4, 5, 6}, {7}
A Better Approach: Connected Components
A connected component is a maximal set of items that are mutually connected.
{ 0, 1, 2, 4 }, {3, 5}, {6}
0
4
1
2
3
5
6
Quick Find
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 5) operation: After connect(2, 5) operation:
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 5) operation: After connect(2, 5) operation:
Idea #1: ArrayList<ArrayList<Integer>> [linked list variant]
Idea #2: HashMap<Integer, Set>
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Challenge: Pick Data Structures to Support Tracking of Sets
Before connect(2, 5) operation: After connect(2, 5) operation:
Idea #3: Linked List
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Using an Array
One natural choice: int[] where ith entry gives set number of item i.
Before connect(2, 5) operation: After connect(2, 5) operation:
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
int[] id
int[] id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
QuickFindDS
public class QuickFindDS implements DisjointSets {
private int[] id;
public boolean isConnected(int p, int q) {
return id[p] == id[q];
}
public void connect(int p, int q) {
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++) {
if (id[i] == pid) {
id[i] = qid;
}
}...
Very fast: Two array accesses.
Relatively slow: N+2 to 2N+2 array accesses.
public QuickFindDS(int N) {
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}
}
Performance Summary
QuickFindDS is too slow: Connecting two items takes N time.
Implementation | constructor | connect | isConnected |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
Quick Union
Improving the Connect Operation
Approach zero: Represent everything as boxes and lines. This was overkill.
Approach one: Represent everything as connected components. Represented connected components as an array.
Approach two: We’re still going to stick with connected components, but will represent connected components differently.
0
4
1
2
3
5
6
??? in Java.
{ 0, 1, 2, 4 }, {3, 5}, {6}
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] id
{ 0, 1, 2, 4 }, {3, 5}, {6}
??? in Java.
Improving the Connect Operation
A hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
int[] id
int[] id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
Improving the Connect Operation
A hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
0 | 0 | 0 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
3 | 3 | 3 | 3 | 3 | 3 | 6 |
0 1 2 3 4 5 6
int[] id
int[] id
{ 0, 1, 2, 4 }, {3, 5}, {6}
{ 0, 1, 2, 4, 3, 5}, {6}
0
4
1
2
3
5
6
0
4
1
2
3
5
6
SET: 0
SET: 0
Improving the Connect Operation
Possibly harder question:
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “boss” of this set.
0 | 0 | 1 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] parent
5
3
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4} {3, 5} {6}
2
1
4
0
6
0 is the “boss” of this set.
0 | 0 | 1 | 3 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] parent
5
3
Improving the Connect Operation
connect(5, 2)
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “boss” of this set.
0 | 0 | 1 | 0 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] parent
Set Union Using Rooted-Tree Representation
connect(5, 2)
What are the potential performance issues with this approach?
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “boss” of this set.
0 | 0 | 1 | 0 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] parent
Set Union Using Rooted-Tree Representation
connect(5, 2)
What are the potential performance issues with this approach?
{0, 1, 2, 4, 3, 5} {6}
2
1
4
0
5
3
6
0 is the “boss” of this set.
0 | 0 | 1 | 0 | 0 | 3 | 6 |
0 1 2 3 4 5 6
int[] parent
The Worst Case
If we always connect the first items’ tree below the second item’s tree, we can end up with a tree of height M after M operations:
For N items, what’s the worst case runtime…
3
2
1
0
4
QuickUnionDS
public class QuickUnionDS implements DisjointSets {
private int[] parent;
public QuickUnionDS(int N) {
parent = new int[N];
for (int i = 0; i < N; i++)
parent[i] = i;
}
private int find(int p) {
while (p != parent[p])
p = parent[p];
return p;
}
For N items, this means a worst case runtime of Θ(N).
public boolean isConnected(int p, int q) {
return find(p) == find(q);
}
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
parent[i] = j;
}
Performance Summary
QuickFindDS defect: QuickFindDS is too slow: Connecting two items takes N time in the worst case.
QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.
Implementation | Constructor | connect | isConnected |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
Weighted Quick Union
Weighted QuickUnion: http://shoutkey.com/black
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
1
2
0
4
6
5
3
8
9
7
0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
int[] parent
Improvement #1: Weighted QuickUnion
Modify quick-union to avoid tall trees.
New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?
1
2
0
4
6
5
3
8
9
7
0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
int[] parent
Implementing WeightedQuickUnion
Minimal changes needed:
Now the connect method looks like:
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) { parent[i] = j; size[j] += size[i]; }
else { parent[j] = i; size[i] += size[j]; }
}
0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 6 | 8 |
0 1 2 3 4 5 6 7 8 9
int[] parent
10 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 2 | 1 |
0 1 2 3 4 5 6 7 8 9
int[] size
WeightedQuickUnion Performance
As before, connect and isConnected require time proportional to depth of the items involved.
Max depth of any item: log N
Very brief proof for the curious (not covered in lecture):
Performance Summary
By tweaking QuickUnionDS we’ve achieved logarithmic time performance.
Implementation | Constructor | connect | isConnected |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
Performance Summary
Performing M operations on a DisjointSet object with N elements:
Implementation | Constructor | connect | isConnected |
QuickFindDS | Θ(N) | Θ(N) | Θ(1) |
QuickUnionDS | Θ(N) | O(N) | O(N) |
WeightedQuickUnionDS | Θ(N) | O(log N) | O(log N) |
Path Compression
(CS170 Spoiler)
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Below is the topology of the worst case if we use WeightedQuickUnion
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
170 Spoiler: Path Compression: A Clever Idea
Path compression results in a union/connected operations that are very very close to amortized constant time.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
N | lg* N |
1 | 0 |
2 | 1 |
4 | 2 |
16 | 3 |
65536 | 4 |
265536 | 5 |
170 Spoiler: Path Compression: A Clever Idea
Path compression results in a union/connected operations that are very very close to amortized constant time.
15
11
5
12
13
6
1
7
14
8
2
9
10
3
0
4
N | α(N) |
1 | 0 |
... | 1 |
... | 2 |
... | 3 |
... | 4 |
| 5 |
170 Spoiler: Path Compression: A Clever Idea
private int find(int p) {
if (p == parent[p]) {
return p;
} else {
parent[p] = find(parent[p]);
return parent[p];
}
}
Everything All Together...
public class WeightedQuickUnionDSWithPathCompression implements DisjointSets {
private int[] parent; private int[] size;
public WeightedQuickUnionDSWithPathCompression(int N) {
parent = new int[N]; size = new int[N];
for (int i = 0; i < N; i++) {
parent[i] = i;
size[i] = 1;
}
}
private int find(int p) {
if (p == parent[p]) {
return p;
} else {
parent[p] = find(parent[p]);
return parent[p];
}
}
public boolean isConnected(int p, int q) {
return find(p) == find(q);
}
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) {
parent[i] = j; size[j] += size[i];
} else {
parent[j] = i; size[i] += size[j];
}
}
Performance Summary
Runtimes are given assuming:
Implementation | Runtime |
QuickFindDS | Θ(NM) |
QuickUnionDS | O(NM) |
WeightedQuickUnionDS | O(N + M log N) |
WeightedQuickUnionDSWithPathCompression | O(N + M α(N)) |
Citations
Nazca Lines: http://redicecreations.com/ul_img/24592nazca_bird.jpg
Implementation code adapted from Algorithms, 4th edition and Professor Jonathan Shewchuk’s lecture notes on disjoint sets, where he presents a faster one-array solution. I would recommend taking a look.
(http://www.cs.berkeley.edu/~jrs/61b/lec/33)
The proof of the inverse ackermann runtime for disjoint sets is given here:
http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendigungen/ws07/KAuD/effi.pdf
as originally proved by Tarjan here at UC Berkeley in 1975.
extra
Set Union Using Parent Representation
connect(11, 3)
0 | 8 | 2 | 0 | 0 | 0 | 0 | 8 | 8 | 6 | 0 | 8 | 8 | 8 |
0 1 2 3 4 5 6 7 8 9 10 11 12 13
int[] parent
{0, 3, 4, 5, 6, 9, 10}, {1, 7, 8, 11, 12, 13}, {2}
10
9
6
5
4
3
0
13
11
12
7
1
8
2
Improving the Connect Operation
Possibly harder question:
{0, 3, 4, 5, 6, 9, 10}, {1, 7, 8, 11, 12, 13}, {2}
10
9
6
5
4
3
0
13
11
12
7
1
8
2
0 is the “boss” of this set.
0 | 8 | 2 | 0 | 0 | 0 | 0 | 8 | 8 | 6 | 0 | 8 | 8 | 8 |
0 1 2 3 4 5 6 7 8 9 10 11 12 13
int[] parent
Set Union Using Rooted-Tree Representation
connect(11, 3)
Anybody see any issues with this?
{0, 3, 4, 5, 6, 9, 10, 1, 7, 8, 11, 12, 13}, {2}
10
9
6
5
4
3
0
13
11
12
7
1
8
2
0 | 8 | 2 | 0 | 0 | 0 | 0 | 8 | 0 | 6 | 0 | 8 | 8 | 8 |
0 1 2 3 4 5 6 7 8 9 10 11 12 13
int[] id
Set Union Using Rooted-Tree Representation
connect(11, 3)
Anybody see any issues with this? Tree can get too tall!
0 | 8 | 2 | 0 | 0 | 0 | 0 | 8 | 0 | 6 | 0 | 8 | 8 | 8 |
0 1 2 3 4 5 6 7 8 9 10 11 12 13
int[] id
{0, 3, 4, 5, 6, 9, 10, 1, 7, 8, 11, 12, 13}, {2}
10
9
6
5
4
3
0
13
11
12
7
1
8
2