1 of 73

CS61B: 2019

Lecture 14: Disjoint Sets

Dynamic Connectivity and the Disjoint Sets Problem
Quick Find
Quick Union
Weighted Quick Union
Path Compression (CS170 Preview)

2 of 73

Meta-goals of the Coming Lectures: Data Structure Refinement

Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how sets, maps, and priority queues are implemented.

Today: Deriving the “Disjoint Sets” data structure for solving the “Dynamic Connectivity” problem. We will see:

How a data structure design can evolve from basic to sophisticated.
How our choice of underlying abstraction can affect asymptotic runtime (using our formal Big-Theta notation) and code complexity.

datastructur.es

3 of 73

The Disjoint Sets Data Structure

The Disjoint Sets data structure has two operations:

connect(x, y): Connects x and y.
isConnected(x, y): Returns true if x and y are connected. Connections can be transitive, i.e. they don’t need to be direct.

Example:

connect(China, Vietnam)

isConnected(USA, Mongolia)? false

isConnected(Vietnam, Mongolia)? true

connect(USA, Canada)

connect(China, Mongolia)

connect(China, USA)

isConnected(USA, Mongolia)? true

Hyperloop

datastructur.es

4 of 73

The Disjoint Sets Data Structure

The Disjoint Sets data structure has two operations:

connect(x, y): Connects x and y.
isConnected(x, y): Returns true if x and y are connected. Connections can be transitive, i.e. they don’t need to be direct.

Useful for many purposes, e.g.:

Percolation theory:

Computational chemistry.

Implementation of other algorithms:

Kruskal’s algorithm.

Hyperloop

datastructur.es

5 of 73

Disjoint Sets on Integers

To keep things simple, we’re going to:

Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

6

datastructur.es

6 of 73

Disjoint Sets on Integers

To keep things simple, we’re going to:

Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

datastructur.es

7 of 73

Disjoint Sets on Integers

To keep things simple, we’re going to:

Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

datastructur.es

8 of 73

Disjoint Sets on Integers

To keep things simple, we’re going to:

Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

ds.connect(3, 6)

datastructur.es

9 of 73

Disjoint Sets on Integers

To keep things simple, we’re going to:

Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

ds.connect(3, 6)

ds.isConnected(3, 0): true

datastructur.es

10 of 73

The Disjoint Sets Interface

Goal: Design an efficient DisjointSets implementation.

Number of elements N can be huge.
Number of method calls M can be huge.
Calls to methods may be interspersed (e.g. can’t assume it’s onlu connect operations followed by only isConnected operations).

public interface DisjointSets {

/** Connects two items P and Q. */

void connect(int p, int q);

/** Checks to see if two items are connected. */

boolean isConnected(int p, int q);

}

getBack()

get(int i)

deleteBack()

connect(int p, int q)

isConnected(int p, int q)

datastructur.es

11 of 73

The Naive Approach

Naive approach:

Connecting two things: Record every single connecting line in some data structure.
Checking connectedness: Do some sort of (??) iteration over the lines to see if one thing can be reached from the other.

0

4

1

2

3

5

6

datastructur.es

12 of 73

A Better Approach: Connected Components

Rather than manually writing out every single connecting line, only record the sets that each item belongs to.�

connect(0, 1)

connect(1, 2)

connect(0, 4)

connect(3, 5)

isConnected(2, 4): true

isConnected(3, 0): false

connect(4, 2)

connect(4, 6)

connect(3, 6)

isConnected(3, 0): true

{0}, {1}, {2}, {3}, {4}, {5}, {6}

{0, 1}, {2}, {3}, {4}, {5}, {6}

{0, 1, 2}, {3}, {4}, {5}, {6}

{0, 1, 2, 4}, {3}, {5}, {6}

{0, 1, 2, 4}, {3, 5}, {6}

{0, 1, 2, 4, 6}, {3, 5}

{0, 1, 2, 3, 4, 5, 6}

datastructur.es

13 of 73

A Better Approach: Connected Components

For each item, its connected component is the set of all items that are connected to that item.

Naive approach: Record every single connecting line somehow.
Better approach: Model connectedness in terms of sets.

How things are connected isn’t something we need to know.
Only need to keep track of which connected component each item belongs to.

{ 0, 1, 2, 4 }, {3, 5}, {6}

0

4

1

2

3

5

6

Up next: We’ll consider how to do track set membership in Java.

datastructur.es

14 of 73

Quick Find

datastructur.es

15 of 73

Challenge: Pick Data Structures to Support Tracking of Sets

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Assume elements are numbered from 0 to N-1.

datastructur.es

16 of 73

Challenge: Pick Data Structures to Support Tracking of Sets

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Assume elements are numbered from 0 to N-1.

Map<Integer, Integer> -- first number represents set and second represents item

Slow because you have to iterate to find which set something belongs to.

datastructur.es

17 of 73

Challenge: Pick Data Structures to Support Tracking of Sets

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Assume elements are numbered from 0 to N-1.

Map<Integer, Integer> -- first number represents the item, and the second is the set number.

More or less what we get to shortly, but less efficient for reasons I will explain.

datastructur.es

18 of 73

Challenge: Pick Data Structures to Support Tracking of Sets

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Idea #1: List of sets of integers, e.g. [{0, 1, 2, 4}, {3, 5}, {6}]

In Java: List<Set<Integer>>.
Very intuitive idea.

datastructur.es

19 of 73

Challenge: Pick Data Structures to Support Tracking of Sets

If nothing is connected:

0

4

1

2

3

5

6

Idea #1: List of sets of integers, e.g. [{0}, {1}, {2}, {3}, {4}, {5}, {6}]

In Java: List<Set<Integer>>.
Very intuitive idea.
Requires iterating through all the sets to find anything. Complicated and slow!

Worst case: If nothing is connected, then isConnected(5, 6) requires iterating through N-1 sets to find 5, then N sets to find 6. Overall runtime of Θ(N).

datastructur.es

20 of 73

Performance Summary

ListOfSetsDS is complicated and slow.

Operations are linear when number of connections are small.

Have to iterate over all sets.

Important point: By deciding to use a List of Sets, we have doomed ourselves to complexity and bad performance.

Implementation	constructor	connect	isConnected
ListOfSetsDS	Θ(N)	O(N)	O(N)

Worst case is Θ(N), but other cases may be better. We’ll say O(N) since O means “less than or equal”.

Constructor’s runtime has order of growth N no matter what, so Θ(N).

datastructur.es

21 of 73

My Approach: Just Use a Array of Integers

4	4	4	5	4	5	6

0 1 2 3 4 5 6

5	5	5	5	5	5	6

0 1 2 3 4 5 6

Idea #2: list of integers where ith entry gives set number (a.k.a. “id”) of item i.

connect(p, q): Change entries that equal id[p] to id[q]

int[] id

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

int[] id

datastructur.es

22 of 73

QuickFindDS

public class QuickFindDS implements DisjointSets {

private int[] id;

public boolean isConnected(int p, int q) {

return id[p] == id[q];

}

public void connect(int p, int q) {

int pid = id[p];

int qid = id[q];

for (int i = 0; i < id.length; i++) {

if (id[i] == pid) {

id[i] = qid;

}

}...

Very fast: Two array accesses: Θ(1)

Relatively slow: N+2 to 2N+2 array accesses: Θ(N)

public QuickFindDS(int N) {

id = new int[N];

for (int i = 0; i < N; i++)

id[i] = i;

}

datastructur.es

23 of 73

Performance Summary

QuickFindDS is too slow for practical use: Connecting two items takes N time.

Instead, let’s try something more radical.

Implementation	constructor	connect	isConnected
ListOfSetsDS	Θ(N)	O(N)	O(N)
QuickFindDS	Θ(N)	Θ(N)	Θ(1)

datastructur.es

24 of 73

Quick Union

datastructur.es

25 of 73

Improving the Connect Operation

Approach zero: Represent everything as boxes and lines. Overly complicated.

ListOfSets: Represent everything as connected components. Represented connected components as list of sets of integers.

QuickFind: Represent everything as connected components. Represented connected components as a list of integers, where value = id.

0

4

1

2

3

5

6

??? in Java instance variables.

{ 0, 1, 2, 4 }, {3, 5}, {6}

[{0, 1, 2, 4}, {3, 5}, {6}]

[2, 2, 2, 3, 2, 3, 6]

List<Set<Integer>>

int[]

datastructur.es

26 of 73

Improving the Connect Operation

QuickFind: Represent everything as connected components. Represented connected components as a list of integers where value = id.

Bad feature: Connecting two sets is slow!

Next approach (QuickUnion): We will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.

{ 0, 1, 2, 4 }, {3, 5}, {6}

[2, 2, 2, 3, 2, 3, 6]

int[]

datastructur.es

27 of 73

Improving the Connect Operation

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

0	0	0	3	0	3	6

0 1 2 3 4 5 6

3	3	3	3	3	3	6

0 1 2 3 4 5 6

id

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

datastructur.es

28 of 73

Improving the Connect Operation (Your Answer)

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

Suggestion, use pointers!

0 1 2 3 4 5 6

0 1 2 3 4 5 6

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

id

ZERO

THREE

SIX

THREE

SIX

datastructur.es

29 of 73

Improving the Connect Operation

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

Idea: Assign each item a parent (instead of an id). Results in a tree-like shape.

An innocuous sounding, seemingly arbitrary solution.
Unlocks a pretty amazing universe of math that we won’t discuss.

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

-1	0	1	-1	0	3	-1

0 1 2 3 4 5 6

parent�

5

3

Note: The optional textbook has an item’s parent as itself instead of -1 for root items.

datastructur.es

30 of 73

Improving the Connect Operation

connect(5, 2)

How should we change the parent list to handle this connect operation?

If you’re not sure where to start, consider: why can’t we just set id[5] to 2?

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1	0	1	-1	0	3	-1

0 1 2 3 4 5 6

datastructur.es

31 of 73

Improving the Connect Operation (Your Answer)

connect(5, 2)

One possibility, set id[3] = 2
Set id[3] = 0

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1	0	1	2	0	2	-1

0 1 2 3 4 5 6

datastructur.es

32 of 73

Improving the Connect Operation

connect(5, 2)

Find root(5). // returns 3
Find root(2). // returns 0
Set root(5)’s value equal to root(2).

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1	0	1	-1	0	3	-1

0 1 2 3 4 5 6

datastructur.es

33 of 73

Improving the Connect Operation

connect(5, 2)

Find root(5). // returns 3
Find root(2). // returns 0
Set root(5)’s value equal to root(2).

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1	0	1	0	0	3	-1

0 1 2 3 4 5 6

datastructur.es

34 of 73

Set Union Using Rooted-Tree Representation

connect(5, 2)

Make root(5) into a child of root(2).

What are the potential performance issues with this approach?

Compared to QuickFind, we have to climb up a tree.

{0, 1, 2, 4, 3, 5} {6}

2

1

4

0

5

3

6

0 is the “root” of this set.

parent

-1	0	1	0	0	3	-1

0 1 2 3 4 5 6

datastructur.es

35 of 73

Set Union Using Rooted-Tree Representation

connect(5, 2)

Make root(5) into a child of root(2).

What are the potential performance issues with this approach?

Tree can get too tall! root(x) becomes expensive.

{0, 1, 2, 4, 3, 5} {6}

2

1

4

0

5

3

6

0 is the “root” of this set.

parent

-1	0	1	0	0	3	-1

0 1 2 3 4 5 6

datastructur.es

36 of 73

The Worst Case

If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:

connect(4, 3)
connect(3, 2)
connect(2, 1)
connect(1, 0)

For N items, what’s the worst case runtime…

For connect(p, q)?
For isConnected(p, q)?

3

2

1

0

4

datastructur.es

37 of 73

The Worst Case

If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:

connect(4, 3)
connect(3, 2)
connect(2, 1)
connect(1, 0)

For N items, what’s the worst case runtime…

For connect(p, q)? Θ(N)
For isConnected(p, q)? Θ(N)

3

2

1

0

4

datastructur.es

38 of 73

QuickUnionDS

public class QuickUnionDS implements DisjointSets {

private int[] parent;

public QuickUnionDS(int N) {

parent = new int[N];

for (int i = 0; i < N; i++)

{ parent[i] = -1; }

}

private int find(int p) {

int r = p;

while (parent[r] >= 0)

{ r = parent[r]; }

return r;

}

For N items, this means a worst case runtime of Θ(N).

public boolean isConnected(int p, int q) {

return find(p) == find(q);

}

{

int i = find(p);

int j = find(q);

parent[i] = j;

}

Here the find operation is the same as the “root(x)” idea we had in earlier slides.

datastructur.es

39 of 73

Performance Summary

QuickFindDS defect: QuickFindDS is too slow: Connecting takes Θ(N) time.

QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.

Observation: Things would be fine if we just kept our tree balanced.

Implementation	Constructor	connect	isConnected
ListOfSetsDS	Θ(N)	O(N)	O(N)
QuickFindDS	Θ(N)	Θ(N)	Θ(1)
QuickUnionDS	Θ(N)	O(N)	O(N)

Using O because runtime can be between constant and linear.

datastructur.es

40 of 73

Weighted Quick Union

datastructur.es

41 of 73

A Choice of Two Roots: http://yellkey.com/reveal

Suppose we are trying to connect(2, 5). We have two choices:

Make 5’s root into a child of 2’s root.
Make 2’s root into a child of 5’s root.

Which is the better choice?

2

1

4

0

5

3

2

1

4

0

5

3

2

1

4

0

5

3

+

datastructur.es

42 of 73

A Choice of Two Roots

Suppose we are trying to connect(2, 5). We have two choices:

Make 5’s root into a child of 2’s root.
Make 2’s root into a child of 5’s root.

Which is the better choice?

2

1

4

0

5

3

2

1

4

0

5

3

2

1

4

0

5

3

+

Height: 2

Height: 3

datastructur.es

43 of 73

A Choice of Two Roots

Suppose we are trying to connect(2, 5). We have two choices:

Make 5’s root into a child of 2’s root.
Make 2’s root into a child of 5’s root.

Which is the better choice?

2

1

5

4

0

3

2

1

4

0

5

3

2

1

4

0

5

3

+

Height: 2

Height: 3

datastructur.es

44 of 73

Weighted QuickUnion: http://yellkey.com/society

Modify quick-union to avoid tall trees.

Track tree size (number of elements).
New rule: Always link root of smaller tree to larger tree.

New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?

parent[3]
parent[0]
parent[8]
parent[6]

1

2

0

4

6

5

3

8

9

7

-1	0	0	0	0	0	-1	6	6	8

0 1 2 3 4 5 6 7 8 9

parent

datastructur.es

45 of 73

Improvement #1: Weighted QuickUnion

Modify quick-union to avoid tall trees.

Track tree size (number of elements).
New rule: Always link root of smaller tree to larger tree.

New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?

parent[3]
parent[0]
parent[8]
parent[6]

-1	0	0	0	0	0	0	6	6	8

0 1 2 3 4 5 6 7 8 9

parent

1

2

0

4

6

5

3

8

9

7

Note: The rule I picked is based on weight, not height. We’ll talk about why soon.

datastructur.es

46 of 73

Implementing WeightedQuickUnion

Minimal changes needed:

Use parent[] array as before.
isConnected(int p, int q) requires no changes.
connect(int p, int q) needs to somehow keep track of sizes.

See the Disjoint Sets lab for the full details.
Two common approaches:

Use values other than -1 in parent array for root nodes to track size.
Create a separate size array.

-6	0	0	0	0	0	-4	6	6	8

0 1 2 3 4 5 6 7 8 9

parent

10	1	1	1	1	1	4	1	2	1

0 1 2 3 4 5 6 7 8 9

size

datastructur.es

47 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

N	H
1	0

datastructur.es

48 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

N	H
1	0
2	1

datastructur.es

49 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1

datastructur.es

50 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1
4	2

datastructur.es

51 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1
4	2

4

5

datastructur.es

52 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1
4	2

4

5

6

7

datastructur.es

53 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1
4	2

4

5

6

7

datastructur.es

54 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N	H
1	0
2	1
4	2
8	3

4

5

6

7

datastructur.es

55 of 73

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

Worst case tree height is Θ(log N).

0

1

2

3

N	H
1	0
2	1
4	2
8	3
16	4

4

5

6

7

8

9

10

11

12

13

14

15

datastructur.es

56 of 73

Performance Summary

QuickUnion’s runtimes are O(H), and WeightedQuickUnionDS height is given by H = O(log N). Therefore connect and isConnected are both O(log N).

By tweaking QuickUnionDS we’ve achieved logarithmic time performance.

Fast enough for all practical problems.

Implementation	Constructor	connect	isConnected
ListOfSetsDS	Θ(N)	O(N)	O(N)
QuickFindDS	Θ(N)	Θ(N)	Θ(1)
QuickUnionDS	Θ(N)	O(N)	O(N)
WeightedQuickUnionDS	Θ(N)	O(log N)	O(log N)

datastructur.es

57 of 73

Why Weights Instead of Heights?

We used the number of items in a tree to decide upon the root.

Why not use the height of the tree?

Worst case performance for HeightedQuickUnionDS is asymptotically the same! Both are Θ(log(N)).
Resulting code is more complicated with no performance gain.

1

2

0

4

6

5

3

8

9

7

+

1

2

0

4

6

5

3

8

9

7

datastructur.es

58 of 73

Path Compression

(CS170 Spoiler)

datastructur.es

59 of 73

What We’ve Achieved

Performing M operations on a DisjointSet object with N elements:

For our naive implementation, runtime is O(MN).
For our best implementation, runtime is O(N + M log N).
For N = 10⁹and M = 10⁹, difference is 30 years vs. 6 seconds.

Key point: Good data structure unlocks solutions to problems that could otherwise not be solved!

Good enough for all practical uses, but could we theoretically do better?

Implementation	Constructor	connect	isConnected
ListOfSetsDS	Θ(N)	O(N)	O(N)
WeightedQuickUnionDS	Θ(N)	O(log N)	O(log N)

datastructur.es

60 of 73

Suppose we have a ListOfSetsDS implementation of Disjoint Sets.

Suppose that it has 1000 items, i.e. N = 1000.

Suppose we perform a total of 150 connect operations and 212 isConnected operations.

M = 150 + 212 = 362 operations

So when we say O(NM), we’re saying it’ll take no more than 1000 * 362 units of time (in some arbitrary unit of time).

This is a bit informal. O is really about asymptotics, i.e. behavior for very large N and M, not specific N and Ms that we pick.

datastructur.es

61 of 73

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion.

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

62 of 73

Path Compression: Theoretical Performance (Bonus)

Path compression results in a union/connected operations that are very very close to amortized constant time.

M operations on N nodes is O(N + M lg* N).
A tighter bound: O(N + M α(N)), where α is the inverse Ackermann function.
The inverse ackermann function is less than 5 for all practical inputs!

See “Efficiency of a Good But Not Linear Set Union Algorithm.”
Written by Bob Tarjan while at UC Berkeley in 1975.

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

N	α(N)
1	0
...	1
...	2
...	3
...	4
	5

datastructur.es

63 of 73

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

64 of 73

Path Compression: Another Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

65 of 73

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

66 of 73

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

67 of 73

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

68 of 73

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

datastructur.es

69 of 73

170 Spoiler: Path Compression: A Clever Idea

Path compression results in a union/connected operations that are very very close to amortized constant time (amortized constant means “constant on average”).

M operations on N nodes is O(N + M lg* N) - you will see this in CS170.
lg* is less than 5 for any realistic input.

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

N	lg* N
1	0
2	1
4	2
16	3
65536	4
2⁶⁵⁵³⁶	5

2¹⁶

datastructur.es

70 of 73

Path Compression: Theoretical Performance (Bonus)

Path compression results in a union/connected operations that are very very close to amortized constant time.

M operations on N nodes is O(N + M lg* N).
A tighter bound: O(N + M α(N)), where α is the inverse Ackermann function.
The inverse ackermann function is less than 5 for all practical inputs!

See “Efficiency of a Good But Not Linear Set Union Algorithm.”
Written by Bob Tarjan while at UC Berkeley in 1975.

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

N	α(N)
1	0
...	1
...	2
...	3
...	4
	5

datastructur.es

71 of 73

A Summary of Our Iterative Design Process

And we’re done! The end result of our iterative design process is the standard way disjoint sets are implemented today - quick union and path compression.

The ideas that made our implementation efficient:

Represent sets as connected components (don’t track individual connections).

ListOfSetsDS: Store connected components as a List of Sets (slow, complicated).
QuickFindDS: Store connected components as set ids.
QuickUnionDS: Store connected components as parent ids.

WeightedQuickUnionDS: Also track the size of each set, and use size to decide on new tree root.

WeightedQuickUnionWithPathCompressionDS: On calls to connect and isConnected, set parent id to the root for all items seen.

datastructur.es

72 of 73

Performance Summary

Implementation	Runtime
ListOfSetsDS	O(NM)
QuickFindDS	Θ(NM)
QuickUnionDS	O(NM)
WeightedQuickUnionDS	O(N + M log N)
WeightedQuickUnionDSWithPathCompression	O(N + M α(N))

Runtimes are given assuming:

We have a DisjointSets object of size N.
We perform M operations, where an operation is defined as either a call to connected or isConnected.

datastructur.es

73 of 73

Citations

Nazca Lines: http://redicecreations.com/ul_img/24592nazca_bird.jpg

The proof of the inverse ackermann runtime for disjoint sets is given here:

http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendigungen/ws07/KAuD/effi.pdf

as originally proved by Tarjan here at UC Berkeley in 1975.

datastructur.es