1 of 76

Disjoint Sets

1

Lecture 14

CS61B, Spring 2023 @ UC Berkeley

Josh Hug and Justin Yokota

2 of 76

Disjoint Sets API

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

3 of 76

Meta-goals of the Coming Lectures: Data Structure Refinement

Next couple of weeks: Deriving classic solutions to interesting problems, with an emphasis on how sets, maps, and priority queues are implemented.

Today: Deriving the “Disjoint Sets” data structure for solving the “Dynamic Connectivity” problem. We will see:

  • How a data structure design can evolve from basic to sophisticated.
  • How our choice of underlying abstraction can affect asymptotic runtime (using our formal Big-Theta notation) and code complexity.

4 of 76

The Disjoint Sets Data Structure

The Disjoint Sets data structure has two operations:

  • connect(x, y): Connects x and y.
  • isConnected(x, y): Returns true if x and y are connected. Connections can be transitive, i.e. they don’t need to be direct.

Example:

  • connect(China, Vietnam)
  • isConnected(USA, Mongolia)? false
  • isConnected(Vietnam, Mongolia)? true
  • connect(USA, Canada)
  • connect(China, Mongolia)
  • connect(China, USA)
  • isConnected(USA, Mongolia)? true

Hyperloop

5 of 76

The Disjoint Sets Data Structure

The Disjoint Sets data structure has two operations:

  • connect(x, y): Connects x and y.
  • isConnected(x, y): Returns true if x and y are connected. Connections can be transitive, i.e. they don’t need to be direct.

Useful for many purposes, e.g.:

  • Percolation theory:
    • Computational chemistry.
  • Implementation of other algorithms:
    • Kruskal’s algorithm.

Hyperloop

6 of 76

Disjoint Sets on Integers

To keep things simple, we’re going to:

  • Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
  • Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

6

7 of 76

Disjoint Sets on Integers

To keep things simple, we’re going to:

  • Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
  • Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

8 of 76

Disjoint Sets on Integers

To keep things simple, we’re going to:

  • Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
  • Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

9 of 76

Disjoint Sets on Integers

To keep things simple, we’re going to:

  • Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
  • Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

ds.connect(3, 6)

10 of 76

Disjoint Sets on Integers

To keep things simple, we’re going to:

  • Force all items to be integers instead of arbitrary data (e.g. 8 instead of USA).
  • Declare the number of items in advance, everything is disconnected at start.��

0

4

3

1

2

5

6

ds = DisjointSets(7)

ds.connect(0, 1)

ds.connect(1, 2)

ds.connect(0, 4)

ds.connect(3, 5)

ds.isConnected(2, 4): true

ds.isConnected(3, 0): false

ds.connect(4, 2)

ds.connect(4, 6)

ds.connect(3, 6)

ds.isConnected(3, 0): true

11 of 76

The Disjoint Sets Interface

Goal: Design an efficient DisjointSets implementation.

  • Number of elements N can be huge.
  • Number of method calls M can be huge.
  • Calls to methods may be interspersed (e.g. can’t assume it’s onlu connect operations followed by only isConnected operations).

public interface DisjointSets {

/** Connects two items P and Q. */

void connect(int p, int q);

/** Checks to see if two items are connected. */

boolean isConnected(int p, int q);

}

getBack()

get(int i)

deleteBack()

connect(int p, int q)

isConnected(int p, int q)

12 of 76

Tracking Connected Components

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

13 of 76

The Naive Approach

Naive approach:

  • Connecting two things: Record every single connecting line in some data structure.
  • Checking connectedness: Do some sort of (??) iteration over the lines to see if one thing can be reached from the other.

0

4

1

2

3

5

6

14 of 76

A Better Approach: Connected Components

Rather than manually writing out every single connecting line, only record the sets that each item belongs to.�

connect(0, 1)

connect(1, 2)

connect(0, 4)

connect(3, 5)

isConnected(2, 4): true

isConnected(3, 0): false

connect(4, 2)

connect(4, 6)

connect(3, 6)

isConnected(3, 0): true

{0}, {1}, {2}, {3}, {4}, {5}, {6}

{0, 1}, {2}, {3}, {4}, {5}, {6}

{0, 1, 2}, {3}, {4}, {5}, {6}

{0, 1, 2, 4}, {3}, {5}, {6}

{0, 1, 2, 4}, {3, 5}, {6}

{0, 1, 2, 4}, {3, 5}, {6}

{0, 1, 2, 4, 6}, {3, 5}

{0, 1, 2, 3, 4, 5, 6}

15 of 76

A Better Approach: Connected Components

For each item, its connected component is the set of all items that are connected to that item.

  • Naive approach: Record every single connecting line somehow.
  • Better approach: Model connectedness in terms of sets.
    • How things are connected isn’t something we need to know.
    • Only need to keep track of which connected component each item belongs to.

{ 0, 1, 2, 4 }, {3, 5}, {6}

0

4

1

2

3

5

6

Up next: We’ll consider how to do track set membership in Java.

16 of 76

Challenge: Pick Data Structures to Support Tracking of Sets

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Assume elements are numbered from 0 to N-1.

Before connect(2, 3) operation: After connect(2, 3) operation:

17 of 76

List of Sets

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

18 of 76

Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Assume elements are numbered from 0 to N-1.

19 of 76

Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

Idea #1: List of sets of integers, e.g. [{0, 1, 2, 4}, {3, 5}, {6}]

  • In Java: List<Set<Integer>>.
  • Very intuitive idea, but actually terrible!

20 of 76

Challenge: Pick Data Structures to Support Tracking of Sets (Your Idea)

If nothing is connected:

0

4

1

2

3

5

6

Idea #1: List of sets of integers, e.g. [{0}, {1}, {2}, {3}, {4}, {5}, {6}]

  • In Java: List<Set<Integer>>.
  • Very intuitive idea.
  • Requires iterating through all the sets to find anything. Complicated and slow!
    • Worst case: If nothing is connected, then isConnected(5, 6) requires iterating through N-1 sets to find 5, then N sets to find 6. Overall runtime of Θ(N).

21 of 76

QuickFind

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

22 of 76

Performance Summary

ListOfSetsDS is complicated and slow.

  • Operations are linear when number of connections are small.
    • Have to iterate over all sets.
  • Important point: By deciding to use a List of Sets, we have doomed ourselves to complexity and bad performance.

Implementation

constructor

connect

isConnected

ListOfSetsDS

Θ(N)

O(N)

O(N)

Worst case is Θ(N), but other cases may be better. We’ll say O(N) since O means “less than or equal”.

Constructor’s runtime has order of growth N no matter what, so Θ(N).

23 of 76

Next Approach: Array of Integers

4

4

4

5

4

5

6

0 1 2 3 4 5 6

5

5

5

5

5

5

6

0 1 2 3 4 5 6

Idea #2: list of integers where ith entry gives set number (a.k.a. “id”) of item i.

  • connect(p, q): Change entries that equal id[p] to id[q]

int[] id

Before connect(2, 3) operation: After connect(2, 3) operation:

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

int[] id

24 of 76

QuickFindDS

public class QuickFindDS implements DisjointSets {

private int[] id;

public boolean isConnected(int p, int q) {

return id[p] == id[q];

}

public void connect(int p, int q) {

int pid = id[p];

int qid = id[q];

for (int i = 0; i < id.length; i++) {

if (id[i] == pid) {

id[i] = qid;

}

}...

Very fast: Two array accesses: Θ(1)

Relatively slow: N+2 to 2N+2 array accesses: Θ(N)

public QuickFindDS(int N) {

id = new int[N];

for (int i = 0; i < N; i++)

id[i] = -1;

}

}

25 of 76

Performance Summary

QuickFindDS is too slow for practical use: Connecting two items takes N time.

  • Instead, let’s try something more radical.

Implementation

constructor

connect

isConnected

ListOfSetsDS

Θ(N)

O(N)

O(N)

QuickFindDS

Θ(N)

Θ(N)

Θ(1)

26 of 76

Quick Union

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

27 of 76

Improving the Connect Operation

Approach zero: Represent everything as boxes and lines. Overly complicated.

ListOfSets: Represent everything as connected components. Represented connected components as list of sets of integers.

QuickFind: Represent everything as connected components. Represented connected components as a list of integers, where value = id.

0

4

1

2

3

5

6

??? in Java instance variables.

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4 }, {3, 5}, {6}

[{0, 1, 2, 4}, {3, 5}, {6}]

[2, 2, 2, 3, 2, 3, 6]

List<Set<Integer>>

int[]

28 of 76

Improving the Connect Operation

QuickFind: Represent everything as connected components. Represented connected components as a list of integers where value = id.

  • Bad feature: Connecting two sets is slow!

Next approach (QuickUnion): We will still represent everything as connected components, and we will still represent connected components as a list of integers. However, values will be chosen so that connect is fast.

{ 0, 1, 2, 4 }, {3, 5}, {6}

[2, 2, 2, 3, 2, 3, 6]

int[]

29 of 76

Improving the Connect Operation

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

0

0

0

3

0

3

6

0 1 2 3 4 5 6

3

3

3

3

3

3

6

0 1 2 3 4 5 6

id

id

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

30 of 76

Improving the Connect Operation (Your Answer)

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

0

0

0

3

0

3

6

0 1 2 3 4 5 6

3

3

3

3

3

3

6

0 1 2 3 4 5 6

id

id

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

31 of 76

Improving the Connect Operation (Your Answer)

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

  • Suggestion, use pointers!

0 1 2 3 4 5 6

0 1 2 3 4 5 6

{ 0, 1, 2, 4 }, {3, 5}, {6}

{ 0, 1, 2, 4, 3, 5}, {6}

0

4

1

2

3

5

6

0

4

1

2

3

5

6

id

id

ZERO

THREE

SIX

THREE

THREE

SIX

32 of 76

Improving the Connect Operation

Hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

  • Idea: Assign each item a parent (instead of an id). Results in a tree-like shape.
    • An innocuous sounding, seemingly arbitrary solution.
    • Unlocks a pretty amazing universe of math that we won’t discuss.

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

-1

0

1

-1

0

3

-1

0 1 2 3 4 5 6

parent�

5

3

Note: The optional textbook has an item’s parent as itself instead of -1 for root items.

33 of 76

Improving the Connect Operation

connect(5, 2)

  • How should we change the parent list to handle this connect operation?
    • If you’re not sure where to start, consider: why can’t we just set id[5] to 2?

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1

0

1

-1

0

3

-1

0 1 2 3 4 5 6

34 of 76

Improving the Connect Operation (Your Answer)

connect(5, 2)

  • How should we change the parent list to handle this connect operation?

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1

0

1

-1

0

3

-1

0 1 2 3 4 5 6

35 of 76

Improving the Connect Operation

connect(5, 2)

  • Find root(5). // returns 3
  • Find root(2). // returns 0
  • Set root(5)’s value equal to root(2).

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1

0

1

-1

0

3

-1

0 1 2 3 4 5 6

36 of 76

Improving the Connect Operation

connect(5, 2)

  • Find root(5). // returns 3
  • Find root(2). // returns 0
  • Set root(5)’s value equal to root(2).

{0, 1, 2, 4} {3, 5} {6}

2

1

4

0

6

0 is the “root” of this set.

5

3

parent

-1

0

1

0

0

3

-1

0 1 2 3 4 5 6

37 of 76

Set Union Using Rooted-Tree Representation

connect(5, 2)

  • Make root(5) into a child of root(2).

What are the potential performance issues with this approach?

  • Compared to QuickFind, we have to climb up a tree.

{0, 1, 2, 4, 3, 5} {6}

2

1

4

0

5

3

6

0 is the “root” of this set.

parent

-1

0

1

0

0

3

-1

0 1 2 3 4 5 6

38 of 76

The Worst Case

If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:

  • connect(4, 3)
  • connect(3, 2)
  • connect(2, 1)
  • connect(1, 0)

For N items, what’s the worst case runtime…

  • For connect(p, q)?
  • For isConnected(p, q)?

3

2

1

0

4

39 of 76

The Worst Case

If we always connect the first item’s tree below the second item’s tree, we can end up with a tree of height M after M operations:

  • connect(4, 3)
  • connect(3, 2)
  • connect(2, 1)
  • connect(1, 0)

For N items, what’s the worst case runtime…

  • For connect(p, q)? Θ(N)
  • For isConnected(p, q)? Θ(N)

3

2

1

0

4

40 of 76

QuickUnionDS

public class QuickUnionDS implements DisjointSets {

private int[] parent;

public QuickUnionDS(int N) {

parent = new int[N];

for (int i = 0; i < N; i++)

{ parent[i] = -1; }

}

private int find(int p) {

int r = p;

while (parent[r] >= 0)

{ r = parent[r]; }

return r;

}

For N items, this means a worst case runtime of Θ(N).

public boolean isConnected(int p, int q) {

return find(p) == find(q);

}

{

int i = find(p);

int j = find(q);

parent[i] = j;

}

Here the find operation is the same as the “root(x)” idea we had in earlier slides.

41 of 76

Performance Summary

QuickFindDS defect: QuickFindDS is too slow: Connecting takes Θ(N) time.

QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.

  • Observation: Things would be fine if we just kept our tree balanced.

Implementation

Constructor

connect

isConnected

ListOfSetsDS

Θ(N)

O(N)

O(N)

QuickFindDS

Θ(N)

Θ(N)

Θ(1)

QuickUnionDS

Θ(N)

O(N)

O(N)

Using O because runtime can be between constant and linear.

42 of 76

Weighted Quick Union

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

43 of 76

A Choice of Two Roots: yellkey.com/offer

Suppose we are trying to connect(2, 5). We have two choices:

  1. Make 5’s root into a child of 2’s root.
  2. Make 2’s root into a child of 5’s root.

Which is the better choice?

2

1

4

0

5

3

2

1

4

0

5

3

2

1

4

0

5

3

+

44 of 76

A Choice of Two Roots

Suppose we are trying to connect(2, 5). We have two choices:

  • Make 5’s root into a child of 2’s root.
  • Make 2’s root into a child of 5’s root.

Which is the better choice?

2

1

4

0

5

3

2

1

4

0

5

3

2

1

4

0

5

3

+

Height: 2

Height: 3

45 of 76

Possible Approach

One possible approach is to keep track of the height of every tree.

  • Link up shorter tree below the larger tree.
  • In case of a tie, break tie arbitrarily.

Unfortunately, tracking tree height is kind of annoying.

Interestingly, tracking the tree’s “size”, a.k.a. “weight” works just as well asymptotically.

  • Size and weight both mean the total number of items in that tree.

46 of 76

Weighted QuickUnion: http://yellkey.com/class

Modify quick-union to avoid tall trees.

  • Track tree size (number of elements).
  • New rule: Always link root of smaller tree to larger tree.

New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?

  1. parent[3]
  2. parent[0]
  3. parent[8]
  4. parent[6]

1

2

0

4

6

5

3

8

9

7

-1

0

0

0

0

0

-1

6

6

8

0 1 2 3 4 5 6 7 8 9

parent

47 of 76

Improvement #1: Weighted QuickUnion

Modify quick-union to avoid tall trees.

  • Track tree size (number of elements).
  • New rule: Always link root of smaller tree to larger tree.

New rule: If we call connect(3, 8), which entry (or entries) of parent[] changes?

  • parent[3]
  • parent[0]
  • parent[8]
  • parent[6]

-1

0

0

0

0

0

0

6

6

8

0 1 2 3 4 5 6 7 8 9

parent

1

2

0

4

6

5

3

8

9

7

Note: The rule I picked is based on weight, not height. We’ll talk about why soon.

48 of 76

Implementing WeightedQuickUnion

Minimal changes needed:

  • Use parent[] array as before.
  • isConnected(int p, int q) requires no changes.
  • connect(int p, int q) needs to somehow keep track of sizes.
    • See the old Disjoint Sets lab for the full details.
    • Two common approaches:
      • Replace -1 with -weight for roots (top approach).
      • Create a separate size array (bottom approach).

-2

-1

-1

-1

-1

0

-4

6

6

8

0 1 2 3 4 5 6 7 8 9

parent

2

1

1

1

1

1

4

1

2

1

0 1 2 3 4 5 6 7 8 9

size

An older version of this slide had incorrect values in the two arrays.

49 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

N

H

1

0

50 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

N

H

1

0

2

1

51 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

52 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

4

2

53 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

4

2

4

5

54 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

4

2

4

5

6

7

55 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

4

2

4

5

6

7

56 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

0

1

2

3

N

H

1

0

2

1

4

2

8

3

4

5

6

7

57 of 76

Weighted Quick Union Performance

Let’s consider the worst case where the tree height grows as fast as possible.

  • Worst case tree height is Θ(log N).

0

1

2

3

N

H

1

0

2

1

4

2

8

3

16

4

4

5

6

7

8

9

10

11

12

13

14

15

58 of 76

Performance Summary

QuickUnion’s runtimes are O(H), and WeightedQuickUnionDS height is given by H = O(log N). Therefore connect and isConnected are both O(log N).

By tweaking QuickUnionDS we’ve achieved logarithmic time performance.

  • Fast enough for all practical problems.

Implementation

Constructor

connect

isConnected

ListOfSetsDS

Θ(N)

O(N)

O(N)

QuickFindDS

Θ(N)

Θ(N)

Θ(1)

QuickUnionDS

Θ(N)

O(N)

O(N)

WeightedQuickUnionDS

Θ(N)

O(log N)

O(log N)

59 of 76

Why Weights Instead of Heights?

We used the number of items in a tree to decide upon the root.

  • Why not use the height of the tree?
    • Worst case performance for HeightedQuickUnionDS is asymptotically the same! Both are Θ(log(N)).
    • Resulting code is more complicated with no performance gain.

1

2

0

4

6

5

3

8

9

7

+

1

2

0

4

6

5

3

8

9

7

60 of 76

WQU with Path Compression

Lecture 14, CS61B, Spring 2023

Introduction to Disjoint Sets

  • Disjoint Sets API
  • Tracking Connected Components

Implementations

  • List of Sets
  • Quick Find
  • Quick Union
  • Weighted Quick Union
  • WQU with Path Compression

61 of 76

What We’ve Achieved

Performing M operations on a DisjointSet object with N elements:

  • For our naive implementation, runtime is O(N + MN) or just O(MN).
  • For our best implementation, runtime is O(N + M log N).
  • For N = 109 and M = 109, difference is 30 years vs. 6 seconds.
    • Key point: Good data structure unlocks solutions to problems that could otherwise not be solved!
  • Good enough for all practical uses, but could we theoretically do better?

Implementation

Constructor

connect

isConnected

ListOfSetsDS

Θ(N)

O(N)

O(N)

WeightedQuickUnionDS

Θ(N)

O(log N)

O(log N)

O(N) cost for constructor.

62 of 76

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion.

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

  • Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

63 of 76

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion.

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

  • Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

64 of 76

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion.

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

  • Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

65 of 76

170 Spoiler: Path Compression: A Clever Idea

Below is the topology of the worst case if we use WeightedQuickUnion.

Clever idea: When we do isConnected(15, 10), tie all nodes seen to the root.

  • Additional cost is insignificant (same order of growth).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

66 of 76

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

67 of 76

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

68 of 76

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

69 of 76

Path Compression: Another Clever Idea

Draw the tree after we call isConnected(14, 13).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

70 of 76

Intuition

By compressing the tree with each union and isConnected call, we keep the tree nice and short.

  • As number of nodes N grows, our tree tends to get taller.
  • As number of operations M grows, our tree tends to get shorter.
    • For enough operations tree height will shrink to 1.

71 of 76

Intuition

By compressing the tree with each union and isConnected call, we keep the tree nice and short.

Note: The tree we started with in the exercise you completed is impossible to generate if we’re using path compression!

  • The structure in red is impossible (try to convince yourself if you’d like).

15

11

5

12

13

6

1

7

14

8

2

9

10

3

0

4

72 of 76

CS 170 Analysis of Disjoint Sets

In CS170, you’ll show that each isConnected or connect operation takes on average lg* N time because the tree is kept so compressed.

  • lg*: How many times you need to press the log2 button on a calculator before you get to a number that is 1 or less. Example: http://joshh.ug/logstar/demo.html
  • M operations on N nodes takes O(M lg* N) time for large M.
  • lg* is less than or equal to 5 for any realistic input.

15

11

5

12

1

2

9

10

3

0

4

N

lg* N

1

0

2

1

4

2

16

3

65536

4

265536

5

216

73 of 76

Even More Careful Analysis

You can provide an even tighter bound, showing that each operation takes on average α(N) time.

  • α is the inverse Ackermann function.
  • See “Efficiency of a Good But Not Linear Set Union Algorithm.”
    • Written by Bob Tarjan while at UC Berkeley in 1975.

N

α(N)

1

0

...

1

...

2

...

3

...

4

5

15

11

5

12

1

2

9

10

3

0

4

74 of 76

A Summary of Our Iterative Design Process

And we’re done! The end result of our iterative design process is the standard way disjoint sets are implemented today - quick union and path compression.

The ideas that made our implementation efficient:

  • Represent sets as connected components (don’t track individual connections).
    • ListOfSetsDS: Store connected components as a List of Sets (slow, complicated).
    • QuickFindDS: Store connected components as set ids.
    • QuickUnionDS: Store connected components as parent ids.
      • WeightedQuickUnionDS: Also track the size of each set, and use size to decide on new tree root.
        • WeightedQuickUnionWithPathCompressionDS: On calls to connect and isConnected, set parent id to the root for all items seen.

75 of 76

Performance Summary

Implementation

Runtime

ListOfSetsDS

O(NM)

QuickFindDS

Θ(NM)

QuickUnionDS

O(NM)

WeightedQuickUnionDS

O(M log N)

WeightedQuickUnionDSWithPathCompression

O(M α(N))

Runtimes are given assuming:

  • We have a DisjointSets object of size N.
  • We perform M operations, where an operation is defined as either a call to connected or isConnected.

76 of 76

Citations

Nazca Lines: http://redicecreations.com/ul_img/24592nazca_bird.jpg

The proof of the inverse ackermann runtime for disjoint sets is given here:

http://www.uni-trier.de/fileadmin/fb4/prof/INF/DEA/Uebungen_LVA-Ankuendigungen/ws07/KAuD/effi.pdf

as originally proved by Tarjan here at UC Berkeley in 1975.