More Quicksort, Quick Select, Stability
Lecture 32 (Sorting 3)
CS61B, Fall 2023 @ UC Berkeley
Slides credit: Josh Hug
Philosophies for Avoiding Worst Case Quicksort Behavior
Lecture 32, CS61B, Fall 2023
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Partition Sort, a.k.a. Quicksort
Quicksorting N items: (Demo)
Example, partition(32): [32, 15, 2, 17, 19, 26, 41, 17, 17] becomes [15, 2, 17, 19, 26, 17, 17, 32, 41]. Everything <= 32 ends up to the left of 32, everything >= 32 ends up to its right, and 32 is in its place.
Run time is Θ(N log N) in the best case, Θ(N²) in the worst case, and Θ(N log N) in the average case.
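To make the demo concrete, here is a minimal Java sketch of Quicksort in this style: partition around the leftmost item, then recursively sort both sides. The class name and the auxiliary-array partition (three scans: smaller, equal, larger) are choices made for this sketch; it may differ in details from the 3-scan code used in the course's timing experiments, but it reproduces the partition result shown above and is stable.

```java
/** A minimal, stable Quicksort sketch: leftmost pivot, auxiliary-array partition. */
public class QuicksortSketch {
    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) { return; }
        int pivotIndex = partition(a, lo, hi);
        sort(a, lo, pivotIndex - 1);
        sort(a, pivotIndex + 1, hi);
    }

    /** Partitions a[lo..hi] around pivot a[lo]; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int[] aux = new int[hi - lo + 1];
        int k = 0;
        for (int i = lo + 1; i <= hi; i += 1) {      // scan 1: items smaller than the pivot
            if (a[i] < pivot) { aux[k] = a[i]; k += 1; }
        }
        int pivotIndex = lo + k;                      // the pivot lands just after the smaller items
        aux[k] = pivot;
        k += 1;
        for (int i = lo + 1; i <= hi; i += 1) {      // scan 2: items equal to the pivot
            if (a[i] == pivot) { aux[k] = a[i]; k += 1; }
        }
        for (int i = lo + 1; i <= hi; i += 1) {      // scan 3: items larger than the pivot
            if (a[i] > pivot) { aux[k] = a[i]; k += 1; }
        }
        System.arraycopy(aux, 0, a, lo, aux.length);
        return pivotIndex;
    }
}
```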
Avoiding the Worst Case: Question from Last Time
If the pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N²) cases do happen in practice, e.g. with bad ordering (an array that is already in or almost in sorted order) or bad elements (an array with many duplicates).
What can we do to avoid worst case behavior?
Recall, our version of Quicksort has the following properties: the leftmost item is always chosen as the pivot, and partitioning is done with the 3-scan approach from the previous lecture.
Example: partitioning [6, 8, 3, 1, 2, 7, 4] around the pivot 6 yields [3, 1, 2, 4, 6, 8, 7].
Avoiding the Worst Case: My Answers
What can we do to avoid running into the worst case for QuickSort?
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes too deep.
4. Preprocess the array: Could analyze the array to see if Quicksort will be slow. No obvious way to do this, though (can’t just check if the array is sorted; almost sorted arrays are also slow).
Philosophy 1: Randomness (My Preferred Approach)
If the pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N²) cases do happen in practice, e.g. with bad ordering (an already sorted array) or bad elements (many duplicates).
Dealing with bad ordering: Strategy 1 is to pick pivots randomly; Strategy 2 is to shuffle before sorting.
The second strategy requires care in the partitioning code to avoid Θ(N²) behavior on arrays of duplicates.
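Philosophy 1 as code: a minimal sketch that shuffles before delegating to the QuicksortSketch class from the earlier block (a hypothetical helper, not library code). A Fisher-Yates shuffle makes every ordering equally likely, so adversarial orderings become astronomically unlikely for large N.

```java
import java.util.Random;

/** Shuffle first, then Quicksort: the leftmost pivot is now effectively random. */
public class ShuffledQuicksort {
    private static final Random RNG = new Random();

    public static void sort(int[] a) {
        shuffle(a);               // destroys any adversarial ordering with high probability
        QuicksortSketch.sort(a);  // hypothetical helper sketched earlier in these slides
    }

    /** Fisher-Yates shuffle: each permutation is equally likely. */
    private static void shuffle(int[] a) {
        for (int i = a.length - 1; i > 0; i -= 1) {
            int j = RNG.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }
}
```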
Philosophy 2a: Smarter Pivot Selection (constant time pivot pick)
Randomness is necessary for best Quicksort performance! Consider any pivot selection procedure that is deterministic and takes constant time.
Dangerous input example: [7, 2, 3, 4, 6, 1, 8, 5]
The resulting Quicksort has a family of dangerous inputs that an adversary could easily generate.
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
Could calculate the actual median in linear time.
Raises interesting question though: How do you compute the median of an array? Will talk about how to do this later today.
Philosophy 3: Introspection
Can also simply watch your recursion depth.
Perfectly reasonable approach, though not super common in practice.
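As an illustration of this philosophy, here is a hedged sketch of an “introspective” Quicksort. All names, the depth limit of roughly 2 lg N, the Lomuto-style partition, and the PriorityQueue-based heapsort fallback are illustrative choices for this sketch, not code from the course.

```java
import java.util.PriorityQueue;

/** Quicksort that watches its own recursion depth and bails out to heapsort. */
public class IntrospectiveSortSketch {
    public static void sort(int[] a) {
        int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(a.length, 1)));
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(int[] a, int lo, int hi, int depthLimit) {
        if (lo >= hi) { return; }
        if (depthLimit == 0) {           // recursion has gone too deep: switch to a safer sort
            heapsort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depthLimit - 1);
        sort(a, p + 1, hi, depthLimit - 1);
    }

    /** Lomuto-style partition around the rightmost item; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j += 1) {
            if (a[j] < pivot) { swap(a, i, j); i += 1; }
        }
        swap(a, i, hi);
        return i;
    }

    /** Heapsort of a[lo..hi] via a PriorityQueue; Θ(N log N) in the worst case. */
    private static void heapsort(int[] a, int lo, int hi) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int i = lo; i <= hi; i += 1) { pq.add(a[i]); }
        for (int i = lo; i <= hi; i += 1) { a[i] = pq.remove(); }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}
```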
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | Time | Notes |
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) | |
| Random Quicksort | Θ(log N) expected | Θ(N log N) expected | Fastest sort |
Quicksort Variant Experiments
Lecture 32, CS61B, Fall 2023
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Quicksort Flavors
We said Quicksort is the fastest, but this is only true if we make the right decisions about: pivot selection, the partition algorithm, and our worst case avoidance strategy.
Suppose we run a speed test of Mergesort vs. the Quicksort from last time, which had: leftmost pivot selection, 3-scan partitioning, and a shuffle for worst case avoidance.
We’ll call this Quicksort L3S
Quicksort vs. Mergesort
Quicksort didn’t do so well!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
| Mergesort | N/A | N/A | N/A | 1.3 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Tony Hoare’s In-place Partitioning Scheme
Tony originally proposed a scheme where two pointers walk towards each other.
(Note: The demo we'll show is not the exact scheme Tony used)
Using this partitioning scheme yields a very fast Quicksort.
Hoare Partitioning
Create L and G pointers at the left and right ends. The pivot is the leftmost item (here 17). L walks right and likes items smaller than the pivot; G walks left and likes items larger than the pivot. When both pointers are stuck on items they dislike, swap those items and keep walking. When the pointers cross, swap the pivot with the item where G stopped.
Input: [17, 15, 19, 32, 2, 26, 41, 17, 17]
L: “Hello, lovely 15.” (advance), then “Grrr…… 19.” (stuck, 19 >= pivot)
G: “I dislike 17.” (stuck, 17 <= pivot). Time to swap 19 and 17: [17, 15, 17, 32, 2, 26, 41, 17, 19]
L: “Immediate trouble.” (stuck on 32). G: “Trouble here, too.” (stuck on 17). Time to swap: [17, 15, 17, 17, 2, 26, 41, 32, 19]
L: “2 is cool.” (advance), then stuck: “26 is grossly large.”
G: “41 is fine.” (advance), “26 is fine.” (advance), then “2 is no good… also hi L!!” (the pointers have crossed)
Swap the old pivot (17) with the item where G stopped (2): [2, 15, 17, 17, 17, 26, 41, 32, 19]. The old pivot is now in its place and becomes the partition point.
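Below is a Java sketch that matches the walkthrough above. It is not Tony Hoare’s original code, and not necessarily the exact partitioning used in the timing experiments; the class and method names are made up for this sketch. The hoarePartition method is reused by the Quick Select sketch later in these slides.

```java
/** Quicksort using a Hoare-style two-pointer partition (leftmost item as pivot). */
public class HoareQuicksort {
    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) { return; }
        int p = hoarePartition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    /** Partitions a[lo..hi] around pivot a[lo]; returns the pivot's final index. */
    static int hoarePartition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int L = lo + 1;   // walks right; stops on items >= pivot
        int G = hi;       // walks left;  stops on items <= pivot
        while (true) {
            while (L <= G && a[L] < pivot) { L += 1; }
            while (L <= G && a[G] > pivot) { G -= 1; }
            if (L >= G) { break; }    // pointers met or crossed
            swap(a, L, G);            // both stuck: swap, then keep walking
            L += 1;
            G -= 1;
        }
        swap(a, lo, G);               // put the pivot in its place
        return G;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
    }
}
```

Running hoarePartition on the walkthrough’s input [17, 15, 19, 32, 2, 26, 41, 17, 17] produces [2, 15, 17, 17, 17, 26, 41, 32, 19] with the pivot at index 4, matching the demo.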
Quicksort vs. Mergesort
Using Tony Hoare’s two-pointer scheme, Quicksort is better than Mergesort!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
| Mergesort | N/A | N/A | N/A | 1.3 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
What If We Don’t Want Randomness?
Our approach so far: Use randomness to avoid worst case behavior, but some people don’t like having randomness in their sorting routine.
Another approach: Use the median (or an approximation) as our pivot.
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes too deep.
4. Try to cheat: If the array is already sorted, don’t sort (this doesn’t work).
Philosophy 1 (randomness) is what we’ve been using.
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
The best possible pivot is the median.
Obvious approach: Just calculate the actual median and use that as pivot.
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Median Identification
Is it possible to find the median in Θ(N) time?
Historical note: The authors of this paper (Blum, Floyd, Pratt, Rivest, and Tarjan, “Time Bounds for Selection”, the source of the PICK algorithm) include FOUR Turing Award winners (and Vaughan Pratt).
Let’s see how Exact Median Quicksort performs.
Quicksort vs. Mergesort
Quicksort using PICK to find the exact median (Quicksort PickTH) is terrible!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
| Mergesort | N/A | N/A | N/A | 1.3 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 4.9 seconds | Θ(N log N) |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Quick Select (median finding)
Lecture 32, CS61B, Fall 2023
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
The Selection Problem
Computing the exact median would be great for picking an item to partition around. Gives us a “safe quick sort”.
However, it turns out that partitioning can be used to find the exact median.
Quick Select
Goal: find the median.
Partition; the pivot lands at index 2. The median can’t be to the left, so partition the right subproblem.
Now the pivot lands at index 6. The median can’t be to the right, so partition the left subproblem.
The pivot lands at index 4. Are we done? We are if index 4 is the middle position of the array; each partition locks the pivot into its sorted position, and we only ever recurse into the side that must contain the median.
Worst case performance?
What is the worst case performance for Quick Select? Give an array that causes this worst case (assuming we always pick leftmost item as pivot).
Worst asymptotic performance Θ(N²) occurs if the array is already in sorted order, e.g. [1 2 3 4 5 6 7 8 9 10 … N]. With the leftmost item as the pivot, each partition leaves the pivot where it is and shrinks the problem by only one item, so we do roughly N + (N-1) + (N-2) + … + N/2 compares before the pivot finally lands at the middle, which is Θ(N²).
Expected Performance
On average, Quick Select will take Θ(N) time.
On average, the pivot ends up about halfway, so each subproblem is about half the size of the previous one: ~N compares, then ~N/2 compares, then ~N/4 compares, and so on.
Total: N + N/2 + N/4 + … + 1 = Θ(N).
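As a sketch, here is Quick Select in Java, reusing the hoarePartition method from the HoareQuicksort sketch above (so this method can live in that same class). With k = a.length / 2 it returns a median. As discussed, shuffling first (or picking random pivots) is what makes the Θ(N) expected runtime hold.

```java
/** Returns the item that would be at index k if a were sorted
 *  (k = a.length / 2 gives a median). Intended to sit in the
 *  HoareQuicksort sketch, reusing its hoarePartition method. */
public static int quickSelect(int[] a, int k) {
    int lo = 0;
    int hi = a.length - 1;
    while (lo < hi) {
        int p = hoarePartition(a, lo, hi);  // pivot lands at its final sorted index
        if (p == k) {
            return a[p];
        } else if (p < k) {
            lo = p + 1;    // the k-th smallest must be to the right
        } else {
            hi = p - 1;    // the k-th smallest must be to the left
        }
    }
    return a[lo];
}
```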
Quicksort With Quickselect?
Quicksort with PICK to find exact median was terrible.
What if we used Quickselect to find the exact median?
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
| Mergesort | N/A | N/A | N/A | 1.3 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 4.9 seconds | Θ(N log N) |
Sorting Stability
Lecture 32, CS61B, Fall 2023
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | # Compares | Notes |
| Heapsort | Θ(1) | Θ(N log N) worst | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) worst | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) worst | |
| Random Quicksort | Θ(log N) (call stack) | Θ(N log N) expected | Fastest sort |
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
sort(studentRecords, BY_NAME);
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_SECTION);
Lara | 1 |
Sigurd | 2 |
Bas | 3 |
Jana | 3 |
Jouni | 3 |
Rosella | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
Equivalent items don’t ‘cross over’ when being stably sorted.
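A small runnable illustration, using the same names and sections as the table above. Java’s Arrays.sort for object arrays is documented to be stable, so sorting by name and then by section leaves each section alphabetized. The class name and the use of a record are just choices for this sketch (requires a recent Java version).

```java
import java.util.Arrays;
import java.util.Comparator;

public class StabilityDemo {
    record Student(String name, int section) { }

    public static void main(String[] args) {
        Student[] studentRecords = {
            new Student("Bas", 3), new Student("Fikriyya", 4),
            new Student("Jana", 3), new Student("Jouni", 3),
            new Student("Lara", 1), new Student("Nikolaj", 4),
            new Student("Rosella", 3), new Student("Sigurd", 2)
        };
        // Arrays.sort on objects is stable (a mergesort variant), so sorting by
        // name and then by section keeps each section in alphabetical order.
        Arrays.sort(studentRecords, Comparator.comparing(Student::name));
        Arrays.sort(studentRecords, Comparator.comparingInt(Student::section));
        for (Student s : studentRecords) {
            System.out.println(s.name() + " | " + s.section());
        }
    }
}
```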
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
sort(studentRecords, BY_NAME);
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_SECTION);
Lara | 1 |
Sigurd | 2 |
Jouni | 3 |
Rosella | 3 |
Bas | 3 |
Jana | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
Sorting instability can be really annoying! We wanted students listed alphabetically within each section.
Sorting Stability
Is insertion sort stable?
Is Quicksort stable?
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
Insertion sort is stable: we never move an item past an equal item, so equal items never cross over.
Quicksort’s stability depends on the partitioning scheme. With Hoare partitioning, equal items can get swapped past each other, so it is not stable. With the three array partitioning from the previous lecture, relative order is preserved, so that Quicksort is stable.
Stability
| Sort | Memory | # Compares | Notes | Stable? |
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) | No |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted | Yes |
| Mergesort | Θ(N) | Θ(N log N) | | Yes |
| Quicksort LTHS | Θ(log N) | Θ(N log N) expected | Fastest sort | No |
You can create a stable Quicksort (i.e. the version from the previous lecture). However, unstable partitioning schemes (like Hoare partitioning) tend to be faster. All reasonable partitioning schemes yield Θ(N log N) expected runtime, but with different constants.
The Θ(log N) memory for Quicksort is due to the cost of tracking recursive calls on the call stack, and is also an “expected” amount. The difference between log N and constant memory is trivial.
Arrays.sort
In Java, Arrays.sort(someArray) uses:
Mergesort (specifically a variant called TimSort) if someArray consists of Objects.
Quicksort (dual-pivot) if someArray consists of primitives.
Why? Stability matters when sorting Objects, since equal keys can still carry distinguishable fields, but equal primitives are indistinguishable, so Java can use the faster, unstable Quicksort for them. See A level problems.
Optimizing Sorts
Additional tricks we can play: for example, switch to insertion sort for small subarrays (the optimization the experiment footnotes said was omitted); see the sketch below.
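As a sketch of that trick: a drop-in replacement for the recursive sort method in the hypothetical QuicksortSketch class from earlier in these slides, handing small subarrays to insertion sort. The CUTOFF value of 15 is just an illustrative choice.

```java
/** Replacement for QuicksortSketch.sort(int[], int, int): small ranges go to insertion sort. */
private static final int CUTOFF = 15;

private static void sort(int[] a, int lo, int hi) {
    if (hi - lo < CUTOFF) {
        insertionSort(a, lo, hi);        // also handles the lo >= hi base case
        return;
    }
    int p = partition(a, lo, hi);        // the partition method already in QuicksortSketch
    sort(a, lo, p - 1);
    sort(a, p + 1, hi);
}

/** Insertion sort of a[lo..hi]: fast on tiny or nearly sorted ranges. */
private static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i += 1) {
        for (int j = i; j > lo && a[j] < a[j - 1]; j -= 1) {
            int tmp = a[j]; a[j] = a[j - 1]; a[j - 1] = tmp;
        }
    }
}
```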
Sounds of Sorting
Lecture 32, CS61B, Fall 2023
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Sounds of Sorting Algorithms (of 125 items)
Starts with selection sort: https://www.youtube.com/watch?v=kPRA0W1kECg
Insertion sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m9s
Quicksort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m38s
Mergesort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m05s
Heapsort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m28s
LSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m54s [coming in a future lecture]
MSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=2m10s [coming in a future lecture]
Shell’s sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=3m37s [bonus from an earlier lecture]
Questions to ponder (later… after class):