Instructor Lab 10
CS 61BL Summer 2023
Instructor Lab Check
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Sorting So Far
Core ideas:
Quicksort:
Context for Quicksort’s Invention (Source)
1960: Tony Hoare was working on a crude automated translation program for Russian and English. �
... | ... |
beautiful | красивая |
... | ... |
cat | кошка |
... | ... |
“The cat wore a beautiful hat.”
Dictionary of D english words
N words
“Кошка носил красивая шапка.”
How would you do this?
Context for Quicksort’s Invention (Source)
Limitation at the time:
1960: Tony Hoare was working on a crude automated translation program for Russian and English.
Algorithm: N binary searches of D length dictionary.
... | ... |
beautiful | красивая |
... | ... |
cat | кошка |
... | ... |
Core Idea of Tony’s Sort: Partitioning http://www.yellkey.com/popular
To partition an array a[] on element x=a[i] is to rearrange a[] so that:
Which partitions are valid?
5
550
10
4
10
9
330
i
4
5
9
10
10
330
550
5
9
10
4
10
550
330
5
4
9
10
10
550
330
5
9
10
4
10
550
330
A.
C.
B.
D.
j
j
j
j
Called the ‘pivot’.
The Core Idea of Tony’s Sort: Partitioning
To partition an array a[] on element x=a[i] is to rearrange a[] so that:
Which partitions are valid?
5
550
10
4
10
9
330
i
4
5
9
10
10
330
550
5
9
10
4
10
550
330
5
4
9
10
10
550
330
5
9
10
4
10
550
330
A.
C.
B.
D.
j
j
j
j
Called the ‘pivot’.
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Partition Sort, a.k.a. Quicksort
Observations:
5
3
2
1
8
4
6
7
3
2
1
4
7
8
6
5
3
2
1
4
5
7
8
6
5
2
1
3
4
5
6
7
8
5
Q: How would we use this operation for sorting?
Partition Sort, a.k.a. Quicksort
Quicksorting N items: (Demo)
32
15
2
17
19
26
41
17
17
partition(32)
15
2
17
19
26
17
17
32
41
<= 32
>= 32
in its place
Quicksort
Quicksort was the name chosen by Tony Hoare for partition sort.
How fast is Quicksort? Need to count number and difficulty of partition operations.
Theoretical analysis:
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Best Case: Pivot Always Lands in the Middle
Only size 1 problems remain, so we’re done.
Best Case Runtime?
What is the best case runtime?
Only size 1 problems remain, so we’re done.
Best Case Runtime?
Only size 1 problems remain, so we’re done.
Total work at each level:
≈ N
≈N/2 + ≈N/2 = ≈N
≈N/4 * 4 = ≈N
Overall runtime:
Θ(NH) where H = Θ(log N)
so: Θ(N log N)
Worst Case: Pivot Always Lands at Beginning of Array http://www.yellkey.com/popular
Give an example of an array that would follow the pattern to the right.
What is the runtime Θ(∙)?
Worst Case: Pivot Always Lands at Beginning of Array
Give an example of an array that would follow the pattern to the right.
What is the runtime Θ(∙)?
Quicksort Performance
Theoretical analysis:
Compare this to Mergesort.
Recall that Θ(N log N) vs. Θ(N2) is a really big deal. So how can Quicksort be the fastest sort empirically? Because on average it is Θ(N log N).
Argument #1: 10% Case
Suppose pivot always ends up at least 10% from either edge (not to scale).
Work at each level: O(N)
N
N/10
9N/10
N/100
9N/100
9N/100
81N/100
Punchline: Even if you are unlucky enough to have a pivot that never lands anywhere near the middle, but at least always 10% from the edge, runtime is still O(N log N).
Argument #2: Quicksort is BST Sort
5
3
2
1
8
4
6
7
3
2
1
4
7
8
6
2
1
4
6
8
5
3
5
7
5
3
7
2
4
6
8
1
Key idea: compareTo calls are same for BST insert and Quicksort.
Reminder: Random insertion into a BST takes O(N log N) time.
Empirical Quicksort Runtimes
For N items:
Empirical histogram for quicksort compare counts (10,000 trials with N = 1000)
Lots of arrays take 12,000ish compares to sort with Quicksort.
A very small number take 15,000ish compares to sort with Quicksort.
Chance of taking 1,000,000ish compares is effectively zero.
Quicksort Performance
Theoretical analysis:
Compare this to Mergesort.
Why is it faster than mergesort?
With extremely high probability!!
For our pivot/partitioning strategies: Sorted or close to sorted.
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Memory | Time | Notes |
Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) |
Insertion | Θ(1) | Θ(N2) | Θ(N) if almost sorted |
Mergesort | Θ(N) | Θ(N log N) | |
Quicksort | Θ(log N) (call stack) | Θ(N log N) expected | Fastest sort |
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Quicksort Performance
The performance of Quicksort (both order of growth and constant factors) depend critically on:
Bad choices can be very bad indeed, resulting in Θ(N2) runtimes.
Avoiding the Worst Case
If pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N2) cases do happen in practice, e.g.
What can we do to avoid worst case behavior?
Recall, our version of Quicksort has the following properties:
6
8
3
1
2
7
4
3
1
2
4
6
8
7
Avoiding the Worst Case
If pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N2) cases do happen in practice, e.g.
What can we do to avoid worst case behavior?
More Quicksort Origins
Amusingly, Quicksort was the wrong tool for the job. Two issues:
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Avoiding the Worst Case: My Answers
What can we do to avoid running into the worst case for QuickSort?
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes to deep.
4. Preprocess the array: Could analyze array to see if Quicksort will be slow. No obvious way to do this, though (can’t just check if array is sorted, almost sorted arrays are almost slow).
Philosophy 1: Randomness
Pathological arrays:
Dealing with bad ordering:
The second strategy requires care in partitioning code to avoid Θ(N2) behavior on arrays of duplicates.
Philosophy 2a: Smarter Pivot Selection (constant time pivot pick)
Randomness is necessary for best Quicksort performance! For any pivot selection procedure that is:
Dangerous input
7
2
3
4
6
1
8
5
The resulting Quicksort has a family of dangerous inputs that an adversary could easily generate.
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
Could calculate the actual median in linear time.
Raises interesting question though: How do you compute the median of an array? Will talk about how to do this later today.
Philosophy 3: Introspection
Can also simply watch your recursion depth.
Perfectly reasonable approach, though not super common in practice.
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Memory | Time | Notes |
Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) |
Insertion | Θ(1) | Θ(N2) | Θ(N) if almost sorted |
Mergesort | Θ(N) | Θ(N log N) | |
Random Quicksort | Θ(log N) expected | Θ(N log N) expected | Fastest sort |
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Quicksort Flavors
We said Quicksort is the fastest, but this is only true if we make the right decisions about:
Suppose we run a speed test of Mergesort vs. Quicksort from last time, which had:
We’ll call this Quicksort L3S
Quicksort vs. Mergesort
Quicksort didn’t do so well!
| Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
Mergesort | N/A | N/A | N/A | 1.3 seconds |
Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Tony Hoare’s In-place Partitioning Scheme
Tony originally proposed a scheme where two pointers walk towards each other.
Full details here (note this is not the exact scheme Tony used): Demo
Using this partitioning scheme yields a very fast Quicksort.
Quicksort vs. Mergesort
Using Tony Hoare’s two pointer scheme, Quicksort is better than mergesort!
| Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
Mergesort | N/A | N/A | N/A | 1.3 seconds |
Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds |
Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
What If We Don’t Want Randomness?
Our approach so far: Use randomness to avoid worst case behavior, but some people don’t like having randomness in their sorting routine.
Another approach: Use the median (or an approximation) as our pivot.
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes to deep.
4. Try to cheat: If the array is already sorted, don’t sort (this doesn’t work).
This is what we’ve been using.
Philosophy 2a: Smarter Pivot Selection (linear time pivot pick)
The best possible pivot is the median.
Obvious approach: Just calculate the actual median and use that as pivot.
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Your answer:
Philosophy 2a: Smarter Pivot Selection (linear time pivot pick)
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Median Identification
Is it possible to find the median in Θ(N) time?
Historical note: The authors of this paper include FOUR Turing Award winners (and Vaughan Pratt)
Let’s see how Exact Median Quicksort performs.
Quicksort vs. Mergesort
Quicksort using PICK to find the exact median (Quicksort PickTH) is terrible!
| Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
Mergesort | N/A | N/A | N/A | 1.3 seconds | Θ(N log N) |
Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds | Θ(N2) |
Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds | Θ(N2) |
Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 4.9 seconds | Θ(N log N) |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Quicksort
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
The Selection Problem
Computing the exact median would be great for picking an item to partition around. Gives us a “safe quick sort”.
However, it turns out that partitioning can be used to find the exact median.
Quick Select
Goal, find the median:
Partition, pivot lands at 2.
Now pivot lands at 6.
Partition right subproblem, median can’t be to the left!
Pivot lands at 4. Are we done?
550
14
10
330
5
4
817
913
14
10
330
550
5
4
817
913
6
5
9
550
14
10
330
817
913
9
550
14
6
10
5
330
817
913
14
10
330
5
4
4
4
5
10
14
330
5
4
4
4
5
Worst case performance?
What is the worst case performance for Quick Select? Give an array that causes this worst case (assuming we always pick leftmost item as pivot).
Worst case performance?
What is the worst case performance for Quick Select? Give an array that causes this worst case (assuming we always pick leftmost item as pivot).
Worst asymptotic performance Θ(N2) occurs if array is in sorted order.
[1 2 3 4 5 6 7 8 9 10 … N]
[1 2 3 4 5 6 7 8 9 10 … N]
[1 2 3 4 5 6 7 8 9 10 … N]
…
[1 2 3 4 5 … N/2 … N]
Expected Performance
On average, Quick Select will take Θ(N) time.
~N compares
~N/2 compares
~N/4 compares
On average, pivot ends up about halfway
N + N/2 + N/4 + … + 1 = Θ(N)
Quicksort With Quickselect?
Quicksort with PICK to find exact median was terrible.
What if we used Quickselect to find the exact median?
| Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
Mergesort | N/A | N/A | N/A | 1.3 seconds | Θ(N log N) |
Quicksort L3S | Leftmost | 3-scan | Shuffle | 3.2 seconds | Θ(N2) |
Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 0.9 seconds | Θ(N2) |
Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 4.9 seconds | Θ(N log N) |
Stability
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Memory | # Compares | Notes |
Heapsort | Θ(1) | Θ(N log N) worst | Bad caching (61C) |
Insertion | Θ(1) | Θ(N2) worst | Θ(N) if almost sorted |
Mergesort | Θ(N) | Θ(N log N) worst | |
Random Quicksort | Θ(log N) (call stack) | Θ(N log N) expected | Fastest sort |
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_NAME);
Lara | 1 |
Sigurd | 2 |
Bas | 3 |
Jana | 3 |
Jouni | 3 |
Rosella | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
sort(studentRecords, BY_SECTION);
Equivalent items don’t ‘cross over’ when being stably sorted.
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_NAME);
Lara | 1 |
Sigurd | 2 |
Jouni | 3 |
Rosella | 3 |
Bas | 3 |
Jana | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
sort(studentRecords, BY_SECTION);
Sorting instability can be really annoying! Wanted students listed alphabetically by section.
Sorting Stability http://www.yellkey.com/popular
Is insertion sort stable?
Is Quicksort stable?
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
6
8
3
1
2
7
4
Sorting Stability
Is insertion sort stable?
Is Quicksort stable?
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
6
7
3
1
2
8
3
2
3
3
1
6
8
7
3
1
2
3
6
7
8
Hoare partitioning.
Three array partitioning.
Stability
| Memory | # Compares | Notes | Stable? |
Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) | No |
Insertion | Θ(1) | Θ(N2) | Θ(N) if almost sorted | Yes |
Mergesort | Θ(N) | Θ(N log N) | | Yes |
Quicksort LTHS | Θ(log N) | Θ(N log N) expected | Fastest sort | No |
You can create a stable Quicksort (i.e. the version from the previous lecture). However, unstable partitioning schemes (like Hoare partitioning) tend to be faster. All reasonable partitioning schemes yield Θ(N log N) expected runtime, but with different constants.
This is due to the cost of tracking recursive calls by the computer, and is also an “expected” amount. The difference between log N and constant memory is trivial.
Arrays.sort
In Java, Arrays.sort(someArray) uses:
Why? See A level problems.
Optimizing Sorts
Additional tricks we can play:
Sounds of Sorting
Instructor Lab 10, CS61BL, SU23
Quicksort
Quicksort Performance
Quicksort Variants
Quick Select (median finding)
Sorting Stability
Sounds of Sorting
Sounds of Sorting Algorithms (of 125 items)
Starts with selection sort: https://www.youtube.com/watch?v=kPRA0W1kECg
Insertion sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m9s
Quicksort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m38s
Mergesort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m05s
Heapsort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m28s
LSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m54s [coming tomorrow!]
MSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=2m10s [coming tomorrow!]
Shell’s sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=3m37s [bonus from last time]
Questions to ponder (later… after class):
Citations
Quickman from Mega Man 2