CS61B
Lecture 32: Sorting III
Partition Sort, a.k.a. Quicksort
Quicksorting N items: (Demo)
Example: partition(32) on [32 15 2 17 19 26 41 17 17]
yields [15 2 17 19 26 17 17 | 32 | 41]
Everything left of 32 is <= 32, everything right is >= 32, and 32 is in its place.
Run time is Θ(N log N) in the best case, Θ(N²) in the worst case, and Θ(N log N) in the average case.
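The partition-then-recurse structure above can be sketched as follows. This is a minimal sketch: the class name and the simple single-scan partition are mine, not the course's stable three-scan scheme, but it satisfies the same contract (items <= pivot on the left, items >= pivot on the right, pivot in its place).

```java
import java.util.Arrays;

public class QuickSortDemo {
    /** Sorts a[lo..hi] using quicksort with the leftmost item as pivot. */
    public static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) { return; }
        int p = partition(a, lo, hi);   // pivot lands at index p
        quickSort(a, lo, p - 1);        // sort left of the pivot
        quickSort(a, p + 1, hi);        // sort right of the pivot
    }

    /** Partitions around a[lo]; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int wall = lo;                  // last index of the "< pivot" region
        for (int i = lo + 1; i <= hi; i += 1) {
            if (a[i] < pivot) {
                wall += 1;
                int tmp = a[wall]; a[wall] = a[i]; a[i] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[wall]; a[wall] = tmp;  // pivot into place
        return wall;
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        quickSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
    }
}
```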
Avoiding the Worst Case: Question from Last Time
If the pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N²) cases do happen in practice, e.g. on arrays that are already sorted or arrays with many duplicates.
What can we do to avoid worst case behavior?
Recall, our version of Quicksort has the following properties:
Example: partition(6) on [6 8 3 1 2 7 4] yields [3 1 2 4 | 6 | 8 7].
Avoiding the Worst Case: My Answers
What can we do to avoid running into the worst case for QuickSort?
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes too deep.
4. Preprocess the array: Could analyze the array to see if Quicksort will be slow. No obvious way to do this, though (can’t just check if the array is sorted; almost sorted arrays are also slow).
Philosophy 1: Randomness (My Preferred Approach)
If the pivot always lands somewhere “good”, Quicksort is Θ(N log N). However, the very rare Θ(N²) cases do happen in practice, so we pick a random pivot or shuffle before sorting.
Dealing with duplicates: shuffling handles bad orderings, but the partitioning code itself requires care to avoid Θ(N²) behavior on arrays with many duplicates.
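The shuffle-before-sorting strategy can be sketched with a Fisher-Yates shuffle (the class name is mine; Fisher-Yates is the standard way to produce a uniformly random permutation):

```java
import java.util.Random;

public class ShuffleGuard {
    /** Fisher-Yates shuffle: every permutation of a is equally likely,
     *  so no fixed input ordering can reliably trigger the Θ(N²) case. */
    public static void shuffle(int[] a, Random rng) {
        for (int i = a.length - 1; i > 0; i -= 1) {
            int j = rng.nextInt(i + 1);   // uniform index in [0, i]
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};   // adversarially sorted input
        shuffle(a, new Random());
        // a is now in random order; quicksort on it is Θ(N log N) in expectation
    }
}
```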
Philosophy 2a: Smarter Pivot Selection (constant time pivot pick)
Randomness is necessary for best Quicksort performance! For any pivot selection procedure that is deterministic and constant time, the resulting Quicksort has a family of dangerous inputs that an adversary could easily generate.
Example dangerous input: [7 2 3 4 6 1 8 5]
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
Could calculate the actual median in linear time.
This raises an interesting question, though: how do you compute the median of an array? We’ll talk about how to do this later today.
Philosophy 3: Introspection
Can also simply watch your recursion depth.
Perfectly reasonable approach, though not super common in practice.
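The introspection idea can be sketched as a quicksort that carries a depth budget and bails out to a guaranteed Θ(N log N) sort when the budget runs out. This is a sketch of the general idea (the class name, the simple single-scan partition, and the heap-based fallback are mine, not a specific library's implementation):

```java
import java.util.PriorityQueue;

public class IntroSketch {
    /** Quicksort that watches its recursion depth and switches to
     *  heapsort (guaranteed Θ(N log N)) if it goes too deep. */
    public static void sort(int[] a) {
        // Budget of roughly 2 lg N levels before giving up on quicksort.
        int budget = 2 * (32 - Integer.numberOfLeadingZeros(Math.max(1, a.length)));
        sort(a, 0, a.length - 1, budget);
    }

    private static void sort(int[] a, int lo, int hi, int budget) {
        if (lo >= hi) { return; }
        if (budget == 0) {              // recursion went too deep: switch sorts
            heapSort(a, lo, hi);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, budget - 1);
        sort(a, p + 1, hi, budget - 1);
    }

    /** Simple heapsort of a[lo..hi] via a binary min-heap. */
    private static void heapSort(int[] a, int lo, int hi) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int i = lo; i <= hi; i += 1) { pq.add(a[i]); }
        for (int i = lo; i <= hi; i += 1) { a[i] = pq.poll(); }
    }

    /** Leftmost-pivot partition; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int wall = lo;
        for (int i = lo + 1; i <= hi; i += 1) {
            if (a[i] < pivot) {
                wall += 1;
                int tmp = a[wall]; a[wall] = a[i]; a[i] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[wall]; a[wall] = tmp;
        return wall;
    }
}
```

Even a fully sorted input (the worst case for leftmost-pivot selection) is handled in Θ(N log N), because the depth budget triggers the heapsort fallback after ~2 lg N levels.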
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | Time | Notes |
|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) | |
| Random Quicksort | Θ(log N) expected | Θ(N log N) expected | Fastest sort |
Quicksort Flavors
We said Quicksort is the fastest, but this is only true if we make the right decisions about: pivot selection, the partition algorithm, and our worst case avoidance strategy.
Let’s speed test Mergesort vs. the Quicksort from last time, which used a leftmost pivot, 3-scan partitioning, and a shuffle to avoid the worst case.
We’ll call this Quicksort L3S.
Quicksort vs. Mergesort
Quicksort didn’t do so well!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Tony Hoare’s In-place Partitioning Scheme
Tony originally proposed a scheme where two pointers walk towards each other.
Full details here: Demo
Using this partitioning scheme yields a very fast Quicksort.
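A Hoare-style two-pointer partition can be sketched as below. This follows the spirit of the demo (a left pointer that skips items smaller than the pivot, a right pointer that skips items larger, swapping when both stop); the exact details of the lecture's demo may differ.

```java
import java.util.Arrays;

public class HoarePartition {
    /** Partitions a[lo..hi] around a[lo] using two pointers walking
     *  toward each other; returns the pivot's final index. */
    public static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo + 1;
        int j = hi;
        while (true) {
            while (i <= j && a[i] < pivot) { i += 1; }  // left pointer stops at items >= pivot
            while (i <= j && a[j] > pivot) { j -= 1; }  // right pointer stops at items <= pivot
            if (i >= j) { break; }
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;    // swap the two stopped items
            i += 1;
            j -= 1;
        }
        int tmp = a[lo]; a[lo] = a[j]; a[j] = tmp;      // pivot into place
        return j;
    }

    public static void main(String[] args) {
        int[] a = {6, 8, 3, 1, 2, 7, 4};
        int p = partition(a, 0, a.length - 1);
        System.out.println(p + ": " + Arrays.toString(a));  // prints "4: [2, 4, 3, 1, 6, 7, 8]"
    }
}
```

Note that because both pointers stop on items equal to the pivot, arrays of duplicates get split roughly in half, avoiding the Θ(N²) duplicate problem mentioned earlier.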
Quicksort vs. Mergesort
Using Tony Hoare’s two pointer scheme, Quicksort is better than mergesort!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
What If We Don’t Want Randomness?
Our approach so far: Use randomness to avoid worst case behavior, but some people don’t like having randomness in their sorting routine.
Another approach: Use the median (or an approximation) as our pivot.
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes too deep.
4. Try to cheat: If the array is already sorted, don’t sort (this doesn’t work).
Philosophy 1 (randomness) is what we’ve been using.
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
The best possible pivot is the median.
Obvious approach: Just calculate the actual median and use that as pivot.
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Your answer:
Philosophy 2b: Smarter Pivot Selection (linear time pivot pick)
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Median Identification
Is it possible to find the median in Θ(N) time?
Historical note: The authors of this paper (“Time Bounds for Selection”, 1973) include FOUR Turing Award winners, Blum, Floyd, Rivest, and Tarjan (and Vaughan Pratt).
Let’s see how Exact Median Quicksort performs.
Quicksort vs. Mergesort
Quicksort using PICK to find the exact median (Quicksort PickTH) is terrible!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
|---|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 10.0 seconds | Θ(N log N) |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Quick Select
The Selection Problem
Computing the exact median would be great for picking an item to partition around, giving us a “safe quicksort”.
Interestingly, the reverse is also true: partitioning can be used to find the exact median.
Quick Select
Goal: find the median.
Partition, pivot lands at 2.
Now pivot lands at 6.
Partition right subproblem, median can’t be to the left!
Pivot lands at 4. Are we done?
[Demo: the array is repeatedly partitioned; after each partition, we recurse only into the side that can still contain the median, and stop once a pivot lands exactly at the middle index.]
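Quick select can be sketched as repeated partitioning, keeping only the side that contains index k. This is a minimal sketch using a simple leftmost-pivot, single-scan partition (my own helper, not necessarily the course's partitioning scheme):

```java
public class QuickSelect {
    /** Returns the kth smallest item of a (k = 0 gives the minimum,
     *  k = a.length / 2 gives a median). Reorders a as a side effect. */
    public static int select(int[] a, int k) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) { return a[p]; }
            if (p < k) { lo = p + 1; }   // target can't be to the left
            else       { hi = p - 1; }   // target can't be to the right
        }
        return a[lo];
    }

    /** Leftmost-pivot partition; returns the pivot's final index. */
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int wall = lo;
        for (int i = lo + 1; i <= hi; i += 1) {
            if (a[i] < pivot) {
                wall += 1;
                int tmp = a[wall]; a[wall] = a[i]; a[i] = tmp;
            }
        }
        int tmp = a[lo]; a[lo] = a[wall]; a[wall] = tmp;
        return wall;
    }

    public static void main(String[] args) {
        int[] a = {550, 14, 10, 330, 5, 4, 817, 913};
        System.out.println(select(a, a.length / 2));  // prints 330
    }
}
```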
Worst case performance?
What is the worst case performance for Quick Select? Give an array that causes this worst case (assuming we always pick leftmost item as pivot).
Worst asymptotic performance Θ(N²) occurs if the array is in sorted order:
[1 2 3 4 5 6 7 8 9 10 … N]
Each partition fixes only the leftmost item, so we recurse on [2 … N], then [3 … N], and so on, until the pivot finally lands at position N/2. Total work: N + (N-1) + … + N/2 = Θ(N²).
Expected Performance
On average, Quick Select will take Θ(N) time.
On average, the pivot ends up about halfway, so successive partitions cost ~N, ~N/2, ~N/4, … compares:
N + N/2 + N/4 + … + 1 = Θ(N)
Quicksort With Quickselect?
Quicksort with PICK to find exact median was terrible.
What if we used Quickselect to find the exact median?
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
|---|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 10.0 seconds | Θ(N log N) |
Stability, Adaptiveness, Optimization
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | # Compares | Notes |
|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) worst | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) worst | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) worst | |
| Random Quicksort | Θ(log N) (call stack) | Θ(N log N) expected | Fastest sort |
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
After sort(studentRecords, BY_NAME):

| Name | Section |
|---|---|
| Bas | 3 |
| Fikriyya | 4 |
| Jana | 3 |
| Jouni | 3 |
| Lara | 1 |
| Nikolaj | 4 |
| Rosella | 3 |
| Sigurd | 2 |

After a stable sort(studentRecords, BY_SECTION):

| Name | Section |
|---|---|
| Lara | 1 |
| Sigurd | 2 |
| Bas | 3 |
| Jana | 3 |
| Jouni | 3 |
| Rosella | 3 |
| Fikriyya | 4 |
| Nikolaj | 4 |
Equivalent items don’t ‘cross over’ when being stably sorted.
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
After sort(studentRecords, BY_NAME):

| Name | Section |
|---|---|
| Bas | 3 |
| Fikriyya | 4 |
| Jana | 3 |
| Jouni | 3 |
| Lara | 1 |
| Nikolaj | 4 |
| Rosella | 3 |
| Sigurd | 2 |

After an unstable sort(studentRecords, BY_SECTION):

| Name | Section |
|---|---|
| Lara | 1 |
| Sigurd | 2 |
| Jouni | 3 |
| Rosella | 3 |
| Bas | 3 |
| Jana | 3 |
| Fikriyya | 4 |
| Nikolaj | 4 |
Sorting instability can be really annoying! We wanted students listed alphabetically within each section.
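In Java, object sorts such as List.sort and Collections.sort are guaranteed stable, so sorting by name first and then by section produces the desired alphabetical-within-section order. A small sketch (the Student record and roster contents are illustrative, not the course's actual code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StabilityDemo {
    record Student(String name, int section) { }

    public static void main(String[] args) {
        List<Student> roster = new ArrayList<>(List.of(
                new Student("Rosella", 3), new Student("Lara", 1),
                new Student("Bas", 3), new Student("Sigurd", 2),
                new Student("Jana", 3)));
        roster.sort(Comparator.comparing(Student::name));        // BY_NAME
        roster.sort(Comparator.comparingInt(Student::section));  // BY_SECTION (stable!)
        for (Student s : roster) {
            // Within each section, names remain alphabetical,
            // because the stable sort never reorders equal sections.
            System.out.println(s.name() + " | " + s.section());
        }
    }
}
```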
Sorting Stability
Is insertion sort stable?
Is Quicksort stable?
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
Sorting Stability
Is insertion sort stable? Yes: insertion sort only swaps items that are strictly out of order, so equivalent items never cross over (as the swap counts in the trace above show).
Is Quicksort stable? It depends on the partitioning scheme.
Quicksort’s stability depends on the partitioning scheme:
Hoare partitioning: equal items can cross over each other during the pointer swaps, so the sort is not stable.
Three array partitioning (previous lecture): the relative order of equal items is preserved, so the sort is stable.
Stability
| Sort | Memory | # Compares | Notes | Stable? |
|---|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) | No |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted | Yes |
| Mergesort | Θ(N) | Θ(N log N) | | Yes |
| Quicksort LTHS | Θ(log N) | Θ(N log N) expected | Fastest sort | No |
You can create a stable Quicksort (i.e. the version from the previous lecture). However, unstable partitioning schemes (like Hoare partitioning) tend to be faster. All reasonable partitioning schemes yield Θ(N log N) expected runtime, but with different constants.
The Θ(log N) memory is the cost of tracking recursive calls on the call stack, and is also only an “expected” amount. In practice, the difference between log N and constant memory is trivial.
Optimizing Sorts
Additional tricks we can play: switch to insertion sort for small subarrays; exploit existing order in the array (adaptiveness).
Arrays.sort
In Java, Arrays.sort(someArray) uses: mergesort (specifically, the Timsort variant) if someArray consists of Objects, and quicksort (dual-pivot) if someArray consists of primitives.
Why? See A level problems.
Sounds of Sorting (Fun)
Sounds of Sorting Algorithms (of 125 items)
Starts with selection sort: https://www.youtube.com/watch?v=kPRA0W1kECg
Insertion sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m9s
Quicksort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m38s
Mergesort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m05s
Heapsort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m28s
LSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m54s [coming next Wednesday]
MSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=2m10s [coming next Wednesday]
Shell’s sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=3m37s [bonus from last time]
Questions to ponder (later… after class):
Citations
Sorting, Puppies, Cats, and Dogs
A solution to the sorting problem also provides a solution to puppy, cat, dog.
Physics analogy: Climbing a hill with your legs is one way to solve the problem of getting up a hill.