Announcements
Project 3 autograder has been updated to do randomized testing.
More people are failing the 3x test than we expected; we will investigate.
Extra deadline for part 1 is today.
CS61B
Lecture 34: Sorting III
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | Time | Notes |
|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) | |
| Random Quicksort | Θ(log N) expected | Θ(N log N) expected | Fastest sort |
Quicksort Flavors
We said Quicksort is the fastest, but this is only true if we make the right decisions about:
Let’s speed test Mergesort vs. Quicksort from last time, which had:
We’ll call this Quicksort L3S (Leftmost pivot, 3-scan partition, Shuffle).
Quicksort vs. Mergesort
Quicksort didn’t do so well!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Tony Hoare’s In-place Partitioning Scheme
Tony originally proposed a scheme where two pointers walk towards each other.
Full details here: Demo
Using this partitioning scheme yields a very fast Quicksort.
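The two-pointer walk can be sketched as follows. This is a minimal illustration in the style of the lecture's int-array demos, not the demo's own code; the class and variable names are our own. The pivot is the leftmost item, the two pointers walk toward each other swapping out-of-place items, and the pivot is finally swapped into its sorted position.

```java
public class HoarePartition {
    // Partition a[lo..hi] around the leftmost item; return the pivot's
    // final index. Items left of it are <= pivot, items right are >= pivot.
    public static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int left = lo + 1;   // walks right, stopping at items >= pivot
        int right = hi;      // walks left, stopping at items <= pivot
        while (true) {
            while (left <= right && a[left] < pivot) { left++; }
            while (right >= left && a[right] > pivot) { right--; }
            if (left >= right) { break; }
            int tmp = a[left]; a[left] = a[right]; a[right] = tmp;
            left++;
            right--;
        }
        // right now indexes the last item <= pivot: swap the pivot into place.
        int tmp = a[lo]; a[lo] = a[right]; a[right] = tmp;
        return right;
    }

    public static void main(String[] args) {
        int[] a = {6, 8, 3, 1, 2, 7, 4};
        int p = partition(a, 0, a.length - 1);
        System.out.println("pivot 6 landed at index " + p);
    }
}
```

Note that the pointers stop on items equal to the pivot; this keeps the two subproblems balanced when there are many duplicate keys.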
Quicksort vs. Mergesort
Using Tony Hoare’s two pointer scheme, Quicksort is better than mergesort!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints |
|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
What If We Don’t Want Randomness?
Our approach so far: Use randomness to avoid worst case behavior, but some people don’t like having randomness in their sorting routine.
Another approach: Use the median (or an approximation) as our pivot.
Four philosophies:
1. Randomness: Pick a random pivot or shuffle before sorting.
2. Smarter pivot selection: Calculate or approximate the median.
3. Introspection: Switch to a safer sort if recursion goes too deep.
4. Try to cheat: If the array is already sorted, don’t sort (this doesn’t work).
Philosophy 1 (randomness) is what we’ve been using so far.
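Philosophy 3 (introspection) can be sketched concretely. This is our own illustration, not lecture code: run ordinary leftmost-pivot Quicksort, but count recursion depth, and past roughly 2 lg N levels give up and finish the range with a guaranteed N log N library sort (standing in here for heapsort).

```java
import java.util.Arrays;

public class IntroSketch {
    public static void sort(int[] a) {
        int lgN = 32 - Integer.numberOfLeadingZeros(Math.max(1, a.length));
        sort(a, 0, a.length - 1, 2 * lgN);
    }

    private static void sort(int[] a, int lo, int hi, int depthLeft) {
        if (lo >= hi) { return; }
        if (depthLeft == 0) {          // recursion has gone too deep
            Arrays.sort(a, lo, hi + 1);
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depthLeft - 1);
        sort(a, p + 1, hi, depthLeft - 1);
    }

    // Hoare-style two-pointer partition around the leftmost item.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int left = lo + 1;
        int right = hi;
        while (true) {
            while (left <= right && a[left] < pivot) { left++; }
            while (right >= left && a[right] > pivot) { right--; }
            if (left >= right) { break; }
            int tmp = a[left]; a[left] = a[right]; a[right] = tmp;
            left++;
            right--;
        }
        int tmp = a[lo]; a[lo] = a[right]; a[right] = tmp;
        return right;
    }
}
```

Even on a fully sorted input (the leftmost-pivot worst case), the depth cap keeps this from degrading to Θ(N²).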
Philosophy 2a: Smarter Pivot Selection (linear time pivot pick)
The best possible pivot is the median.
Obvious approach: Just calculate the actual median and use that as pivot.
Goal: Come up with an algorithm for finding the median of an array. Bonus points if your algorithm takes linear time.
Your answer:
Median Identification
Is it possible to find the median in Θ(N) time?
Historical note: The authors of the paper that introduced this algorithm (PICK, by Blum, Floyd, Pratt, Rivest, and Tarjan) include FOUR Turing Award winners (and Pratt is no slouch!)
Let’s see how Exact Median Quicksort performs.
Quicksort vs. Mergesort
Quicksort using PICK to find the exact median (Quicksort PickTH) is terrible!
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
|---|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 10.0 seconds | Θ(N log N) |
Note: These are unoptimized versions of mergesort and quicksort, i.e. no switching to insertion sort for small arrays.
Quick Select
The Selection Problem
Computing the exact median would be great for picking an item to partition around. Gives us a “safe quick sort”.
However, it turns out that partitioning can be used to find the exact median.
Quick Select
Goal, find the median:
Partition, pivot lands at 2.
Now pivot lands at 6.
Partition right subproblem, median can’t be to the left!
Pivot lands at 4. Are we done?
[Demo residue: successive snapshots of the 10-item demo array (values 550, 14, 10, 330, 5, 4, 817, 913, 6, 9) as each partition narrows the search for the median.]
Worst case performance?
What is the worst case performance for Quick Select? Give an array that causes this worst case (assuming we always pick leftmost item as pivot).
Worst asymptotic performance Θ(N²) occurs if the array is in sorted order, since each leftmost pivot eliminates only one item:
[1 2 3 4 5 6 7 8 9 10 … N]
[1 2 3 4 5 6 7 8 9 10 … N]   (recurse on the remaining N-1 items)
[1 2 3 4 5 6 7 8 9 10 … N]   (recurse on the remaining N-2 items)
…
[1 2 3 4 5 … N/2 … N]   (≈N/2 partitions to reach the median: N + (N-1) + … + N/2 = Θ(N²))
Expected Performance
On average, Quick Select will take Θ(N) time.
~N compares for the first partition
~N/2 compares for the next
~N/4 compares after that
(on average, the pivot lands about halfway, so each subproblem is about half the size)
N + N/2 + N/4 + … + 1 = Θ(N)
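The partition-and-recurse-on-one-side idea above can be sketched as follows (k = N/2 gives a median). This uses a leftmost pivot with Hoare's two-pointer partition; in practice you would shuffle first to dodge the sorted-array worst case. The class and method names are our own, not lecture code.

```java
public class QuickSelect {
    // Returns the k-th smallest item (k is 0-indexed). Mutates a.
    public static int select(int[] a, int k) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (p == k) {
                return a[p];
            } else if (p < k) {
                lo = p + 1;   // k-th smallest must be right of the pivot
            } else {
                hi = p - 1;   // k-th smallest must be left of the pivot
            }
        }
        return a[lo];
    }

    // Hoare-style two-pointer partition around the leftmost item;
    // returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int left = lo + 1;
        int right = hi;
        while (true) {
            while (left <= right && a[left] < pivot) { left++; }
            while (right >= left && a[right] > pivot) { right--; }
            if (left >= right) { break; }
            int tmp = a[left]; a[left] = a[right]; a[right] = tmp;
            left++;
            right--;
        }
        int tmp = a[lo]; a[lo] = a[right]; a[right] = tmp;
        return right;
    }

    public static void main(String[] args) {
        int[] a = {550, 14, 10, 330, 5, 4, 817, 913};
        System.out.println(select(a, a.length / 2));
    }
}
```

Unlike Quicksort, only one side of each partition is explored, which is why the work sums to N + N/2 + N/4 + … = Θ(N) on average.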
Quicksort With Quickselect?
Quicksort with PICK to find exact median was terrible.
What if we used Quickselect to find the exact median?
| Sort | Pivot Selection Strategy | Partition Algorithm | Worst Case Avoidance Strategy | Time to sort 1000 arrays of 10000 ints | Worst Case |
|---|---|---|---|---|---|
| Mergesort | N/A | N/A | N/A | 2.1 seconds | Θ(N log N) |
| Quicksort L3S | Leftmost | 3-scan | Shuffle | 4.4 seconds | Θ(N²) |
| Quicksort LTHS | Leftmost | Tony Hoare | Shuffle | 1.7 seconds | Θ(N²) |
| Quicksort PickTH | Exact Median | Tony Hoare | Exact Median | 10.0 seconds | Θ(N log N) |
Stability, Adaptiveness, Optimization
Sorting Summary (so far)
Listed by mechanism:
Listed by memory and runtime:
| Sort | Memory | # Compares | Notes |
|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) worst | Bad caching (61C) |
| Insertion | Θ(1) | Θ(N²) worst | Θ(N) if almost sorted |
| Mergesort | Θ(N) | Θ(N log N) worst | |
| Random Quicksort | Θ(log N) (call stack) | Θ(N log N) expected | Fastest sort |
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.
sort(studentRecords, BY_NAME);

| Name | Section |
|---|---|
| Bas | 3 |
| Fikriyya | 4 |
| Jana | 3 |
| Jouni | 3 |
| Lara | 1 |
| Nikolaj | 4 |
| Rosella | 3 |
| Sigurd | 2 |

sort(studentRecords, BY_SECTION);   // stable sort

| Name | Section |
|---|---|
| Lara | 1 |
| Sigurd | 2 |
| Bas | 3 |
| Jana | 3 |
| Jouni | 3 |
| Rosella | 3 |
| Fikriyya | 4 |
| Nikolaj | 4 |

Equivalent items don’t ‘cross over’ when being stably sorted.
Other Desirable Sorting Properties: Stability
A sort is said to be stable if order of equivalent items is preserved.

sort(studentRecords, BY_NAME);

| Name | Section |
|---|---|
| Bas | 3 |
| Fikriyya | 4 |
| Jana | 3 |
| Jouni | 3 |
| Lara | 1 |
| Nikolaj | 4 |
| Rosella | 3 |
| Sigurd | 2 |

sort(studentRecords, BY_SECTION);   // unstable sort

| Name | Section |
|---|---|
| Lara | 1 |
| Sigurd | 2 |
| Jouni | 3 |
| Rosella | 3 |
| Bas | 3 |
| Jana | 3 |
| Fikriyya | 4 |
| Nikolaj | 4 |

Sorting instability can be really annoying! We wanted students listed alphabetically within each section.
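The two-pass sort above can be run through Java's Arrays.sort, which is stable for object arrays (it uses a merge-sort variant). The Student record, the nameThenSectionOrder helper, and the comparators are our own illustration, not lecture code.

```java
import java.util.Arrays;
import java.util.Comparator;

public class StableDemo {
    record Student(String name, int section) { }

    // Sort by name, then stably by section; return the names in final order.
    public static String[] nameThenSectionOrder(Student[] records) {
        Arrays.sort(records, Comparator.comparing(Student::name));        // BY_NAME
        Arrays.sort(records, Comparator.comparingInt(Student::section));  // BY_SECTION
        String[] names = new String[records.length];
        for (int i = 0; i < records.length; i++) {
            names[i] = records[i].name();
        }
        return names;
    }

    public static void main(String[] args) {
        Student[] records = {
            new Student("Bas", 3), new Student("Fikriyya", 4),
            new Student("Jana", 3), new Student("Jouni", 3),
            new Student("Lara", 1), new Student("Nikolaj", 4),
            new Student("Rosella", 3), new Student("Sigurd", 2)
        };
        System.out.println(Arrays.toString(nameThenSectionOrder(records)));
    }
}
```

Because the second sort is stable, the name ordering established by the first sort survives within each section.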
Sorting Stability
Is insertion sort stable?
Is Quicksort stable?
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
[Demo residue: Quicksort stability traces on small arrays with duplicate keys (e.g. 6 8 3 1 2 7 4), comparing Hoare partitioning with three array partitioning; equal items can cross over under Hoare partitioning, so that scheme is not stable.]
Stability
| Sort | Memory | # Compares | Notes | Stable? |
|---|---|---|---|---|
| Heapsort | Θ(1) | Θ(N log N) | Bad caching (61C) | No |
| Insertion | Θ(1) | Θ(N²) | Θ(N) if almost sorted | Yes |
| Mergesort | Θ(N) | Θ(N log N) | | Yes |
| Quicksort LTHS | Θ(log N) | Θ(N log N) expected | Fastest sort | No |
You can create a stable Quicksort. However, unstable partitioning schemes (like Hoare partitioning) tend to be faster. All reasonable partitioning schemes yield Θ(N log N) expected runtime, but with different constants.
Quicksort’s Θ(log N) memory is the cost of tracking recursive calls on the call stack, and it is also only an expected amount. In practice, the difference between Θ(log N) and constant memory is trivial.
Optimizing Sorts
Additional tricks we can play:
Arrays.sort
In Java, Arrays.sort(someArray) uses:
1. Mergesort (specifically, the TimSort variant) if someArray consists of Objects.
2. Quicksort (specifically, dual-pivot Quicksort) if someArray consists of primitives.
Why? See A level problems.
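A quick check of both entry points (the algorithm names in the comments reflect current OpenJDK behavior; the class and method names here are our own):

```java
import java.util.Arrays;

public class LibrarySortDemo {
    // Sorts and returns the same primitive array (dual-pivot Quicksort inside).
    public static int[] sortPrimitives(int[] a) {
        Arrays.sort(a);
        return a;
    }

    // Sorts and returns the same boxed array (stable TimSort inside).
    public static Integer[] sortObjects(Integer[] a) {
        Arrays.sort(a);
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortPrimitives(new int[]{5, 3, 1, 4, 2})));
        System.out.println(Arrays.toString(sortObjects(new Integer[]{5, 3, 1, 4, 2})));
    }
}
```

Both calls are spelled the same, but the library dispatches to different algorithms: instability is unobservable for primitives, so the faster unstable sort is safe there.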
Shuffling
How do we shuffle an array?
One approach: assign each item a uniform random number as its key (e.g. 0.8741, 0.5512, 0.211, 0.366), then sort the items by those keys.
How do we get random numbers? See Fa14 lecture 33.
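Another standard linear-time answer is the Fisher-Yates (Knuth) shuffle, sketched below with our own names: walk the array from the back, swapping each position with a uniformly random position at or before it. If the random numbers are good, every permutation is equally likely, and no sort is needed.

```java
import java.util.Arrays;
import java.util.Random;

public class Shuffler {
    public static void shuffle(int[] a, Random rng) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);  // uniform in [0, i]
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        shuffle(a, new Random());
        System.out.println(Arrays.toString(a));
    }
}
```

This does Θ(N) work and Θ(1) extra memory, versus the Θ(N log N) of the assign-random-keys-then-sort approach.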
How do we NOT shuffle?
In response to an anti-trust investigation, Microsoft agreed to provide a random browser selection website.
The (Bad) Randomization Approach
Implementation: Make compareTo() return a random answer.
Result: the ordering was far from uniform; one browser appeared last 50% of the time.
Simple programming error… or was it?
Sounds of Sorting (Fun)
Sounds of Sorting Algorithms (of 125 items)
Starts with selection sort: https://www.youtube.com/watch?v=kPRA0W1kECg
Insertion sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m9s
Quicksort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=0m38s
Mergesort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m05s
Heapsort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m28s
LSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=1m54s [coming next Wednesday]
MSD sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=2m10s [coming next Wednesday]
Shell’s sort: https://www.youtube.com/watch?v=kPRA0W1kECg&t=3m37s [bonus from last time]
Questions to ponder (later… after class):
Citations
Sorting, Puppies, Cats, and Dogs
A solution to the sorting problem also provides a solution to puppy, cat, dog.
Physics analogy: Climbing a hill with your legs is one way to solve the problem of getting up a hill.