Announcements
Proj3: It’s out. Time to get started.
Midterm 2 grades will be out late Tuesday/early Wednesday.
CS61B
Lecture 32: Sorting I
Sorting - Definitions
A sort is a permutation (re-arrangement) of a sequence of elements that brings them into order according to some total order. A total order ≼ is:
Unequal items may be ‘equivalent’ under these definitions:
Sorting: An Alternate Viewpoint
An inversion is a pair of elements that are out of order.
Goal of sorting:
0 1 1 2 3 4 8 6 9 5 7
Inversions: 8-6, 8-5, 8-7, 6-5, 9-5, 9-7
(6 inversions out of 55 max)
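The brute-force count above can be sketched in Java (a hypothetical helper, not part of the course code):

```java
public class Inversions {
    /** Returns the number of inversions: pairs (i, j) with i < j but a[i] > a[j]. */
    public static int count(int[] a) {
        int inversions = 0;
        for (int i = 0; i < a.length; i += 1) {
            for (int j = i + 1; j < a.length; j += 1) {
                if (a[i] > a[j]) {
                    inversions += 1;
                }
            }
        }
        return inversions;
    }
}
```

On the slide’s example array this returns 6; a sorted array returns 0, which is exactly the goal of sorting stated above.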
Performance Definitions
Characterizations of the runtime efficiency are sometimes called the time complexity of an algorithm. Examples:
Characterizations of the “extra” memory usage of an algorithm are sometimes called the space complexity of an algorithm.
Selection Sort and Heapsort
Selection Sort
We’ve seen this already.
Sort Properties:
Seems inefficient: We look through entire remaining array every time to find the minimum.
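The scan-and-swap loop described above, as a minimal Java sketch (class and method names are my own):

```java
public class SelectionSort {
    /** Repeatedly finds the minimum of the unsorted suffix and swaps it into place.
     *  The inner scan over the entire remaining array is the inefficiency noted above. */
    public static void sort(int[] a) {
        for (int i = 0; i < a.length; i += 1) {
            int min = i;
            for (int j = i + 1; j < a.length; j += 1) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```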
Naive Heapsort: Leveraging a Max-Oriented Heap
Idea: Instead of rescanning entire array looking for minimum, maintain a heap so that getting the minimum is fast!
For reasons that will become clear soon, we’ll use a max-oriented heap.
Naive heapsorting N items:
Naive Heapsort Demo: https://goo.gl/EZWwSJ
A min heap would work as well, but wouldn’t be able to take advantage of the fancy trick in a few slides.
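The min-heap variant mentioned above can be sketched with java.util.PriorityQueue (which is min-oriented), at the cost of Θ(N) extra memory for the output array:

```java
import java.util.PriorityQueue;

public class NaiveHeapSort {
    /** Inserts all N items into a heap, then deletes the min N times into a new array.
     *  Java's PriorityQueue is min-oriented, so items come out in ascending order. */
    public static int[] sort(int[] a) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : a) {
            pq.add(x);                 // N insertions: O(N log N)
        }
        int[] out = new int[a.length]; // Θ(N) extra memory
        for (int i = 0; i < out.length; i += 1) {
            out[i] = pq.poll();        // N deleteMins: O(N log N)
        }
        return out;
    }
}
```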
Naive Heapsort Runtime: http://shoutkey.com/any
Naive heapsorting N items:
What is the TOTAL runtime of naive heapsort?
Heapsort Runtime Analysis
Use the magic of the heap to sort our data.
Informally, overall runtime is Θ(N log N) + Θ(N) + O(N log N) = Θ(N log N)
Memory usage is Θ(N) to build the additional copy of all of our data.
If every swim fails, then it’d be Θ(N) in that case. So it may be best to say that this is O(N log N).
A thing to ponder. I’ll put on the study guide with answers.
In-place Heapsort
Alternate approach, treat input array as a heap!
In-place heap sort: Demo
Example array: 32 15 2 17 19 26 41 17 17
After bottom-up heapification: 41 19 32 17 15 26 2 17 17
For this algorithm we don’t leave spot 0 blank.
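The in-place algorithm can be sketched as follows (a zero-indexed sketch, so spot 0 is not left blank; names are my own):

```java
public class HeapSort {
    /** Sorts a in place: bottom-up heapify into a max-heap, then repeatedly
     *  swap the max to the end and sink the new root over the shrunken heap. */
    public static void sort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i -= 1) {
            sink(a, i, n);             // bottom-up heapification
        }
        for (int end = n - 1; end > 0; end -= 1) {
            swap(a, 0, end);           // move current max to its final spot
            sink(a, 0, end);           // restore heap over a[0..end)
        }
    }

    /** Sinks item i within the first n slots; children of i are 2i+1 and 2i+2. */
    private static void sink(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child]) {
                child += 1;            // pick the larger child
            }
            if (a[i] >= a[child]) {
                break;
            }
            swap(a, i, child);
            i = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Using a max-heap is the “fancy trick”: the max belongs at the back of the array, exactly where the heap shrinks from.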
In-place Heapsort Runtime: http://shoutkey.com/salad
Use the magic of the heap to sort our data.
Give the time complexity of in-place heapsort in big O notation.
In-place Heapsort Runtime
Use the magic of the heap to sort our data.
Give the time complexity of in-place heapsort in big O notation.
Bottom-up heapification is N sink operations, each taking no more than O(log N) time, so overall runtime for heapification is O(N log N).
In-place Heapsort: http://shoutkey.com/scream
What is the memory complexity of Heapsort?
In-place Heapsort
What is the memory complexity of Heapsort?
The only extra memory we need is a constant number of instance variables, e.g. size.
Sorts So Far
| | Best Case Runtime | Worst Case Runtime | Space | Demo | Notes |
|---|---|---|---|---|---|
| Selection Sort | Θ(N²) | Θ(N²) | Θ(1) | | |
| Heapsort (in place) | Θ(N)* | Θ(N log N) | Θ(1) | | Bad cache (61C) performance. |
*: An array of all duplicates yields linear runtime for heapsort.
Mergesort
Mergesort
We’ve seen this one before as well.
Mergesort: Demo
Time complexity, analysis from previous lecture: Θ(N log N) runtime
Also possible to do in-place merge sort, but algorithm is very complicated, and runtime performance suffers by a significant constant factor.
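For reference, an out-of-place mergesort sketch with the Θ(N) auxiliary arrays mentioned above (a sketch, not the staff implementation):

```java
public class MergeSort {
    /** Recursively sorts each half into auxiliary arrays, then merges
     *  them back into a. Θ(N log N) time, Θ(N) space. */
    public static void sort(int[] a) {
        if (a.length <= 1) {
            return;
        }
        int[] left = java.util.Arrays.copyOfRange(a, 0, a.length / 2);
        int[] right = java.util.Arrays.copyOfRange(a, a.length / 2, a.length);
        sort(left);
        sort(right);
        int i = 0;
        int j = 0;
        for (int k = 0; k < a.length; k += 1) {
            // Take from left while right is exhausted or left's head is smaller.
            if (j >= right.length || (i < left.length && left[i] <= right[j])) {
                a[k] = left[i];
                i += 1;
            } else {
                a[k] = right[j];
                j += 1;
            }
        }
    }
}
```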
Sorts So Far
| | Best Case Runtime | Worst Case Runtime | Space | Demo | Notes |
|---|---|---|---|---|---|
| Selection Sort | Θ(N²) | Θ(N²) | Θ(1) | | |
| Heapsort (in place) | Θ(N)* | Θ(N log N) | Θ(1) | | Bad cache (61C) performance. |
| Mergesort | Θ(N log N) | Θ(N log N) | Θ(N) | | Faster than heapsort. |
*: An array of all duplicates yields linear runtime for heapsort.
Insertion Sort
Insertion Sort
General strategy:
Naive approach, build entirely new output: Demo (Link)
Input: 32 15 2 17 19 26 41 17 17
Output: (built up one item at a time)
Insertion Sort
General strategy:
For naive approach, if output sequence contains k items, worst cost to insert a single item is k.
More efficient method:
In-place Insertion Sort
Two more examples.
7 swaps:
36 swaps:
P O T A T O
P O T A T O (0 swaps)
O P T A T O (1 swap )
O P T A T O (0 swaps)
A O P T T O (3 swaps)
A O P T T O (0 swaps)
A O O P T T (3 swaps)
S O R T E X A M P L E
S O R T E X A M P L E (0 swaps)
O S R T E X A M P L E (1 swap )
O R S T E X A M P L E (1 swap )
O R S T E X A M P L E (0 swaps)
E O R S T X A M P L E (4 swaps)
E O R S T X A M P L E (0 swaps)
A E O R S T X M P L E (6 swaps)
A E M O R S T X P L E (5 swaps)
A E M O P R S T X L E (4 swaps)
A E L M O P R S T X E (7 swaps)
A E E L M O P R S T X (8 swaps)
Purple: Element that we’re moving left (with swaps).
Black: Elements that got swapped with purple.
Grey: Not considered this iteration.
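The swap-leftward strategy traced above can be sketched as follows (char[] so the letter examples work directly; names are my own):

```java
public class InsertionSort {
    /** Travels each item leftward via swaps until the item to its left is
     *  no bigger. Each swap fixes exactly one inversion. */
    public static void sort(char[] a) {
        for (int i = 1; i < a.length; i += 1) {
            for (int j = i; j > 0 && a[j] < a[j - 1]; j -= 1) {
                char tmp = a[j];
                a[j] = a[j - 1];
                a[j - 1] = tmp;
            }
        }
    }
}
```

Because the total number of swaps equals the number of inversions, the traces above (7 swaps for POTATO, 36 for SORTEXAMPLE) fall straight out of this loop.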
Insertion Sort Runtime: http://shoutkey.com/drink
What is the runtime of insertion sort?
36 swaps:
Picking the Best Sort: http://shoutkey.com/kiwi
Suppose we do the following:
Which sorting algorithm would be the fastest choice for X?
(Expected stop here unless we have extra time)
To Be Continued
(slides that follow this point are for next time)
Observation: Insertion Sort on Almost Sorted Arrays
For arrays that are almost sorted, insertion sort does very little work.
Picking the Best Sort (Poll Everywhere)
Suppose we do the following:
Which sorting algorithm would be the fastest choice for X? Worst case run-times:
Insertion Sort Sweet Spots
On arrays with a small number of inversions, insertion sort is extremely fast.
Less obvious: For small arrays (N < 15 or so), insertion sort is fastest.
Sorts So Far
| | Best Case Runtime | Worst Case Runtime | Space | Demo | Notes |
|---|---|---|---|---|---|
| Selection Sort | Θ(N²) | Θ(N²) | Θ(1) | | |
| Heapsort (in place) | Θ(N)* | Θ(N log N) | Θ(1) | | Bad cache (61C) performance. |
| Mergesort | Θ(N log N) | Θ(N log N) | Θ(N) | | Fastest of these. |
| Insertion Sort (in-place) | Θ(N) | Θ(N²) | Θ(1) | | Best for small N or almost sorted. |
An earlier version of this slide assumed that Heapsort was always given an array with no duplicate elements. If we omit this assumption, heapsort’s best case is Θ(N).
Shell’s Sort (Extra)
(Not on Exam)
Optimizing Insertion Sort: Shell’s Sort
Big idea: Fix multiple inversions at once.
Optimizing Insertion Sort: Shell’s Sort
7-sorting:
3-sorting:
3 swaps
5 swaps
1-sorting:
6 swaps
Total swaps: 14 (vs. 31)
S O R T E X A M P L E
M O R T E X A S P L E (1 swap, > 1 inversion)
M O R T E X A S P L E (0 swaps)
M O L T E X A S P R E (1 swap, > 1 inversion)
M O L E E X A S P R T (1 swap, > 1 inversion)
M O L E E X A S P R T
E O L M E X A S P R T (1 swap, >1 inversions)
E E L M O X A S P R T (1 swap, >1 inversions)
E E L M O X A S P R T (0 swaps, 0 inversions)
A E L E O X M S P R T (2 swaps, >1 inversions)
A E L E O X M S P R T (0 swaps, 0 inversions)
A E L E O P M S X R T (1 swap, >1 inversions)
A E L E O P M S X R T (0 swaps, 0 inversions)
A E L E O P M S X R T (0 swaps, 0 inversions)
Example from Algorithms 4th Edition
A E L E O P M S X R T
A E L E O P M S X R T
A E L E O P M S X R T
A E L E O P M S X R T
A E E L O P M S X R T
A E E L O P M S X R T
A E E L O P M S X R T
A E E L M O P S X R T
A E E L M O P S X R T
A E E L M O P S X R T
A E E L M O P R S X T
A E E L M O P R S T X
h=1 is just insertion sort.
Shell’s Sort: Generalization and Performance
h=1 is just normal insertion sort.
We used 7, 3, 1. Can generalize to gaps of 2^k − 1 from some k down to 1.
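A sketch using the 2^k − 1 gap sequence (halving h steps 7 → 3 → 1; names are my own):

```java
public class ShellSort {
    /** h-sorts the array for each gap h = 2^k - 1, decreasing down to h = 1.
     *  The h = 1 pass is plain insertion sort, so the result is fully sorted. */
    public static void sort(int[] a) {
        int h = 1;
        while (2 * h + 1 <= a.length) {
            h = 2 * h + 1;            // largest gap of form 2^k - 1 that fits
        }
        while (h >= 1) {
            // Insertion sort among items h apart; big h fixes many inversions per swap.
            for (int i = h; i < a.length; i += 1) {
                for (int j = i; j >= h && a[j] < a[j - h]; j -= h) {
                    int tmp = a[j];
                    a[j] = a[j - h];
                    a[j - h] = tmp;
                }
            }
            h = h / 2;                // 7 -> 3 -> 1 for gaps of form 2^k - 1
        }
    }
}
```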
Sorts So Far
| | Best Case Runtime | Worst Case Runtime | Space | Demo | Notes |
|---|---|---|---|---|---|
| Selection Sort | Θ(N²) | Θ(N²) | Θ(1) | | |
| Heapsort (in place) | Θ(N)* | Θ(N log N) | Θ(1) | | Bad cache (61C) performance. |
| Mergesort | Θ(N log N) | Θ(N log N) | Θ(N) | | Fastest of these. |
| Insertion Sort (in-place) | Θ(N) | Θ(N²) | Θ(1) | | Best for small N or almost sorted. |
| Shell’s Sort | Θ(N) | Ω(N log N), O(?) | Θ(1) | N/A | Rich theory! |
An earlier version of this slide assumed that Heapsort was always given an array with no duplicate elements. If we omit this assumption, heapsort’s best case is Θ(N).
Citations
Lego man: http://thetechnicgear.com/wp-content/uploads/2014/02/sorting-lego.jpg
Keynote Animations of Algorithms courtesy of Kevin Wayne