Lecture 13-15
Sorting: checksort, selection, insertion, merge, quick, compareTo
Sorting
Sorting
Performance Definitions
Check Sort
Check Sort
Bogosort
while not isInOrder(deck):
shuffle(deck)
reminder
Selection Sort
Selection Sort. Complexity
Complexity of selection sort (worst)?
A: Θ(Log N)
B: Θ(N)
C: Θ(N log N)
D: Θ(N2)
E: Other
Selection Sort. Complexity
Best case running time
for Selection sort?
A: Θ(1)
B: Θ(N)
C: Θ(log N)
D: Θ(N2)
E: None of the above
Costs
| Best Case Running time | Worst Case Running time | Additional Space |
Selection Sort | Θ(N2) | Θ(N2) | Θ(1) |
Insertion Sort
Insertion Sort
General strategy:
Demo (Link)
Invariant: Partly Sorted
Insertion Sort
Complexity of insertion sort (worst)?
A: Θ(Log N)
B: Θ(N)
C: Θ(N log N)
D: Θ(N2)
E: Other
Insertion Sort
Costs
| Best Case Running time | Worst Case Running time | Additional Space |
Selection Sort | Θ(N2) | Θ(N2) | Θ(1) - in place |
Insertion Sort | Θ(N) | Θ(N2) | Θ(1) - in place |
Merge Sort
MergeSort
Mergesort: Demo
Array Merging of N Total Items (A.length + B.length = N)
What is the runtime for merge step?
A: Θ(1) D: Θ(N log N)
B: Θ(N) E: None of the above
C: Θ(N2)
2 | 3 | 6 | 10 | 11 |
4 | 5 | 7 | 8 |
2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 |
A
B
Reminder
Lost in Recursion Land
Merge Sort: More General
Intuitive explanation:
N
N/2
N/2
N/4
N/4
N/4
….
N/8
N/8
….
N/4
N/8
N/8
k
MergeSort
Mergesort: Demo
Space complexity with aux array: Costs Θ(N) memory.
Also possible to do in-place merge sort, but algorithm is very complicated, and runtime performance suffers by a significant constant factor.
Which input wins for merge sort?
A: Random
B: (Nearly) Sorted
C: Duplicates
D: Reversed
E: Does not matter
https://www.toptal.com/developers/sorting-algorithms
If time permits
https://www.youtube.com/watch?v=XaqR3G_NVoo
https://www.youtube.com/watch?v=semGJAJ7i74&list=PLOmdoKois7_FK-ySGwHBkltzB11snW7KQ
Costs
| Best Case Running time | Worst Case Running time | Additional Space |
Selection Sort | Θ(N2) | Θ(N2) | Θ(1) - in place |
Insertion Sort | Θ(N) | Θ(N2) | Θ(1) - in place |
Merge Sort | Θ(N Log N) | Θ(N Log N) | Θ(N) - not in place |
Quick Sort
The Core Idea: Partitioning
To partition an array a[] on element x=a[i] is to rearrange a[] so that:
E. More than one Which partitions are valid?
5
550
10
4
10
9
330
i
4
5
9
10
10
330
550
5
9
10
4
10
550
330
5
4
9
10
10
550
330
5
9
10
4
10
550
330
A.
C.
B.
D.
j
j
j
j
Called the ‘pivot’.
Interview Question (Partitioning)
Given an array of colors where the 0th element is white, and the remaining elements are red (less) or blue (greater), rearrange the array so that all red squares are to the left of the white square, and all blue squares are to the right. Your algorithm must complete in Θ(N) time (no space restriction).
3
1
2
4
6
8
7
3
4
1
2
6
7
8
Example of a valid output
Another example of a valid output
6
8
3
1
2
7
4
Input
Simplest (but not fastest) Answer: 3 Scan Approach
Given an array of colors where the 0th element is white, and the remaining elements are red (less) or blue (greater), rearrange the array so that all red squares are to the left of the white square, and all blue squares are to the right. Your algorithm must complete in Θ(N) time (no space restriction).
Algorithm: Create another array. Scan and copy all the red items to the first R spaces. Then scan for and copy the white item. Then scan and copy the blue items to the last B spaces.
6
8
3
1
2
7
4
Input
Output
3
1
2
4
6
8
7
Partition Sort, a.k.a. Quicksort
Observations:
5
3
2
1
8
4
6
7
3
2
1
4
7
8
6
5
3
2
1
4
5
7
8
6
5
2
1
3
4
5
6
7
8
5
Q: How would we use this operation for sorting?
Partition Sort, a.k.a. Quicksort
Quicksorting N items: (Demo)
32
15
2
17
19
26
41
17
17
partition(32)
15
2
17
19
26
17
17
32
41
<= 32
>= 32
in its place
Using a good pivot
Using a good pivot
reminder
Record
Using a good pivot
Question
Which of these choices would be the worst choice for the pivot?
Quick sort with a bad pivot
Question
Which of these choices is a better choice for the pivot?
Costs
| Best Case Running time | Worst Case Running time | Additional Space |
Selection Sort | Θ(N2) | Θ(N2) | Θ(1) - in place |
Insertion Sort | Θ(N) | Θ(N2) | Θ(1) - in place |
Merge Sort | Θ(N Log N) | Θ(N Log N) | Θ(N) |
Quick Sort | Θ(N Log N) | Θ(N2) | Θ(log n) |
Stability
Other Desirable Sorting Property: Stability
A sort is said to be stable if the order of equivalent items is preserved.
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_NAME);
Lara | 1 |
Sigurd | 2 |
Bas | 3 |
Jana | 3 |
Jouni | 3 |
Rosella | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
sort(studentRecords, BY_SECTION);
Equivalent items don’t ‘cross over’ when being stably sorted.
Other Desirable Sorting Property: Stability
A sort is said to be stable if order of equivalent items is preserved.
Bas | 3 |
Fikriyya | 4 |
Jana | 3 |
Jouni | 3 |
Lara | 1 |
Nikolaj | 4 |
Rosella | 3 |
Sigurd | 2 |
sort(studentRecords, BY_NAME);
Lara | 1 |
Sigurd | 2 |
Jouni | 3 |
Rosella | 3 |
Bas | 3 |
Jana | 3 |
Fikriyya | 4 |
Nikolaj | 4 |
sort(studentRecords, BY_SECTION);
Sorting instability can be really annoying! Wanted students listed alphabetically by section.
Is Selection Sort Stable? Sort to find out!
Is Insertion Sort Stable? Sort to find out!
Is Merge Sort Stable? Sort to find out!
Suppose we do the following:
Which sorting algorithm would be the fastest choice for X?
Costs
| Best Case Running time | Worst Case Running time | Additional Space | Stability |
Selection Sort | Θ(N2) | Θ(N2) | Θ(1) - in place | No |
Insertion Sort | Θ(N) | Θ(N2) | Θ(1) - in place | Yes |
Merge Sort | Θ(N Log N) | Θ(N Log N) | Θ(N) | Yes |
Random Quick Sort | Θ(N Log N) And average | Θ(N2) | Θ(log n) | No |
Questions?
Empirical Testing
Practical world
Explain these results
Timing for findMax() on an array of ints and an array of Integer objects (times are in milliseconds)
n | Array of int | Array of Integer |
800,000 | 2.314 | 4.329 |
4,000,000 | 11.363 | 21.739 |
8,000,000 | 22.727 | 42.958 |
For these measurements, what is the likely ϴ time cost of findMax() on array of int
A: f(n) = ϴ (log2n) D: f(n) = ϴ (n)
B: f(n) = ϴ (n log2n) E: None of these
C: f(n) = ϴ (n2)
Explain these results
Timing for findMax() on an array of ints and an array of Integers (times are in milliseconds)
n | Array of int | Array of Integer |
800,000 | 2.314 | 4.329 |
4,000,000 | 11.363 | 21.739 |
8,000,000 | 22.727 | 42.958 |
For these measurements, what is the likely ϴ time cost of findMax() on array of Integer
A: f(n) = ϴ (log2n) D: f(n) = ϴ (n)
B: f(n) = ϴ (n log2n) E: None of these
C: f(n) = ϴ (n2)
Explain these results
Timing for findMax() on an array of ints and an array of Integer (times are in milliseconds)
n | Array of int | Array of Integer |
800,000 | 2.314 | 4.329 |
4,000,000 | 11.363 | 21.739 |
8,000,000 | 22.727 | 42.958 |
Discussion
Reasons to use “real timers”
Procedure for measuring time cost
Procedure for measuring time cost
int[ ] N = { 1000, 2000, 3000, ..., 10000 };
Basic procedure for benchmarking
Timer methods in Java
Draft 1
Is it enough to execute once?
Draft 2
A much-improved procedure for measuring the
time cost of A(X) is to compute the average time
across Tries attempts.
IMPROVED procedure for benchmarking