1 of 75

Lecture 13-15

Sorting: check sort, selection, insertion, merge, quick, compareTo

2 of 75

Sorting

3 of 75

Sorting

  • Sorting takes an unordered collection and makes it an ordered one.

4 of 75

Performance Definitions

  • Time complexity

  • Space complexity
    • the “extra” memory usage of an algorithm

5 of 75

Check Sort

6 of 75

Check Sort

7 of 75

Bogosort

while not isInOrder(deck):
    shuffle(deck)
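A runnable version of this idea, sketched in Java (class and helper names are our own). Expected running time is on the order of N·N! shuffles, so it is only for amusement:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class Bogosort {
    // Returns true if the list is in ascending order.
    static boolean isInOrder(List<Integer> deck) {
        for (int i = 1; i < deck.size(); i++) {
            if (deck.get(i - 1) > deck.get(i)) return false;
        }
        return true;
    }

    // Shuffle until sorted -- only sensible for tiny inputs!
    static void bogosort(List<Integer> deck) {
        while (!isInOrder(deck)) {
            Collections.shuffle(deck);
        }
    }

    public static void main(String[] args) {
        List<Integer> deck = new ArrayList<>(Arrays.asList(3, 1, 2));
        bogosort(deck);
        System.out.println(deck); // [1, 2, 3]
    }
}
```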

8 of 75


9 of 75

Selection Sort
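For reference, a minimal in-place selection sort in Java (this is our own sketch, not the course's official code): repeatedly select the smallest remaining item and swap it into place.

```java
public class SelectionSort {
    // For each position i, find the smallest item in a[i..] and swap it into a[i].
    public static void sort(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 2, 1, 8, 4, 6, 7};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

Note that both loops always run to completion, which is why the best and worst cases are the same.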

10 of 75

11 of 75

Selection Sort. Complexity

Complexity of selection sort (worst)?

A: Θ(log N)

B: Θ(N)

C: Θ(N log N)

D: Θ(N²)

E: Other

12 of 75

Selection Sort. Complexity

13 of 75

Best case running time

for Selection sort?

A: Θ(1)

B: Θ(N)

C: Θ(log N)

D: Θ(N²)

E: None of the above

14 of 75

Costs

                 Best Case Running Time   Worst Case Running Time   Additional Space
Selection Sort   Θ(N²)                    Θ(N²)                     Θ(1)

15 of 75

Insertion Sort

16 of 75

Insertion Sort

General strategy:

  • Start with an empty output sequence.
  • Add each item from the input, inserting it into the output at the right point.

Demo (Link)
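The in-array variant of this strategy can be sketched in Java as follows (our own code, not the demo's): the sorted "output" is the growing prefix of the array itself, and each new item is shifted into place.

```java
public class InsertionSort {
    // Grow a sorted prefix a[0..i); insert each new item into it.
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int item = a[i];
            int j = i - 1;
            // Shift larger items one slot right to make room for `item`.
            while (j >= 0 && a[j] > item) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 2, 1, 8, 4, 6, 7};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

On an already-sorted array the inner while loop never runs, which is where the Θ(N) best case comes from.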

17 of 75

Invariant: Partly Sorted

18 of 75

Insertion Sort

Complexity of insertion sort (worst)?

A: Θ(log N)

B: Θ(N)

C: Θ(N log N)

D: Θ(N²)

E: Other

19 of 75

20 of 75

21 of 75

Insertion Sort

  • In most cases, insertion sort is the best of the elementary sorts.
  • It still executes in Θ(N²) time, but it’s about twice as fast as bubble sort and somewhat faster than selection sort in typical situations.
  • It’s also not too complex, although it’s slightly more involved than bubble and selection sort.
  • It’s often used as the final stage of more sophisticated sorts, such as quicksort.
  • It works best if the data is already sorted or partly sorted.

22 of 75

Costs

                 Best Case Running Time   Worst Case Running Time   Additional Space
Selection Sort   Θ(N²)                    Θ(N²)                     Θ(1) - in place
Insertion Sort   Θ(N)                     Θ(N²)                     Θ(1) - in place

23 of 75

24 of 75

25 of 75

26 of 75

Merge Sort

27 of 75

28 of 75

MergeSort

Mergesort: Demo

  • Split items into 2 roughly even pieces.
  • Mergesort each half (steps not shown, this is a recursive algorithm!)
  • Merge the two sorted halves to form the final result.

29 of 75

Array Merging of N Total Items (A.length + B.length = N)

What is the runtime of the merge step?

A: Θ(1)

B: Θ(N)

C: Θ(N²)

D: Θ(N log N)

E: None of the above

A = [2, 3, 6, 10, 11]
B = [4, 5, 7, 8]
merged: [2, 3, 4, 5, 6, 7, 8, 10, 11]
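The merge step can be sketched in Java like so (method names are our own). Each comparison places exactly one item into the output, which is why the merge is linear in N:

```java
public class Merge {
    // Merge two sorted arrays a and b into one sorted array of length
    // N = a.length + b.length. Runs in Theta(N).
    public static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0;
        for (int k = 0; k < out.length; k++) {
            // Take from a if b is exhausted, or a's next item is no larger.
            if (j >= b.length || (i < a.length && a[i] <= b[j])) {
                out[k] = a[i++];
            } else {
                out[k] = b[j++];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 6, 10, 11};
        int[] b = {4, 5, 7, 8};
        System.out.println(java.util.Arrays.toString(merge(a, b)));
        // [2, 3, 4, 5, 6, 7, 8, 10, 11]
    }
}
```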

30 of 75


31 of 75

32 of 75

Lost in Recursion Land

33 of 75

34 of 75

Merge Sort: More General

Intuitive explanation:

  • Every level does N work
    • Top level does N work.
    • Next level does N/2 + N/2 = N.
    • One more level down: N/4 + N/4 + N/4 + N/4 = N.
  • Thus work is just Nk, where k is the number of levels.
    • How many levels? Goes until we get to size 1.
    • k = lg(N)
  • Overall runtime is N log N.

N
N/2   N/2
N/4   N/4   N/4   N/4
N/8   N/8   N/8   N/8   ...
...                              } k levels total

35 of 75

MergeSort

Mergesort: Demo

  • Split items into 2 roughly even pieces.
  • Mergesort each half (steps not shown, this is a recursive algorithm!)
  • Merge the two sorted halves to form the final result.

Space complexity with aux array: Costs Θ(N) memory.

It is also possible to do an in-place merge sort, but the algorithm is very complicated, and runtime performance suffers by a significant constant factor.
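A top-down mergesort sketch in Java using a Θ(N) auxiliary array (our own code, not the demo's):

```java
public class MergeSort {
    // Top-down mergesort with a Theta(N) auxiliary array.
    public static void sort(int[] a) {
        sort(a, new int[a.length], 0, a.length);
    }

    // Sort a[lo..hi) using aux as scratch space.
    private static void sort(int[] a, int[] aux, int lo, int hi) {
        if (hi - lo <= 1) return;                  // size 0 or 1: already sorted
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);                     // mergesort left half
        sort(a, aux, mid, hi);                     // mergesort right half
        // Merge the two sorted halves through aux.
        int i = lo, j = mid;
        for (int k = lo; k < hi; k++) {
            if (j >= hi || (i < mid && a[i] <= a[j])) aux[k] = a[i++];
            else aux[k] = a[j++];
        }
        System.arraycopy(aux, lo, a, lo, hi - lo);
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(java.util.Arrays.toString(a));
        // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}
```

The `a[i] <= b[j]`-style tie-break (take from the left half on equal keys) is what makes this merge, and hence mergesort, stable.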

36 of 75

Which input wins for merge sort?

A: Random

B: (Nearly) Sorted

C: Duplicates

D: Reversed

E: Does not matter

https://www.toptal.com/developers/sorting-algorithms

37 of 75

If time permits

https://www.youtube.com/watch?v=XaqR3G_NVoo

https://www.youtube.com/watch?v=semGJAJ7i74&list=PLOmdoKois7_FK-ySGwHBkltzB11snW7KQ

38 of 75

Costs

                 Best Case Running Time   Worst Case Running Time   Additional Space
Selection Sort   Θ(N²)                    Θ(N²)                     Θ(1) - in place
Insertion Sort   Θ(N)                     Θ(N²)                     Θ(1) - in place
Merge Sort       Θ(N log N)               Θ(N log N)                Θ(N) - not in place

39 of 75

Quick Sort

40 of 75

The Core Idea: Partitioning

To partition an array a[] on element x=a[i] is to rearrange a[] so that:

  • x moves to position j (may be the same as i)
  • All entries to the left of x are <= x.
  • All entries to the right of x are >= x.

Which partitions are valid?

Input (pivot x = a[i] = 5):   5  550  10  4  10  9  330

[figure: four candidate rearrangements A–D, each with the pivot's final position j marked]

E: More than one

The chosen element x is called the ‘pivot’.

41 of 75

Interview Question (Partitioning)

Given an array of colors where the 0th element is white, and the remaining elements are red (less) or blue (greater), rearrange the array so that all red squares are to the left of the white square, and all blue squares are to the right. Your algorithm must complete in Θ(N) time (no space restriction).

  • Relative order of red and blues does NOT need to stay the same.

Input:                      6  8  3  1  2  7  4    (the 0th element, 6, is the white square)

Example of a valid output:  3  1  2  4  6  8  7

Another valid output:       3  4  1  2  6  7  8

42 of 75

Simplest (but not fastest) Answer: 3 Scan Approach

Given an array of colors where the 0th element is white, and the remaining elements are red (less) or blue (greater), rearrange the array so that all red squares are to the left of the white square, and all blue squares are to the right. Your algorithm must complete in Θ(N) time (no space restriction).

  • Relative order of red and blues does NOT need to stay the same.

Algorithm: Create another array. Scan the input and copy all the red items into the first R slots; then scan for and copy the white item; then scan and copy the blue items into the last B slots.

Input:   6  8  3  1  2  7  4

Output:  3  1  2  4  6  8  7
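The 3-scan approach can be sketched in Java, with numbers standing in for the colored squares (our own code; like the slide, it assumes the items are distinct). Three linear passes over the input still total Θ(N):

```java
public class ThreeScanPartition {
    // a[0] is the "white" pivot; smaller items are "red", larger are "blue".
    // Three scans over a new array: copy the reds, then the white, then the blues.
    public static int[] partition(int[] a) {
        int pivot = a[0];
        int[] out = new int[a.length];
        int k = 0;
        for (int x : a) if (x < pivot) out[k++] = x;   // scan 1: red items
        out[k++] = pivot;                              // scan 2: the white item
        for (int x : a) if (x > pivot) out[k++] = x;   // scan 3: blue items
        return out;
    }

    public static void main(String[] args) {
        int[] input = {6, 8, 3, 1, 2, 7, 4};
        System.out.println(java.util.Arrays.toString(partition(input)));
        // [3, 1, 2, 4, 6, 8, 7]
    }
}
```

Because each scan copies items in their original order, this particular partition also happens to be stable.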

43 of 75

Partition Sort, a.k.a. Quicksort

Observations:

  • 5 is “in its place.” Exactly where it’d be if the array were sorted.
  • Can sort two halves separately, e.g. through recursive use of partitioning.

Input:                     5  3  2  1  8  4  6  7

After partitioning on 5:   3  2  1  4  5  7  8  6    (5 is now “in its place”)

Recursively sorting the two halves yields:   1  2  3  4  5  6  7  8

Q: How would we use this operation for sorting?

44 of 75

Partition Sort, a.k.a. Quicksort

Quicksorting N items: (Demo)

  • Partition on leftmost item.
  • Quicksort left half.
  • Quicksort right half.
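These three steps can be sketched in Java (our own code; for clarity this version partitions through a temporary buffer, whereas the in-place partitioning usually shown in class avoids the extra array at the cost of trickier index logic):

```java
import java.util.Arrays;

public class QuickSort {
    // Quicksort, partitioning on the leftmost item.
    public static void sort(int[] a) {
        sort(a, 0, a.length);
    }

    private static void sort(int[] a, int lo, int hi) {
        if (hi - lo <= 1) return;
        int pivot = a[lo];                 // partition on the leftmost item
        int[] buf = new int[hi - lo];
        int k = 0;
        for (int i = lo; i < hi; i++) if (a[i] < pivot) buf[k++] = a[i];
        int mid = lo + k;                  // first final position of the pivot value
        for (int i = lo; i < hi; i++) if (a[i] == pivot) buf[k++] = a[i];
        int midHi = lo + k;                // end of the run of pivot copies
        for (int i = lo; i < hi; i++) if (a[i] > pivot) buf[k++] = a[i];
        System.arraycopy(buf, 0, a, lo, hi - lo);
        sort(a, lo, mid);                  // quicksort left half
        sort(a, midHi, hi);                // quicksort right half
    }

    public static void main(String[] args) {
        int[] a = {32, 15, 2, 17, 19, 26, 41, 17, 17};
        sort(a);
        System.out.println(Arrays.toString(a)); // [2, 15, 17, 17, 17, 19, 26, 32, 41]
    }
}
```

Grouping all copies of the pivot together (a 3-way partition) also guarantees progress when there are duplicates.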

Input:          32  15   2  17  19  26  41  17  17

partition(32):  15   2  17  19  26  17  17 | 32 | 41
                (<= 32 on the left, 32 “in its place”, >= 32 on the right)

45 of 75

Using a good pivot

46 of 75

Using a good pivot

47 of 75


48 of 75

Using a good pivot

49 of 75

Question

Which of these choices would be the worst choice for the pivot?

50 of 75

Quick sort with a bad pivot

51 of 75

Question

Which of these choices is a better choice for the pivot?

52 of 75

Costs

                 Best Case Running Time   Worst Case Running Time   Additional Space
Selection Sort   Θ(N²)                    Θ(N²)                     Θ(1) - in place
Insertion Sort   Θ(N)                     Θ(N²)                     Θ(1) - in place
Merge Sort       Θ(N log N)               Θ(N log N)                Θ(N)
Quick Sort       Θ(N log N)               Θ(N²)                     Θ(log N)

53 of 75

Stability

54 of 75

Other Desirable Sorting Property: Stability

A sort is said to be stable if the order of equivalent items is preserved.

sort(studentRecords, BY_NAME);

Bas       3
Fikriyya  4
Jana      3
Jouni     3
Lara      1
Nikolaj   4
Rosella   3
Sigurd    2

sort(studentRecords, BY_SECTION);

Lara      1
Sigurd    2
Bas       3
Jana      3
Jouni     3
Rosella   3
Fikriyya  4
Nikolaj   4

Equivalent items don’t ‘cross over’ when being stably sorted.
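Java’s `Arrays.sort` for object arrays is guaranteed stable, which lets us reproduce the example in code (the `Student` class and method names are our own):

```java
import java.util.Arrays;
import java.util.Comparator;

public class StabilityDemo {
    static class Student {
        final String name;
        final int section;
        Student(String name, int section) { this.name = name; this.section = section; }
    }

    // Arrays.sort on objects is stable, so sorting by section preserves the
    // existing alphabetical order within each section.
    static void sortBySection(Student[] records) {
        Arrays.sort(records, Comparator.comparingInt((Student s) -> s.section));
    }

    public static void main(String[] args) {
        Student[] records = {   // already sorted BY_NAME
            new Student("Bas", 3), new Student("Fikriyya", 4),
            new Student("Jana", 3), new Student("Jouni", 3),
            new Student("Lara", 1), new Student("Nikolaj", 4),
            new Student("Rosella", 3), new Student("Sigurd", 2),
        };
        sortBySection(records);
        for (Student s : records) System.out.println(s.name + " " + s.section);
        // Lara 1, Sigurd 2, Bas 3, Jana 3, Jouni 3, Rosella 3, Fikriyya 4, Nikolaj 4
    }
}
```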

55 of 75

Other Desirable Sorting Property: Stability

A sort is said to be stable if the order of equivalent items is preserved.

sort(studentRecords, BY_NAME);

Bas       3
Fikriyya  4
Jana      3
Jouni     3
Lara      1
Nikolaj   4
Rosella   3
Sigurd    2

sort(studentRecords, BY_SECTION);

Lara      1
Sigurd    2
Jouni     3
Rosella   3
Bas       3
Jana      3
Fikriyya  4
Nikolaj   4

Sorting instability can be really annoying! We wanted the students listed alphabetically within each section.

56 of 75

Is Selection Sort Stable? Sort to find out!

57 of 75

Is Insertion Sort Stable? Sort to find out!

58 of 75

Is Merge Sort Stable? Sort to find out!

59 of 75

Suppose we do the following:

  • Read 1,000,000 integers from a file into an array of length 1,000,000.
  • Mergesort these integers.
  • Select one integer randomly and change it.
  • Sort using algorithm X of your choice.

Which sorting algorithm would be the fastest choice for X?

  1. Selection Sort
  2. Mergesort
  3. Insertion Sort
  4. Quick Sort

60 of 75

Costs

                    Best Case Running Time   Worst Case Running Time   Additional Space   Stability
Selection Sort      Θ(N²)                    Θ(N²)                     Θ(1) - in place    No
Insertion Sort      Θ(N)                     Θ(N²)                     Θ(1) - in place    Yes
Merge Sort          Θ(N log N)               Θ(N log N)                Θ(N)               Yes
Random Quick Sort   Θ(N log N)*              Θ(N²)                     Θ(log N)           No

* also the average case

61 of 75

Questions?

62 of 75

Empirical Testing

63 of 75

Practical world

  • Sometimes counting instructions is not possible, or is very difficult.

  • As an alternative to describing an algorithm’s performance with a “number of abstract operations”, we can also measure its time empirically using a clock.

64 of 75

Explain these results

Timing for findMax() on an array of ints and an array of Integer objects (times are in milliseconds):

n            Array of int    Array of Integer
800,000      2.314           4.329
4,000,000    11.363          21.739
8,000,000    22.727          42.958

Given these measurements, what is the likely Θ time cost of findMax() on an array of int?

A: f(n) = Θ(log₂ n)

B: f(n) = Θ(n log₂ n)

C: f(n) = Θ(n²)

D: f(n) = Θ(n)

E: None of these

65 of 75

Explain these results

Timing for findMax() on an array of ints and an array of Integer objects (times are in milliseconds):

n            Array of int    Array of Integer
800,000      2.314           4.329
4,000,000    11.363          21.739
8,000,000    22.727          42.958

Given these measurements, what is the likely Θ time cost of findMax() on an array of Integer?

A: f(n) = Θ(log₂ n)

B: f(n) = Θ(n log₂ n)

C: f(n) = Θ(n²)

D: f(n) = Θ(n)

E: None of these

66 of 75

Explain these results

Timing for findMax() on an array of ints and an array of Integer objects (times are in milliseconds):

n            Array of int    Array of Integer
800,000      2.314           4.329
4,000,000    11.363          21.739
8,000,000    22.727          42.958

Discussion

  • Why would Integer take more time?
  • If Integer takes more time, why does it have the same Θ cost as int?
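One way to see this discussion in code: both versions of a hypothetical findMax below do the same Θ(n) work, but the Integer version unboxes an object on every comparison, adding a constant factor per element. This is our own illustration, not the benchmark code behind the table above:

```java
public class FindMax {
    // Primitive version: reads ints directly from the array.
    public static int findMax(int[] a) {
        int max = a[0];
        for (int x : a) if (x > max) max = x;
        return max;
    }

    // Boxed version: each element is an object reference that must be
    // dereferenced and unboxed before the comparison -- same Theta(n),
    // larger constant factor.
    public static int findMax(Integer[] a) {
        int max = a[0];
        for (int x : a) if (x > max) max = x;  // implicit unboxing here
        return max;
    }

    public static void main(String[] args) {
        System.out.println(findMax(new int[]{3, 9, 4}));     // 9
        System.out.println(findMax(new Integer[]{3, 9, 4})); // 9
    }
}
```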

67 of 75

Reasons to use “real timers”

  • As illustrated, counting “abstract operations” can hide real performance differences, e.g., between using int[] and Integer[].

  • Many programs and libraries are not open source!
  • You have to analyze an algorithm’s performance as a black box.

68 of 75

Procedure for measuring time cost

  • Let’s suppose we wish to measure the time cost of algorithm A as a function of its input size n.
  • We need to choose a set of values of n that we will test.
  • If we make n too big, our algorithm A may never terminate (the input is “too big”).
  • If we make n too small, then A may finish so fast that the “elapsed time” is practically 0, and we won’t get a reliable clock measurement.

69 of 75

Procedure for measuring time cost

  • In practice, one “guesses” a few values for n, sees how fast A executes on them, and selects a range of values for n.

  • Let’s define an array of different input sizes, e.g.:

int[] N = { 1000, 2000, 3000, ..., 10000 };

  • Now, for each input size N[i], we want to measure A’s time cost.

70 of 75

Basic procedure for benchmarking

71 of 75

Timer methods in Java
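A minimal sketch of elapsed-time measurement with `System.nanoTime()`, which is the monotonic clock intended for this purpose (unlike `System.currentTimeMillis()`, which is wall-clock time and can jump). The timed loop is our own placeholder for algorithm A:

```java
public class TimerDemo {
    // A stand-in for "algorithm A": sum the integers 0..n-1.
    static long sumFirstN(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();        // start the clock
        long sum = sumFirstN(1_000_000);       // the code being timed
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("sum = " + sum + ", elapsed = " + elapsedNanos / 1e6 + " ms");
    }
}
```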

72 of 75

Draft 1

73 of 75

Is it enough to execute once?

  • Unfortunately, in the “real world”, each measurement of the time cost of A(X) is corrupted by noise:
    • Garbage collector!
    • Other programs running.
    • Swapping to/from disk.
    • Input/output requests from external devices
  • If we measured the time cost of A(X) based on just one measurement, then our estimate of the “true” time cost of A(X) will be very imprecise.
  • We might get unlucky and measure A(X) while the computer is doing a “system update”.

74 of 75

Draft 2

A much-improved procedure for measuring the time cost of A(X) is to compute the average time across Tries attempts.
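The averaging procedure can be sketched in Java as follows (names like `averageMillis` and the `tries` count are our own; the sort being benchmarked is just an example workload):

```java
public class Benchmark {
    // Average the elapsed time over `tries` runs to smooth out noise
    // (garbage collection, other processes, I/O, ...).
    static double averageMillis(Runnable algorithm, int tries) {
        long totalNanos = 0;
        for (int t = 0; t < tries; t++) {
            long start = System.nanoTime();
            algorithm.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) tries / 1_000_000.0;
    }

    public static void main(String[] args) {
        int[] N = {1000, 2000, 3000};
        for (int n : N) {
            int[] data = new java.util.Random(42).ints(n).toArray();
            // Clone inside the Runnable so every try sorts the same unsorted input.
            double ms = averageMillis(() -> java.util.Arrays.sort(data.clone()), 10);
            System.out.printf("n=%d: %.3f ms%n", n, ms);
        }
    }
}
```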

75 of 75

IMPROVED procedure for benchmarking