1 of 21

Sorting and Algorithm Bounds

Analyzing the best-possible comparison sorting algorithm, decision tree sort, and applying sort ideas in algorithm design.

1

Kevin Lin, with thanks to many others.

2 of 21

Feedback from the Reading Quiz

Lower bound for an algorithm to “look at all of the data at least once.”

How do natural runs occur?

Why insertion sort for tiny arrays.

2

Demo

3 of 21

3

Sort

Best-Case

Worst-Case

Space

Stable

Notes

Selection Sort

Θ(N2)

Θ(N2)

Θ(1)

No

Heapsort

Θ(N)

Θ(N log N)

Θ(1)

No

Slow in practice.

Naive Merge Sort

Θ(N log N)

Θ(N log N)

Θ(N)

Yes

Java Merge Sort

Θ(N)

Θ(N log ρ)

Θ(N)

Yes

Fastest stable sort, ρ = number of runs.

Insertion Sort

Θ(N)

Θ(N2)

Θ(1)

Yes

Best for small or almost sorted inputs.

Naive Quicksort

Θ(N log N)

Θ(N2)

Θ(N)

Yes

2x or more slower than merge sort.

Java Quicksort

Θ(N)

Θ(N2)

O(N)

No

Fastest comparison sort.

On the Worst-Case Complexity of TimSort (Auger et al./Dagstuhl Research Online Publication Server)

4 of 21

Median-of-3 Decision Tree

4

a ⩻ b

b ⩻ c

No

Yes

a ⩻ c

a c b

Yes

c a b

No

a ⩻ c

b ⩻ c

b a c

b c a

c b a

Yes

No

No

Yes

No

Yes

a b c

5 of 21

Sorting with Decision Trees

5

int[] decisionTreeSort(int a, int b, int c) {

if (a < b) {

if (b < c) return {a, b, c};

else if (a < c) return {a, c, b};

else return {c, a, b};

} else {

if (c < b) return {b, a, c};

else if (c < a) return {b, c, a};

else return {c, b, a};

}

}

6 of 21

In the worst case, how many questions would you need to ask to definitively sort {a, b, c, d}?

6

7 of 21

Decision Tree Sorting

In the worst case, how many questions would you need to ask to definitively sort {a, b, c, d}?

The decision tree needs to decide between all possible permutations of the input.

3 items. Count 6 unique permutations.

4 items. Insert the new key d in 4 positions for each of the 6 permutations of 3 items.

N items. N! unique permutations.

How many levels minimum in a binary tree with 24 leaves? log 24 = 4.58, round up to 5.

7

A

a b c

8 of 21

Optimal Comparison Sort

Reductions set upper bounds on runtime.

Theoretical optimality sets lower bounds.

On random arrays, decision tree sorting is optimal in the number of comparisons.

Cost model. Number of comparisons.

Optimal decision tree sort doesn’t exist. Provably optimal for N < 16 and N = 22.

8

A036604: Sorting numbers: minimal number of comparisons needed to sort n elements (Neil Sloane/OEIS); Towards Optimal Sorting of 16 Elements (Marcin Peczarski/arXiv)

9 of 21

Decision Tree Sorting Analysis

9

10 of 21

Upper Bound for N!

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find an upper bound for the function N!

10

Q

11 of 21

Upper Bound for N!

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find an upper bound for the function N!

11

A

12 of 21

Upper Bound for log(N!)

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find an upper bound for the function log(N!)

12

Q

13 of 21

Upper Bound for log(N!)

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find an upper bound for the function log(N!)

13

A

Demo

14 of 21

Lower Bound for log(N!)

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find a lower bound for the function log(N!)

14

Q

15 of 21

Lower Bound for log(N!)

Goal. Find an asymptotic complexity bound for the function log(N!).

Subgoal. Find a lower bound for the function log(N!)

15

A

16 of 21

What can we say about decision tree sorting?

16

17 of 21

Sorting Algorithm Bound

Decision tree sorting is in Θ(N log N) on random arrays.

Lower bound on worst case. Any possible comparison sort on a random array must be in Ω(N log N).

17

Demo

A

18 of 21

18

Sort

Best-Case

Worst-Case

Space

Stable

Notes

Selection Sort

Θ(N2)

Θ(N2)

Θ(1)

No

Heapsort

Θ(N)

Θ(N log N)

Θ(1)

No

Slow in practice.

Naive Merge Sort

Θ(N log N)

Θ(N log N)

Θ(N)

Yes

Java Merge Sort

Θ(N)

Θ(N log ρ)

Θ(N)

Yes

Fastest stable sort, ρ = number of runs.

Insertion Sort

Θ(N)

Θ(N2)

Θ(1)

Yes

Best for small or almost sorted inputs.

Naive Quicksort

Θ(N log N)

Θ(N2)

Θ(N)

Yes

2x or more slower than merge sort.

Java Quicksort

Θ(N)

Θ(N2)

O(N)

No

Fastest comparison sort.

Optimal Sort

Θ(N)

Θ(N log N)

Θ(1)

Yes

Doesn’t exist (yet).

On the Worst-Case Complexity of TimSort (Auger et al./Dagstuhl Research Online Publication Server)

19 of 21

Algorithm Design Tips

19

20 of 21

Algorithm Design Paradigms

Greedy Algorithms. Consider each option in order of lowest-cost.

  • Prim’s Algorithm.
  • Kruskal’s Algorithm.
  • Dijkstra’s Algorithm.

Caveat. Can lead to suboptimal solutions.

Dijkstra’s algorithm on negative edge weighted graphs.

Divide-and-Conquer Algorithms. Solve two or more subproblems recursively, and then combine the results.

  • Merge sort.
  • Quicksort.

Prototypical usage. Turn brute-force N2 runtime algorithm into N log N algorithm.

20

Algorithms (Robert Sedgewick, Kevin Wayne/Princeton)

21 of 21

Algorithm Design Process

Hypothesize. How do invariants affect the behavior for each operation?

Identify. What strategies have we used before? What examples can we apply?

Plan. Propose a new way from findings.

Analyze. Does the plan do the job? What are potential problems with the plan?

Create. Implement the plan.

Evaluate. Check implemented plan.

Find a lower and upper bound. Define a slow but totally correct solution. Build a mental model: identify key properties.

Consider each algorithm that you know. Which ones might work? How do the existing algorithms break down?

Apply an algorithm design idea. Perform a reduction: transform input and output. Or modify the data structures used.

Use an algorithm design paradigm.

21