1 of 54

CS61B: 2019

Lecture 13: Introduction to Asymptotic Analysis

  • Intuitive Runtime
  • Detailed Analysis of Worst Case Order of Growth
  • Simplified Analysis
  • Big Theta Notation
  • Big O Notation

2 of 54

61B: Writing Efficient Programs

An engineer will do for a dime what any fool will do for a dollar.

Efficiency comes in two flavors:

  • Programming cost (course to date. Will also revisit later).
    • How long does it take to develop your programs?
    • How easy is it to read, modify, and maintain your code?
      • More important than you might think!
      • Majority of cost is in maintenance, not development!
  • Execution cost (from today until end of course).
    • How much time does your program take to execute?
    • How much memory does your program require?�

3 of 54

Example of Algorithm Cost

Objective: Determine if a sorted array contains any duplicates.

  • Given sorted array A, are there indices i and j where A[i] = A[j]?

-3

-1

2

4

4

8

10

12

Silly algorithm: Consider every possible pair, returning true if any match.

  • Are (-3, -1) the same? Are (-3, 2) the same? ...

Better algorithm?

4 of 54

Example of Algorithm Cost

Objective: Determine if a sorted array contains any duplicates.

  • Given sorted array A, are there indices i and j where A[i] = A[j]?

Silly algorithm: Consider every possible pair, returning true if any match.

  • Are (-3, -1) the same? Are (-3, 2) the same? ...

Better algorithm?

  • For each number A[i], look at A[i+1], and return true the first time you see a match. If you run out of items, return false.

-3

-1

2

4

4

8

10

12

Today’s goal: Introduce formal technique for comparing algorithmic efficiency.

5 of 54

Intuitive Runtime Characterizations

6 of 54

How Do I Runtime Characterization?

Our goal is to somehow characterize the runtimes of the functions below.

  • Characterization should be simple and mathematically rigorous.
  • Characterization should demonstrate superiority of dup2 over dup1.

public static boolean dup1(int[] A) {

for (int i = 0; i < A.length; i += 1) {

for (int j = i + 1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

}

public static boolean dup2(int[] A) {

for (int i = 0; i < A.length - 1; i += 1) {

if (A[i] == A[i + 1]) {

return true;

}

}

return false;

}

dup1

dup2

7 of 54

Techniques for Measuring Computational Cost

Technique 1: Measure execution time in seconds using a client program.

  • Tools:
    • Physical stopwatch.
    • Unix has a built in time command that measures execution time.
    • Princeton Standard library has a Stopwatch class.

public static void main(String[] args) {

int N = Integer.parseInt(args[0]);

int[] A = makeArray(N);

dup1(A);

}

8 of 54

Time Measurements for dup1 and dup2

N

dup1

dup2

10000

0.08

0.08

50000

0.32

0.08

100000

1.00

0.08

200000

8.26

0.1

400000

15.4

0.1

Time to complete (in seconds)

9 of 54

Techniques for Measuring Computational Cost

Technique 1: Measure execution time in seconds using a client program.

  • Good: Easy to measure, meaning is obvious.
  • Bad: May require large amounts of computation time. Result varies with machine, compiler, input data, etc.

public static void main(String[] args) {

int N = Integer.parseInt(args[0]);

int[] A = makeArray(N);

dup1(A);

}

10 of 54

Techniques for Measuring Computational Cost

Technique 2A: Count possible operations for an array of size N = 10,000.

operation

count, N=10000

i = 0

j = i + 1

less than (<)

increment (+=1)

equals (==)

array accesses

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

The counts are tricky to compute. Work not shown.

  • Good: Machine independent. Input dependence captured in model.
  • Bad: Tedious to compute. Array size was arbitrary. Doesn’t tell you actual time.

1 to 10000

2 to 50,015,001

0 to 50,005,000

1 to 49,995,000

2 to 99,990,000

1

11 of 54

Techniques for Measuring Computational Cost

Technique 2B: Count possible operations in terms of input array size N.

operation

symbolic count

count, N=10000

i = 0

1

1

j = i + 1

1 to N

1 to 10000

less than (<)

2 to (N2+3N+2)/2

2 to 50,015,001

increment (+=1)

0 to (N2+N)/2

0 to 50,005,000

equals (==)

1 to (N2-N)/2

1 to 49,995,000

array accesses

2 to N2-N

2 to 99,990,000

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j<A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

  • Good: Machine independent. Input dependence captured in model. Tells you how algorithm scales.
  • Bad: Even more tedious to compute. Doesn’t tell you actual time.

12 of 54

Techniques for Measuring Computational Cost [dup2]

Your turn: Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations.

  • Tip: Don’t worry about being off by one. Just try to predict the rough magnitudes of each.

operation

sym. count

count, N=10000

i = 0

1

1

less than (<)

increment (+=1)

equals (==)

array accesses

for (int i = 0; i < A.length - 1; i += 1){

if (A[i] == A[i + 1]) {

return true;

}

}

return false;

13 of 54

Techniques for Measuring Computational Cost [dup2]

Your turn: Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations.

operation

symbolic count

count, N=10000

i = 0

1

1

less than (<)

0 to N

0 to 10000

increment (+=1)

0 to N - 1

0 to 9999

equals (==)

1 to N - 1

1 to 9999

array accesses

2 to 2N - 2

2 to 19998

for (int i = 0; i < A.length - 1; i += 1) {

if (A[i] == A[i + 1]) {

return true;

}

}

return false;

Especially observant folks may notice we didn’t count everything, e.g. “- 1” and “+ 1” operations. We’ll see why this omission is not a problem very shortly.

If you did this exercise but were off by one, that’s fine. The exact numbers aren’t that important.

14 of 54

Comparing Algorithms

Which algorithm is better? Why?

operation

symbolic count

count, N=10000

i = 0

1

1

less than (<)

0 to N

0 to 10000

increment (+=1)

0 to N - 1

0 to 9999

equals (==)

1 to N - 1

1 to 9999

array accesses

2 to 2N - 2

2 to 19998

operation

symbolic count

count, N=10000

i = 0

1

1

j = i + 1

1 to N

1 to 10000

less than (<)

2 to (N2+3N+2)/2

2 to 50,015,001

increment (+=1)

0 to (N2+N)/2

0 to 50,005,000

equals (==)

1 to (N2-N)/2

1 to 49,995,000

array accesses

2 to N2-N

2 to 99,990,000

dup1

dup2

15 of 54

Comparing Algorithms

Which algorithm is better? dup2. Why?

operation

symbolic count

count, N=10000

i = 0

1

1

less than (<)

0 to N

0 to 10000

increment (+=1)

0 to N - 1

0 to 9999

equals (==)

1 to N - 1

1 to 9999

array accesses

2 to 2N - 2

2 to 19998

operation

symbolic count

count, N=10000

i = 0

1

1

j = i + 1

1 to N

1 to 10000

less than (<)

2 to (N2+3N+2)/2

2 to 50,015,001

increment (+=1)

0 to (N2+N)/2

0 to 50,005,000

equals (==)

1 to (N2-N)/2

1 to 49,995,000

array accesses

2 to N2-N

2 to 99,990,000

  • Fewer operations to do the same work [e.g. 50,015,001 vs. 10000 operations].
  • Better answer: Algorithm scales better in the worst case. (N2+3N+2)/2 vs. N.
  • Even better answer: Parabolas (N2) grow faster than lines (N).

dup1

dup2

16 of 54

Asymptotic Behavior

In most cases, we care only about asymptotic behavior, i.e. what happens for very large N.

  • Simulation of billions of interacting particles.
  • Social network with billions of users.
  • Logging of billions of transactions.
  • Encoding of billions of bytes of video data.

Algorithms which scale well (e.g. look like lines) have better asymptotic runtime behavior than algorithms that scale relatively poorly (e.g. look like parabolas).

17 of 54

Parabolas vs. Lines

Suppose we have two algorithms that zerpify a collection of N items.

  • zerp1 takes 2N2 operations.
  • zerp2 takes 500N operations.

For small N, zerp1 might be faster, but as dataset size grows, the parabolic algorithm is going to fall farther and farther behind (in time it takes to complete).

18 of 54

Scaling Across Many Domains

We’ll informally refer to the “shape” of a runtime function as its order of growth (will formalize soon).

  • Effect is dramatic! Often determines whether a problem can be solved at all.

(from Algorithm Design: Tardos, Kleinberg)

19 of 54

Duplicate Finding

Our goal is to somehow characterize the runtimes of the functions below.

Characterization should be simple and mathematically rigorous.

Characterization should demonstrate superiority of dup2 over dup1.

operation

symbolic count

i = 0

1

less than (<)

0 to N

increment (+=1)

0 to N - 1

equals (==)

1 to N - 1

array accesses

2 to 2N - 2

operation

symbolic count

i = 0

1

j = i + 1

1 to N

less than (<)

2 to (N2+3N+2)/2

increment (+=1)

0 to (N2+N)/2

equals (==)

1 to (N2-N)/2

array accesses

2 to N2-N

dup1: parabolic, a.k.a. quadratic

dup2: linear

20 of 54

Worst Case

Order of Growth

21 of 54

Duplicate Finding

Our goal is to somehow characterize the runtimes of the functions below.

  • Characterization should be simple and mathematically rigorous.

operation

count

i = 0

1

less than (<)

0 to N

increment (+=1)

0 to N - 1

equals (==)

1 to N - 1

array accesses

2 to 2N - 2

operation

count

i = 0

1

j = i + 1

1 to N

less than (<)

2 to (N2+3N+2)/2

increment (+=1)

0 to (N2+N)/2

equals (==)

1 to (N2-N)/2

array accesses

2 to N2-N

Let’s be more careful about what we mean when we say the left function is “like” a parabola, and the right function is “like” a line.

22 of 54

Intuitive Simplification 1: Consider Only the Worst Case

Simplification 1: Consider only the worst case.

operation

count

i = 0

1

j = i + 1

1 to N

less than (<)

2 to (N2+3N+2)/2

increment (+=1)

0 to (N2+N)/2

equals (==)

1 to (N2-N)/2

array accesses

2 to N2-N

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

23 of 54

Intuitive Simplification 1: Consider Only the Worst Case

Simplification 1: Consider only the worst case.

  • Justification: When comparing algorithms, we often care only about the worst case [but we will see exceptions in this course].

operation

worst case count

i = 0

1

j = i + 1

N

less than (<)

(N2+3N+2)/2

increment (+=1)

(N2+N)/2

equals (==)

(N2-N)/2

array accesses

N2-N

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

We’re effectively focusing on the case where there are no duplicates, because this is where there is a performance difference.

24 of 54

Intuitive Order of Growth Identification: yellkey.com/carry

Consider the algorithm below. What do you expect will be the order of growth of the runtime for the algorithm?

  1. N [linear]
  2. N2 [quadratic]
  3. N3 [cubic]
  4. N6 [sextic]

In other words, if we plotted total runtime vs. N, what shape would we expect?

operation

count

less than (<)

100N2 + 3N

greater than (>)

2N3 + 1

and (&&)

5,000

25 of 54

Intuitive Order of Growth Identification

Consider the algorithm below. What do you expect will be the order of growth of the runtime for the algorithm?

  • N3 [cubic]

Argument:

  • Suppose < takes α nanoseconds, > takes β nanoseconds, and && takes γ nanoseconds.
  • Total time is α(100N2 + 3N) + β(2N3 + 1) + 5000γ nanoseconds.
  • For very large N, the 2βN3 term is much larger than the others.

operation

count

less than (<)

100N2 + 3N

greater than (>)

2N3 + 1

and (&&)

5,000

Extremely important point. Make sure you understand it!

26 of 54

Intuitive Simplification 2: Restrict Attention to One Operation

Simplification 2: Pick some representative operation to act as a proxy for the overall runtime.

  • Good choice: increment.
  • Bad choice: assignment of j = i + 1.

We call our choice the “cost model”.

cost model = increment

operation

worst case count

i = 0

1

j = i + 1

N

less than (<)

(N2+3N+2)/2

increment (+=1)

(N2+N)/2

equals (==)

(N2-N)/2

array accesses

N2-N

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

There are other good choices.

27 of 54

Intuitive Simplification 3: Eliminate low order terms.

Simplification 3: Ignore lower order terms.

operation

worst case

increment (+=1)

(N2+N)/2

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

28 of 54

Intuitive Simplification 4: Eliminate multiplicative constants.

Simplification 4: Ignore multiplicative constants.

  • Why? It has no real meaning. We already threw away information when we chose a single proxy operation.

operation

worst case

increment (+=1)

N2/2

for (int i = 0; i < A.length; i += 1) {

for (int j = i+1; j < A.length; j += 1) {

if (A[i] == A[j]) {

return true;

}

}

}

return false;

29 of 54

Simplification Summary

Simplifications:

  1. Only consider the worst case.
  2. Pick a representative operation (a.k.a. the cost model).
  3. Ignore lower order terms.
  4. Ignore multiplicative constants.

operation

worst case o.o.g.

increment (+=1)

N2

operation

count

i = 0

1

j = i + 1

1 to N

less than (<)

2 to (N2+3N+2)/2

increment (+=1)

0 to (N2+N)/2

equals (==)

1 to (N2-N)/2

array accesses

2 to N2-N

These three simplifications are OK because we only care about the “order of growth” of the runtime.

Worst case order of growth of runtime: N2

30 of 54

Simplification Summary

Simplifications:

  • Only consider the worst case.
  • Pick a representative operation (a.k.a. the cost model).
  • Ignore lower order terms.
  • Ignore multiplicative constants.

operation

worst case o.o.g.

These three simplifications are OK because we only care about the “order of growth” of the runtime.

operation

count

i = 0

1

less than (<)

0 to N

increment (+=1)

0 to N - 1

equals (==)

1 to N - 1

array accesses

2 to 2N - 2

Worst case order of growth of runtime:

31 of 54

Repeating the Process for dup2

Simplifications:

  • Only consider the worst case.
  • Pick a representative operation (a.k.a. the cost model).
  • Ignore lower order terms.
  • Ignore multiplicative constants.

operation

worst case o.o.g.

array accesses

N

operation

count

i = 0

1

less than (<)

0 to N

increment (+=1)

0 to N - 1

equals (==)

1 to N - 1

array accesses

2 to 2N - 2

These three simplifications are OK because we only care about the “order of growth” of the runtime.

Any of the bottom four operations are good choices.

Worst case order of growth of runtime: N

This simplification is OK because we specifically only care about worst case.

32 of 54

Summary of Our (Painful) Analysis Process

Our process:

  • Construct a table of exact counts of all possible operations.
  • Convert table into a worst case order of growth using 4 simplifications.

By using our simplifications from the outset, we can avoid building the table at all!

operation

worst case o.o.g.

increment (+=1)

N2

operation

count

i = 0

1

j = i + 1

1 to N

less than (<)

2 to (N2+3N+2)/2

increment (+=1)

0 to (N2+N)/2

equals (==)

1 to (N2-N)/2

array accesses

2 to N2-N

Worst case order of growth of runtime: N2

33 of 54

Simplified Analysis

34 of 54

Simplified Analysis Process

Rather than building the entire table, we can instead:

  • Choose a representative operation to count (a.k.a. cost model).
  • Figure out the order of growth for the count of the representative operation by either:
    • Making an exact count, then discarding the unnecessary pieces.
    • Using intuition and inspection to determine order of growth (only possible with lots of practice).

Let’s redo our analysis of dup1 with this new process.

  • This time, we’ll show all our work.

35 of 54

Analysis of Nested For Loops (Based on Exact Count)

Find the order of growth of the worst case runtime of dup1.

Worst case number of == operations:

Overall worst case runtime: Θ(N2)

==

==

==

==

==

==

==

==

==

==

==

==

==

==

==

0

1

2

3

4

5

0 1 2 3 4 5

i

j

int N = A.length;

for (int i = 0; i < N; i += 1)

for (int j = i + 1; j < N; j += 1)

if (A[i] == A[j])

return true;

return false;

C = (N - 1) + (N - 2) + (N - 3) + … + 3 + 2 + 1

2C = N + N + … + N

N-1 of these

= N(N - 1)

∴ C = N(N - 1)/2

C = 1 + 2 + 3 + … + (N - 3) + (N - 2) + (N - 1)

N = 6

36 of 54

Analysis of Nested For Loops (Based on Exact Count)

Find the order of growth of the worst case runtime of dup1.

Worst case number of == operations:

Overall worst case runtime: Θ(N2)

int N = A.length;

for (int i = 0; i < N; i += 1)

for (int j = i + 1; j < N; j += 1)

if (A[i] == A[j])

return true;

return false;

operation

worst case o.o.g.

==

N2

Worst case order of growth of runtime: N2

C = 1 + 2 + 3 + … + (N - 3) + (N - 2) + (N - 1) = N(N-1)/2

==

==

==

==

==

==

==

==

==

==

==

==

==

==

==

0

1

2

3

4

5

0 1 2 3 4 5

i

j

N = 6

37 of 54

Analysis of Nested For Loops (Simpler Geometric Argument)

Find the order of growth of the worst case runtime of dup1.

Overall worst case runtime: Θ(N2)

int N = A.length;

for (int i = 0; i < N; i += 1)

for (int j = i + 1; j < N; j += 1)

if (A[i] == A[j])

return true;

return false;

Worst case number of == operations:

  • Given by area of right triangle of side length N-1.
  • Order of growth of area is N2.

operation

worst case o.o.g.

==

N2

Worst case order of growth of runtime: N2

==

==

==

==

==

==

==

==

==

==

==

==

==

==

==

0

1

2

3

4

5

0 1 2 3 4 5

i

j

N = 6

38 of 54

Big-Theta

39 of 54

Formalizing Order of Growth

Given a function Q(N), we can apply our last two simplifications (ignore low orders terms and multiplicative constants) to yield the order of growth of Q(N).

  • Example: Q(N) = 3N3 + N2
  • Order of growth: N3

Let’s finish out this lecture by moving to a more formal notation called Big-Theta.

  • The math might seem daunting at first.
  • … but the idea is exactly the same! Using “Big-Theta” instead of “order of growth” does not change the way we analyze code at all.

40 of 54

Order of Growth Exercise

Consider the functions below.

  • Informally, what is the “shape” of each function for very large N?
  • In other words, what is the order of growth of each function?

function

order of growth

N3 + 3N4

1/N + N3

1/N + 5

NeN + N

40 sin(N) + 4N2

41 of 54

Order of Growth Exercise

Consider the functions below.

  • Informally, what is the “shape” of each function for very large N?
  • In other words, what is the order of growth of each function?

function

order of growth

N3 + 3N4

N4

1/N + N3

N3

1/N + 5

1

NeN + N

NeN

40 sin(N) + 4N2

N2

42 of 54

Big-Theta

Suppose we have a function R(N) with order of growth f(N).

  • In “Big-Theta” notation we write this as R(N) ∈ Θ(f(N)).
  • Examples:
    • N3 + 3N4 ∈ Θ(N4)
    • 1/N + N3 ∈ Θ(N3)
    • 1/N + 5 ∈ Θ(1)
    • NeN + N ∈ Θ(NeN)
    • 40 sin(N) + 4N2 ∈ Θ(N2)

function R(N)

order of growth

N3 + 3N4

N4

1/N + N3

N3

1/N + 5

1

NeN + N

NeN

40 sin(N) + 4N2

N2

43 of 54

Big-Theta: Formal Definition (Visualization)

means there exist positive constants k1 and k2 such that:

for all values of N greater than some N0.

Example: 40 sin(N) + 4N2 ∈ Θ(N2)

  • R(N) = 40 sin(N) + 4N2
  • f(N) = N2
  • k1 = 3
  • k2 = 5

i.e. very large N

44 of 54

Big-Theta Challenge (Visualization)

Suppose R(N) = (4N2+3N*ln(N))/2.

  • Find a simple f(N) and corresponding k1 and k2.

means there exist positive constants k1 and k2 such that:

for all values of N greater than some N0.

i.e. very large N

45 of 54

Big-Theta Challenge (Visualization)

Suppose R(N) = (4N2+3N*ln(N))/2.

  • f(N) = N2
  • k1 = 1
  • k2 = 3

means there exist positive constants k1 and k2 such that:

for all values of N greater than some N0.

i.e. very large N

46 of 54

Big-Theta and Runtime Analysis

Using Big-Theta doesn’t change anything about runtime analysis (no need to find k1 or k2 or anything like that).

  • The only difference is that we use the Θ symbol anywhere we would have said “order of growth”.

operation

worst case count

increment (+=1)

Θ(N2)

operation

worst case count

i = 0

1

j = i + 1

Θ(N)

less than (<)

Θ(N2)

increment (+=1)

Θ(N2)

equals (==)

Θ(N2)

array accesses

Θ(N2)

Worst case runtime: Θ(N2)

47 of 54

Big O Notation

48 of 54

Big Theta

We used Big Theta to describe the order of growth of a function.

We also used Big Theta to describe the rate of growth of the runtime of a piece of code.

function R(N)

order of growth

N3 + 3N4

Θ(N4)

1/N + N3

Θ(N3)

1/N + 5

Θ(1)

NeN + N

Θ(NeN)

40 sin(N) + 4N2

Θ(N2)

49 of 54

Big O

Whereas Big Theta can informally be thought of as something like “equals”, Big O can be thought of as “less than or equal”.

Example, the following are all true:

  • N3 + 3N4 ∈ Θ(N4)
  • N3 + 3N4 ∈ O(N4)
  • N3 + 3N4 ∈ O(N6)
  • N3 + 3N4 ∈ O(N!)
  • N3 + 3N4 ∈ O(NN!)

50 of 54

Big Theta: Formal Definition (Visualization)

means there exist positive constants k1 and k2 such that:

for all values of N greater than some N0.

Example: 40 sin(N) + 4N2 ∈ Θ(N2)

  • R(N) = 40 sin(N) + 4N2
  • f(N) = N2
  • k1 = 3
  • k2 = 5

i.e. very large N

51 of 54

Big O: Formal Definition (Visualization)

means there exists positive constants k2 such that:

for all values of N greater than some N0.

Example: 40 sin(N) + 4N2 ∈ O(N4)

  • R(N) = 40 sin(N) + 4N2
  • f(N) = N4
  • k2 = 1

i.e. very large N

52 of 54

Big Theta vs. Big O

We will see why big O is practically useful in the next lecture.

Informal meaning:

Family

Family Members

Big Theta

Θ(f(N))

Order of growth is f(N).

Θ(N2)

N2/2

2N2

N2 + 38N + N

Big O

O(f(N))

Order of growth is less than or equal to f(N).

O(N2)

N2/2

2N2

lg(N)

53 of 54

Summary

Given a code snippet, we can express its runtime as a function R(N), where N is some property of the input of the function (often the size of the input).

Rather than finding R(N) exactly, we instead usually only care about the order of growth of R(N).

One approach (not universal):

  • Choose a representative operation, and let C(N) be the count of how many times that operation occurs as a function of N.
  • Determine order of growth f(N) for C(N), i.e. C(N) ∈ Θ(f(N))
    • Often (but not always) we consider the worst case count.
  • If operation takes constant time, then R(N) ∈ Θ(f(N))
  • Can use O as an alternative for Θ. O is used for upper bounds.

54 of 54

Citations

TSP problem solution, title slide: http://support.sas.com/documentation/cdl/en/ornoaug/65289/HTML/default/viewer.htm#ornoaug_optnet_examples07.htm#ornoaug.optnet.map002g

Table of runtimes for various orders of growth: Kleinberg & Tardos, Algorithm Design.