CS61B: 2019
Lecture 13: Introduction to Asymptotic Analysis
61B: Writing Efficient Programs
An engineer will do for a dime what any fool will do for a dollar.
Efficiency comes in two flavors:
- Programming cost: how long it takes to develop, read, and maintain your programs.
- Execution cost: how much time and memory your program consumes (our focus today).
Example of Algorithm Cost
Objective: Determine if a sorted array contains any duplicates.
-3 | -1 | 2 | 4 | 4 | 8 | 10 | 12 |
Silly algorithm: Consider every possible pair, returning true if any match.
Better algorithm?
Today’s goal: Introduce a formal technique for comparing algorithmic efficiency.
Intuitive Runtime Characterizations
How Do I Runtime Characterization?
Our goal is to somehow characterize the runtimes of the functions below.
dup1:
public static boolean dup1(int[] A) {
    for (int i = 0; i < A.length; i += 1) {
        for (int j = i + 1; j < A.length; j += 1) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
dup2:
public static boolean dup2(int[] A) {
    for (int i = 0; i < A.length - 1; i += 1) {
        if (A[i] == A[i + 1]) {
            return true;
        }
    }
    return false;
}
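A quick sanity check (ours, not from the slides): a minimal main that runs both methods on the example array from earlier. Both calls should print true, since 4 appears twice, in adjacent slots, so dup2 finds it too.
public static void main(String[] args) {
    int[] A = {-3, -1, 2, 4, 4, 8, 10, 12};   // sorted, contains duplicate 4s
    System.out.println(dup1(A));   // true
    System.out.println(dup2(A));   // true
}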
Techniques for Measuring Computational Cost
Technique 1: Measure execution time in seconds using a client program.
public static void main(String[] args) {
    int N = Integer.parseInt(args[0]);
    int[] A = makeArray(N);
    dup1(A);
}
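The slide's client omits makeArray and the timing itself. Below is one possible fleshed-out sketch; makeArray here is our assumption (a sorted, duplicate-free array, which is the worst case for both methods), and we time with System.nanoTime().
public static int[] makeArray(int N) {
    int[] A = new int[N];
    for (int i = 0; i < N; i += 1) {
        A[i] = i;   // strictly increasing: sorted with no duplicates
    }
    return A;
}

public static void main(String[] args) {
    int N = Integer.parseInt(args[0]);
    int[] A = makeArray(N);
    long start = System.nanoTime();
    boolean result = dup1(A);   // swap in dup2(A) to compare
    double seconds = (System.nanoTime() - start) / 1e9;
    System.out.println(result + ": " + seconds + " seconds");
}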
Time Measurements for dup1 and dup2
N | dup1 time (s) | dup2 time (s) |
10000 | 0.08 | 0.08 |
50000 | 0.32 | 0.08 |
100000 | 1.00 | 0.08 |
200000 | 8.26 | 0.1 |
400000 | 15.4 | 0.1 |
Techniques for Measuring Computational Cost
Technique 2A: Count possible operations for an array of size N = 10,000.
operation | count, N=10000 |
i = 0 | 1 |
j = i + 1 | 1 to 10000 |
less than (<) | 2 to 50,015,001 |
increment (+=1) | 0 to 50,005,000 |
equals (==) | 1 to 49,995,000 |
array accesses | 2 to 99,990,000 |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
The counts are tricky to compute. Work not shown.
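The "work not shown" can also be checked empirically. Here is a sketch (ours, not from the slides) that instruments dup1 with counters for two of the operations; on a duplicate-free array of size 10,000 it prints the worst-case values from the table, 49,995,000 and 99,990,000.
public class CountOps {
    static long equalsCount = 0;       // number of == comparisons
    static long arrayAccessCount = 0;  // A[i] and A[j] reads

    static boolean dup1Instrumented(int[] A) {
        for (int i = 0; i < A.length; i += 1) {
            for (int j = i + 1; j < A.length; j += 1) {
                equalsCount += 1;
                arrayAccessCount += 2;
                if (A[i] == A[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int N = 10000;
        int[] A = new int[N];
        for (int i = 0; i < N; i += 1) {
            A[i] = i;   // no duplicates: forces the worst case
        }
        dup1Instrumented(A);
        System.out.println("equals: " + equalsCount);               // 49995000
        System.out.println("array accesses: " + arrayAccessCount);  // 99990000
    }
}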
Techniques for Measuring Computational Cost
Technique 2B: Count possible operations in terms of input array size N.
operation | symbolic count | count, N=10000 |
i = 0 | 1 | 1 |
j = i + 1 | 1 to N | 1 to 10000 |
less than (<) | 2 to (N²+3N+2)/2 | 2 to 50,015,001 |
increment (+=1) | 0 to (N²+N)/2 | 0 to 50,005,000 |
equals (==) | 1 to (N²-N)/2 | 1 to 49,995,000 |
array accesses | 2 to N²-N | 2 to 99,990,000 |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
Techniques for Measuring Computational Cost [dup2]
Your turn: Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations.
operation | sym. count | count, N=10000 |
i = 0 | 1 | 1 |
less than (<) | | |
increment (+=1) | | |
equals (==) | | |
array accesses | | |
for (int i = 0; i < A.length - 1; i += 1) {
    if (A[i] == A[i + 1]) {
        return true;
    }
}
return false;
Techniques for Measuring Computational Cost [dup2]
Your turn: Try to come up with rough estimates for the symbolic and exact counts for at least one of the operations.
operation | symbolic count | count, N=10000 |
i = 0 | 1 | 1 |
less than (<) | 0 to N | 0 to 10000 |
increment (+=1) | 0 to N - 1 | 0 to 9999 |
equals (==) | 1 to N - 1 | 1 to 9999 |
array accesses | 2 to 2N - 2 | 2 to 19998 |
for (int i = 0; i < A.length - 1; i += 1) {
    if (A[i] == A[i + 1]) {
        return true;
    }
}
return false;
Especially observant folks may notice we didn’t count everything, e.g. “- 1” and “+ 1” operations. We’ll see why this omission is not a problem very shortly.
If you did this exercise but were off by one, that’s fine. The exact numbers aren’t that important.
Comparing Algorithms
Which algorithm is better? dup2. Why? In the worst case, dup2's counts grow linearly in N, while dup1's grow quadratically.
dup2:
operation | symbolic count | count, N=10000 |
i = 0 | 1 | 1 |
less than (<) | 0 to N | 0 to 10000 |
increment (+=1) | 0 to N - 1 | 0 to 9999 |
equals (==) | 1 to N - 1 | 1 to 9999 |
array accesses | 2 to 2N - 2 | 2 to 19998 |
dup1:
operation | symbolic count | count, N=10000 |
i = 0 | 1 | 1 |
j = i + 1 | 1 to N | 1 to 10000 |
less than (<) | 2 to (N²+3N+2)/2 | 2 to 50,015,001 |
increment (+=1) | 0 to (N²+N)/2 | 0 to 50,005,000 |
equals (==) | 1 to (N²-N)/2 | 1 to 49,995,000 |
array accesses | 2 to N²-N | 2 to 99,990,000 |
Asymptotic Behavior
In most cases, we care only about asymptotic behavior, i.e. what happens for very large N.
Algorithms that scale well (i.e. whose runtimes look like lines) have better asymptotic runtime behavior than algorithms that scale relatively poorly (i.e. whose runtimes look like parabolas).
Parabolas vs. Lines
Suppose we have two algorithms that zerpify a collection of N items.
For small N, zerp1 might be faster, but as the dataset grows, the parabolic algorithm falls farther and farther behind in time to complete.
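To make this concrete with made-up constants (not from the slide): suppose zerp1 needs 2N² operations and zerp2 needs 500N. Then zerp1 wins while 2N² < 500N, i.e. for N < 250; but at N = 10,000, zerp1 needs 200,000,000 operations to zerp2's 5,000,000, a factor of 40 slower, and the gap only widens.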
Scaling Across Many Domains
We’ll informally refer to the “shape” of a runtime function as its order of growth (we’ll formalize this soon).
(from Kleinberg & Tardos, Algorithm Design)
Duplicate Finding
Our goal is to somehow characterize the runtimes of the functions below.
Characterization should be simple and mathematically rigorous.
Characterization should demonstrate superiority of dup2 over dup1.
dup2:
operation | symbolic count |
i = 0 | 1 |
less than (<) | 0 to N |
increment (+=1) | 0 to N - 1 |
equals (==) | 1 to N - 1 |
array accesses | 2 to 2N - 2 |
dup1:
operation | symbolic count |
i = 0 | 1 |
j = i + 1 | 1 to N |
less than (<) | 2 to (N²+3N+2)/2 |
increment (+=1) | 0 to (N²+N)/2 |
equals (==) | 1 to (N²-N)/2 |
array accesses | 2 to N²-N |
dup1's counts grow like a parabola, a.k.a. quadratically; dup2's grow linearly.
Our characterization will combine two ideas: worst case analysis and order of growth.
Duplicate Finding
Our goal is to somehow characterize the runtimes of dup1 and dup2.
Let’s be more careful about what we mean when we say dup1's counts are “like” a parabola, and dup2's are “like” a line.
Intuitive Simplification 1: Consider Only the Worst Case
Simplification 1: Consider only the worst case.
operation | count |
i = 0 | 1 |
j = i + 1 | 1 to N |
less than (<) | 2 to (N²+3N+2)/2 |
increment (+=1) | 0 to (N²+N)/2 |
equals (==) | 1 to (N²-N)/2 |
array accesses | 2 to N²-N |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
Intuitive Simplification 1: Consider Only the Worst Case
Simplification 1: Consider only the worst case.
operation | worst case count |
i = 0 | 1 |
j = i + 1 | N |
less than (<) | (N²+3N+2)/2 |
increment (+=1) | (N²+N)/2 |
equals (==) | (N²-N)/2 |
array accesses | N²-N |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
We’re effectively focusing on the case where there are no duplicates, because this is where there is a performance difference.
Intuitive Order of Growth Identification
Consider the algorithm below (summarized by its operation counts). What do you expect will be the order of growth of the runtime for the algorithm?
In other words, if we plotted total runtime vs. N, what shape would we expect?
operation | count |
less than (<) | 100N² + 3N |
greater than (>) | 2N³ + 1 |
and (&&) | 5,000 |
Argument: for very large N, the 2N³ term dwarfs every other count, so the total work (and hence the runtime) grows like N³, i.e. cubically.
Extremely important point. Make sure you understand it!
Intuitive Simplification 2: Restrict Attention to One Operation
Simplification 2: Pick some representative operation to act as a proxy for the overall runtime.
We call our choice the “cost model”.
cost model = increment
operation | worst case count |
i = 0 | 1 |
j = i + 1 | N |
less than (<) | (N²+3N+2)/2 |
increment (+=1) | (N²+N)/2 |
equals (==) | (N²-N)/2 |
array accesses | N²-N |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
There are other good choices.
Intuitive Simplification 3: Eliminate Lower Order Terms
Simplification 3: Ignore lower order terms.
operation | worst case |
increment (+=1) | (N²+N)/2 → N²/2 |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
Intuitive Simplification 4: Eliminate Multiplicative Constants
Simplification 4: Ignore multiplicative constants.
operation | worst case |
increment (+=1) | N²/2 → N² |
for (int i = 0; i < A.length; i += 1) {
    for (int j = i + 1; j < A.length; j += 1) {
        if (A[i] == A[j]) {
            return true;
        }
    }
}
return false;
Simplification Summary
Simplifications:
1. Consider only the worst case.
2. Pick a representative operation (the cost model).
3. Ignore lower order terms.
4. Ignore multiplicative constants.
Applying these to the dup1 counts:
operation | count |
i = 0 | 1 |
j = i + 1 | 1 to N |
less than (<) | 2 to (N²+3N+2)/2 |
increment (+=1) | 0 to (N²+N)/2 |
equals (==) | 1 to (N²-N)/2 |
array accesses | 2 to N²-N |
operation | worst case o.o.g. |
increment (+=1) | N² |
Simplifications 2, 3, and 4 are OK because we only care about the “order of growth” of the runtime.
Worst case order of growth of runtime: N²
Simplification Summary [dup2]
Your turn: apply the same four simplifications to dup2.
operation | count |
i = 0 | 1 |
less than (<) | 0 to N |
increment (+=1) | 0 to N - 1 |
equals (==) | 1 to N - 1 |
array accesses | 2 to 2N - 2 |
operation | worst case o.o.g. |
? | ? |
Worst case order of growth of runtime: ?
Repeating the Process for dup2
operation | count |
i = 0 | 1 |
less than (<) | 0 to N |
increment (+=1) | 0 to N - 1 |
equals (==) | 1 to N - 1 |
array accesses | 2 to 2N - 2 |
Choosing array accesses as our cost model (any of the bottom four operations is a good choice):
operation | worst case o.o.g. |
array accesses | N |
Worst case order of growth of runtime: N
Considering only the worst case is OK because we specifically only care about the worst case; the other three simplifications are OK because we only care about the order of growth of the runtime.
Summary of Our (Painful) Analysis Process
Our process:
1. Construct a table of exact counts for all possible operations.
2. Convert the table into a worst case order of growth using our four simplifications.
By using our simplifications from the outset, we can avoid building the table at all!
operation | count |
i = 0 | 1 |
j = i + 1 | 1 to N |
less than (<) | 2 to (N²+3N+2)/2 |
increment (+=1) | 0 to (N²+N)/2 |
equals (==) | 1 to (N²-N)/2 |
array accesses | 2 to N²-N |
operation | worst case o.o.g. |
increment (+=1) | N² |
Worst case order of growth of runtime: N²
Simplified Analysis
Simplified Analysis Process
Rather than building the entire table, we can instead:
1. Choose a representative operation to count (our cost model).
2. Figure out the order of growth for the count of the cost model, either by computing an exact count and discarding the unnecessary pieces, or by inspection (a skill that comes with practice).
Let’s redo our analysis of dup1 with this new process.
Analysis of Nested For Loops (Based on Exact Count)
Find the order of growth of the worst case runtime of dup1.
Worst case number of == operations:
Overall worst case runtime: Θ(N²)
[Figure: a 6-by-6 grid (N = 6), rows indexed by i and columns by j, with an == mark for each comparison performed; the marks fill the upper triangle j > i, totaling 5 + 4 + 3 + 2 + 1 = 15 comparisons.]
int N = A.length;
for (int i = 0; i < N; i += 1)
    for (int j = i + 1; j < N; j += 1)
        if (A[i] == A[j])
            return true;
return false;
C = 1 + 2 + 3 + … + (N - 3) + (N - 2) + (N - 1)
C = (N - 1) + (N - 2) + (N - 3) + … + 3 + 2 + 1
2C = N + N + … + N    (N - 1 copies of N)
2C = N(N - 1)
∴ C = N(N - 1)/2
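A quick empirical check of the closed form (our own sketch, not from the slides): count the inner-loop iterations, each of which performs one == in the worst case, and compare against N(N - 1)/2.
public static long countComparisons(int N) {
    long C = 0;
    for (int i = 0; i < N; i += 1) {
        for (int j = i + 1; j < N; j += 1) {
            C += 1;   // one == comparison happens here in the worst case
        }
    }
    return C;
}
For example, countComparisons(6) returns 15 = 6·5/2, matching the grid above.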
Analysis of Nested For Loops (Based on Exact Count)
Find the order of growth of the worst case runtime of dup1.
int N = A.length;
for (int i = 0; i < N; i += 1)
    for (int j = i + 1; j < N; j += 1)
        if (A[i] == A[j])
            return true;
return false;
Worst case number of == operations:
C = 1 + 2 + 3 + … + (N - 3) + (N - 2) + (N - 1) = N(N - 1)/2
operation | worst case o.o.g. |
== | N² |
Worst case order of growth of runtime: N²
Overall worst case runtime: Θ(N²)
Analysis of Nested For Loops (Simpler Geometric Argument)
Find the order of growth of the worst case runtime of dup1.
int N = A.length;
for (int i = 0; i < N; i += 1)
    for (int j = i + 1; j < N; j += 1)
        if (A[i] == A[j])
            return true;
return false;
Worst case number of == operations: in the grid shown earlier, the == marks fill about half of an N-by-N square, so the count is roughly N²/2. No exact sum needed.
operation | worst case o.o.g. |
== | N² |
Worst case order of growth of runtime: N²
Overall worst case runtime: Θ(N²)
Big-Theta
Formalizing Order of Growth
Given a function Q(N), we can apply our last two simplifications (ignore lower order terms and multiplicative constants) to yield the order of growth of Q(N). For example, the order of growth of Q(N) = 3N³ + N² is N³.
Let’s finish out this lecture by moving to a more formal notation called Big-Theta.
Order of Growth Exercise
Consider the functions below.
function | order of growth |
N³ + 3N⁴ | |
1/N + N³ | |
1/N + 5 | |
Ne^N + N | |
40 sin(N) + 4N² | |
Order of Growth Exercise
Consider the functions below.
function | order of growth |
N³ + 3N⁴ | N⁴ |
1/N + N³ | N³ |
1/N + 5 | 1 |
Ne^N + N | Ne^N |
40 sin(N) + 4N² | N² |
Big-Theta
Suppose we have a function R(N) with order of growth f(N). In Big-Theta notation, we write this as R(N) ∈ Θ(f(N)).
function R(N) | order of growth f(N) |
N³ + 3N⁴ | N⁴ |
1/N + N³ | N³ |
1/N + 5 | 1 |
Ne^N + N | Ne^N |
40 sin(N) + 4N² | N² |
Big-Theta: Formal Definition
R(N) ∈ Θ(f(N)) means there exist positive constants k₁ and k₂ such that:
k₁ · f(N) ≤ R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).
Example: 40 sin(N) + 4N² ∈ Θ(N²)
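A concrete choice of constants for this example (ours; any valid witnesses would do): since -40 ≤ 40 sin(N) ≤ 40, and N² ≥ 49 > 40 once N ≥ 7, we get
3N² ≤ 4N² - 40 ≤ 40 sin(N) + 4N² ≤ 4N² + 40 ≤ 5N² for all N ≥ 7,
so k₁ = 3, k₂ = 5, and N₀ = 7 witness the claim.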
Big-Theta Challenge
Suppose R(N) = (4N² + 3N ln(N))/2. Find a simple f(N) such that R(N) ∈ Θ(f(N)), along with constants k₁ and k₂ that witness it.
Answer: R(N) ∈ Θ(N²). Writing R(N) = 2N² + (3/2)N ln(N): the lower bound k₁ = 2 works for all N ≥ 1 since (3/2)N ln(N) ≥ 0, and the upper bound k₂ = 4 works since ln(N) ≤ N gives (3/2)N ln(N) ≤ (3/2)N² ≤ 2N².
Big-Theta and Runtime Analysis
Using Big-Theta doesn’t change anything about runtime analysis (no need to find k1 or k2 or anything like that).
operation | worst case count |
increment (+=1) | Θ(N²) |
operation | worst case count |
i = 0 | 1 |
j = i + 1 | Θ(N) |
less than (<) | Θ(N²) |
increment (+=1) | Θ(N²) |
equals (==) | Θ(N²) |
array accesses | Θ(N²) |
Worst case runtime: Θ(N²)
Big O Notation
Big Theta
We used Big Theta to describe the order of growth of a function.
We also used Big Theta to describe the rate of growth of the runtime of a piece of code.
function R(N) | order of growth |
N³ + 3N⁴ | Θ(N⁴) |
1/N + N³ | Θ(N³) |
1/N + 5 | Θ(1) |
Ne^N + N | Θ(Ne^N) |
40 sin(N) + 4N² | Θ(N²) |
Big O
Whereas Big Theta can informally be thought of as something like “equals”, Big O can be thought of as “less than or equal”.
For example, the following are all true:
N³ + 3N⁴ ∈ Θ(N⁴)
N³ + 3N⁴ ∈ O(N⁴)
N³ + 3N⁴ ∈ O(N⁶)
N³ + 3N⁴ ∈ O(N!)
Big O: Formal Definition
R(N) ∈ O(f(N)) means there exists a positive constant k₂ such that:
R(N) ≤ k₂ · f(N)
for all values of N greater than some N₀ (i.e. very large N).
Example: 40 sin(N) + 4N² ∈ O(N⁴)
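A concrete witness (ours): reusing the bound from the Big-Theta example, 40 sin(N) + 4N² ≤ 5N² ≤ 5N⁴ for all N ≥ 7, so k₂ = 5 and N₀ = 7 suffice. Note that Θ(N⁴) would be false here: no positive k₁ keeps k₁N⁴ below roughly 4N² for large N, so Big Theta's lower bound fails.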
Big Theta vs. Big O
We will see why big O is practically useful in the next lecture.
notation | informal meaning | family | family members |
Big Theta, Θ(f(N)) | Order of growth is f(N). | Θ(N²) | N²/2, 2N², N² + 38N + N |
Big O, O(f(N)) | Order of growth is less than or equal to f(N). | O(N²) | N²/2, 2N², lg(N) |
Summary
Given a code snippet, we can express its runtime as a function R(N), where N is some property of the input of the function (often the size of the input).
Rather than finding R(N) exactly, we usually care only about the order of growth of R(N).
One approach (not universal):
1. Choose a representative operation (the cost model) and let C(N) be its count.
2. Determine the order of growth of C(N), either by computing an exact count and then discarding lower order terms and multiplicative constants, or by inspection.
Citations
TSP problem solution, title slide: http://support.sas.com/documentation/cdl/en/ornoaug/65289/HTML/default/viewer.htm#ornoaug_optnet_examples07.htm#ornoaug.optnet.map002g
Table of runtimes for various orders of growth: Kleinberg & Tardos, Algorithm Design.