1 of 38

CHAPTER 4

Searching & Algorithm Analysis

NOTE: The material in this presentation is a compilation from the Zybook of “Principles of Data Structure”, the book “Data Structures and Other Objects Using C++” by Michael Main and Walter Savitch, and original content from your professor.

By Prof. Rafael Orta, M.S.

2 of 38

https://quizizz.com/

3 of 38

4 of 38

Last week we covered


  • Heuristics
  • Greedy Algorithms
  • Dynamic Programming

5 of 38

Agenda for this week


  • Searching Algorithms
                • Linear Search
                • Selection Sort
                  • Binary Search
  • Constant time operations
  • Growth of functions and complexity
  • Big O notation
  • Algorithm Analysis
  • Recursive definitions
  • Analyzing the time complexity of recursive algorithms

6 of 38

Linear (Serial) Search

Linear search steps through part of an array one item at a time looking for a desired item. The search stops when the item is found or when the search has examined each item without success.

The technique is often used because it is easy to write and is applicable to many situations.

See whiteboard example

7 of 38

Linear Search (Pseudocode)
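The pseudocode shown on this slide is not reproduced here; the following is a minimal C++ sketch of the same idea (function and parameter names are illustrative, not from the slide):

```cpp
// Linear search: examine each element in order until the target is
// found or every item has been examined without success.
// Returns the index of the first match, or -1 if the target is absent.
int linearSearch(const int arr[], int size, int target) {
    for (int i = 0; i < size; ++i) {
        if (arr[i] == target) {
            return i;   // found: stop immediately
        }
    }
    return -1;          // examined each item without success
}
```

Note that the loop body does a constant amount of work, so the total work grows with the number of elements examined.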

8 of 38

Using linear search and if given a list of 10,000 elements, and if each comparison takes 2 µs, what is the longest possible runtime for linear search?

Linear search works perfectly if you are searching a small array. However, if the search is done over a large array, it is worthwhile to find a faster algorithm. Imagine if the only way to reach a contact on your phone were to scroll through the list one by one until you got to your friend Zuckerberg Mark. ☺

A/ The longest possible runtime happens if the element you are looking for is at the end of the list or is not in the list, thus 20,000 µs is the longest.

9 of 38

Given a list of 10,000 elements, and if each comparison takes 2 µs, what is the fastest possible runtime for linear search?

A/ The fastest possible runtime happens if the element you are looking for is at the beginning of the list, thus 2 µs is the fastest.

10 of 38

Let’s examine the Linear Search program in C++

11 of 38

Selection Sort

Selection sort is a simple sorting algorithm. It is an in-place, comparison-based algorithm in which the list is divided into two parts: the sorted part at the left end and the unsorted part at the right end. Initially, the sorted part is empty and the unsorted part is the entire list. The smallest element is selected from the unsorted part and swapped with the leftmost unsorted element, and that element becomes part of the sorted part. This process continues, moving the unsorted part's boundary one element to the right each time. This algorithm is not suitable for large data sets, as its average and worst-case complexities are O(n²), where n is the number of items.

12 of 38

Selection Sort (Pseudocode)
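The pseudocode on this slide is not reproduced here; a minimal C++ sketch of selection sort as described on the previous slide (names are illustrative):

```cpp
#include <utility>  // std::swap

// Selection sort: repeatedly select the smallest element of the
// unsorted right part and swap it to the boundary, growing the
// sorted left part by one. O(n^2) comparisons in every case.
void selectionSort(int arr[], int size) {
    for (int i = 0; i < size - 1; ++i) {
        int minIndex = i;                  // smallest element seen so far
        for (int j = i + 1; j < size; ++j) {
            if (arr[j] < arr[minIndex]) {
                minIndex = j;
            }
        }
        std::swap(arr[i], arr[minIndex]);  // move boundary one to the right
    }
}
```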

13 of 38

Let’s examine the selection sort program in C++

14 of 38

Binary Search

  • May be used only if the array is sorted.
  • A binary search examines the middle of the array first and compares that value with the target; it then keeps searching in the middle of the half that contains the value, and on each step it reduces the search range by half.

15 of 38

Binary Search

  • A contact list is searched for Bob. Assume the following contact list: Amy, Bob, Chris, Holly, Ray, Sarah, Zoe

1.- What is the first contact searched?

2.- What is the second contact searched?

16 of 38

Given a list of 1024 elements, what is the runtime for binary search if the search key is less than all elements in the list?

17 of 38

18 of 38

Let’s check the Binary Search program
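The program reviewed in class is not reproduced here; a minimal C++ sketch of binary search on a sorted array (names are illustrative):

```cpp
// Binary search: requires a sorted array. Compare the middle element
// to the target and discard the half that cannot contain it, halving
// the search range on every step. Returns the index, or -1.
int binarySearch(const int arr[], int size, int target) {
    int low = 0;
    int high = size - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow of (low + high)
        if (arr[mid] == target) {
            return mid;
        } else if (arr[mid] < target) {
            low = mid + 1;   // target can only be in the right half
        } else {
            high = mid - 1;  // target can only be in the left half
        }
    }
    return -1;               // range is empty: target not present
}
```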

19 of 38

20 of 38

Constant time operations

When the time required by a function does not depend on the size of the input, the procedure is said to run in constant time, which is written O(1).

“In practice, designing an efficient algorithm aims to lower the amount of time that an algorithm runs. However, a single algorithm can always execute more quickly on a faster processor. Therefore, the theoretical analysis of an algorithm describes runtime in terms of number of constant time operations, not nanoseconds. A constant time operation is an operation that, for a given processor, always operates in the same amount of time, regardless of input values.”
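To make the distinction concrete, here is a small illustrative contrast (these functions are examples, not from the slides):

```cpp
// Constant time: one array access, regardless of the input size. O(1).
int firstElement(const int arr[], int /*size*/) {
    return arr[0];
}

// Not constant time: the loop body runs once per element, so the
// amount of work grows linearly with the input size. O(N).
int sumElements(const int arr[], int size) {
    int sum = 0;
    for (int i = 0; i < size; ++i) {
        sum += arr[i];
    }
    return sum;
}
```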

21 of 38

True or False: a loop is never a constant time operation.

22 of 38

Growth of functions and complexity

Upper and lower bounds

An algorithm with runtime complexity T(N) has a lower bound and an upper bound.

  • Lower bound: A function f(N) that is ≤ the best case T(N), for all values of N ≥ 1.
  • Upper bound: A function f(N) that is ≥ the worst case T(N), for all values of N ≥ 1.

23 of 38

24 of 38

Growth of functions and complexity

Growth rates and asymptotic notations

An additional simplification can factor out the constant from a bounding function, leaving a function that categorizes the algorithm's growth rate. Ex: Instead of saying that an algorithm's runtime function has an upper bound of 30N2, the algorithm could be described as having a worst-case growth rate of N2. Asymptotic notation is the classification of runtime complexity that uses functions that indicate only the growth rate of a bounding function. Three asymptotic notations are commonly used in complexity analysis:

O notation provides a growth rate for an algorithm's upper bound.

Ω notation provides a growth rate for an algorithm's lower bound.

Θ notation provides a growth rate that is both an upper and a lower bound (a tight bound).

25 of 38

Growth of functions and complexity

26 of 38

27 of 38

28 of 38

Big O Notation

“Big O notation is a mathematical way of describing how a function (running time of an algorithm) generally behaves in relation to the input size. In Big O notation, all functions that have the same growth rate (as determined by the highest order term of the function) are characterized using the same Big O notation. In essence, all functions that have the same growth rate are considered equivalent in Big O notation.

Given a function that describes the running time of an algorithm, the Big O notation for that function can be determined using the following rules:

  1. If f(N) is a sum of several terms, the highest order term (the one with the fastest growth rate) is kept and others are discarded.
  2. If f(N) has a term that is a product of several factors, all constants (those that are not in terms of N) are omitted.”
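Applying both rules to a sample runtime function (the function here is an illustrative example):

```latex
f(N) = 30N^2 + 5N + 7
     \rightarrow 30N^2   % rule 1: keep the highest order term
     \rightarrow N^2     % rule 2: omit the constant factor
\therefore\ f(N) \text{ is } O(N^2)
```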

29 of 38

Big O Notation

Which of the following Big O notations is equivalent to O(N+9999)?

A: O(1) B: O(N) C: O(9999)

30 of 38

Big O Notation

Runtime growth rate

One consideration in evaluating algorithms is that the efficiency of the algorithm is most critical for large input sizes. Small inputs are likely to result in fast running times because N is small, so efficiency is less of a concern. The table below shows the runtime to perform f(N) instructions for different functions f and different values of N. For large N, the difference in computation time varies greatly with the rate of growth of the function f. The data assumes that a single instruction takes 1 μs to execute.

31 of 38

Algorithm Analysis

Worst-case algorithm analysis

“To analyze how runtime of an algorithm scales as the input size increases, we first determine how many operations the algorithm executes for a specific input size, N. Then, the big-O notation for that function is determined. Algorithm runtime analysis often focuses on the worst-case runtime complexity. The worst-case runtime of an algorithm is the runtime complexity for an input that results in the longest execution. Other runtime analyses include best-case runtime and average-case runtime. Determining the average-case runtime requires knowledge of the statistical properties of the expected data inputs. “

32 of 38

Algorithm Analysis

Counting constant time operations

“For algorithm analysis, the definition of a single operation does not need to be precise. An operation can be any statement (or constant number of statements) that has a constant runtime complexity, O(1). Since constants are omitted in big-O notation, any constant number of constant time operations is O(1). So, precisely counting the number of constant time operations in a finite sequence is not needed.”

Ex: An algorithm with a single loop that executes 5 operations before the loop, 3 operations each loop iteration, and 6 operations after the loop would have a runtime of f(N) = 5 + 3N + 6, which can be written as O(1) + O(N) + O(1) = O(N). If the number of operations before the loop were 100, the big-O notation for those operations would still be O(1).
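The example above can be mirrored in code; this sketch simply tallies the operation counts from the f(N) = 5 + 3N + 6 breakdown (the function is illustrative, not from the slides):

```cpp
// Counting constant time operations for a single loop:
// a constant amount of work before the loop, a constant amount per
// iteration, and a constant amount after the loop.
int countOperations(int n) {
    int ops = 0;
    ops += 5;                    // operations before the loop: O(1)
    for (int i = 0; i < n; ++i) {
        ops += 3;                // operations per iteration: O(N) in total
    }
    ops += 6;                    // operations after the loop: O(1)
    return ops;                  // f(N) = 3N + 11, which is O(N)
}
```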

33 of 38

34 of 38

35 of 38

Recursive Definitions

Recursion is an elegant method of solving many computer problems that relies on the capability of a function to continue calling itself until it satisfies a particular condition.

Recursive algorithm is an algorithm that breaks the problem into smaller subproblems and applies the algorithm itself to solve the smaller subproblems.

Recursive function is a function that calls itself. Recursive functions are commonly used to implement recursive algorithms.
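A classic example of a recursive function, shown here as a minimal C++ sketch (not taken from the slides):

```cpp
// Factorial as a recursive function: the function calls itself on a
// smaller subproblem (n - 1) until it satisfies the base case n <= 1.
long long factorial(int n) {
    if (n <= 1) {
        return 1;                     // base case: stops the recursion
    }
    return n * factorial(n - 1);      // recursive case: smaller subproblem
}
```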

36 of 38

37 of 38

Analyzing the time complexity of recursive algorithms

Recurrence relations

“Using O-notation to express runtime complexity of a recursive function requires solving the recurrence relation. For simpler recursive functions such as binary search, runtime complexity can be determined by expressing the number of function calls as a function of N.”
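As a sketch of the binary search case mentioned above: each call does a constant amount of work and makes at most one recursive call on half the range, giving the recurrence T(N) = T(N/2) + O(1), which solves to O(log N). (The function below is illustrative, not from the slides.)

```cpp
// Recursive binary search on a sorted array. Each call halves the
// range [low, high], so the recurrence is T(N) = T(N/2) + O(1).
int binarySearchRec(const int arr[], int low, int high, int target) {
    if (low > high) {
        return -1;                       // base case: empty range
    }
    int mid = low + (high - low) / 2;
    if (arr[mid] == target) {
        return mid;
    } else if (arr[mid] < target) {
        return binarySearchRec(arr, mid + 1, high, target);  // right half
    } else {
        return binarySearchRec(arr, low, mid - 1, target);   // left half
    }
}
```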

38 of 38