Introduction to programming in Python
Lecture 11:
Analysis of algorithms, complexity, recursion, computability
Chalmers (DAT425/DAT445/DAT455/DAT505)
Krasimir Angelov and Aarne Ranta
Agenda
Search algorithms: linear and binary search
Complexity
Sorting
Recursive functions
Recursive data types
Computability and the halting problem
Search algorithms: linear and binary search
The search problem
Find an element in a collection and return a value, for example its position.
Python dictionaries and the index() method for lists can be used for that.
But how do they work internally? What would we do if we didn't have them?
Linear search
def linsearch(x,ns):
    counter = 0
    for n in ns:
        if n == x:
            return counter
        counter = counter + 1
    return False
A counter for the number of elements which we have already checked.
When we find a match we return the counter.
Otherwise we increment the counter.
If we have gone through the list without finding a match, we return False as an indication.
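A quick self-contained check of the function above (the example values are ours):

```python
def linsearch(x, ns):
    counter = 0
    for n in ns:
        if n == x:
            return counter
        counter = counter + 1
    return False

print(linsearch(5, [2, 3, 5, 7]))   # → 2
print(linsearch(4, [2, 3, 5, 7]))   # → False
```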
Binary search
def binsearch(x,ns):
    low = 0
    high = len(ns)-1
    while low <= high:
        mid = (low + high)//2
        y = ns[mid]
        if x == y:
            return mid
        elif x < y:
            high = mid - 1
        else:
            low = mid + 1
    return False
Define the lowest and the highest limit for the part of the list where we search. The limits approach each other.
If the element is not in the list, return False as an indication.
Binary search is more complicated but a lot faster. It requires a sorted list.
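A quick self-contained check on a small sorted list (the example values are ours):

```python
def binsearch(x, ns):
    low = 0
    high = len(ns) - 1
    while low <= high:
        mid = (low + high) // 2
        y = ns[mid]
        if x == y:
            return mid
        elif x < y:
            high = mid - 1
        else:
            low = mid + 1
    return False

print(binsearch(11, [2, 3, 5, 7, 11, 13]))  # → 4
print(binsearch(6, [2, 3, 5, 7, 11, 13]))   # → False
```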
Binary Search step-by-step
[Slide animation: a list with markers low and high at the ends of the remaining search range and mid = (low + high)//2 between them. We compare x with y = ns[mid]: on a match we return mid; if x < y we set high = mid - 1 and in one step skip everything to the right of mid; otherwise we set low = mid + 1 and skip everything to the left. The markers approach each other, halving the range in every step.]
def binsearch(x,ns):
    low = 0
    high = len(ns)-1
    while low <= high:
        mid = (low + high)//2
        ....
    return False
If we don’t find a match then eventually it will happen that high < low, and this will terminate the loop.
Let’s test the algorithms
>>> ws = fileWords('bible.txt')
>>> len(ws)
13106
>>> linsearch('mary',ws)
2331
>>> binsearch('mary',ws)
False
>>> sws = sorted(ws)
>>> binsearch('mary',sws)
7418
Read the words from the King James Bible.
There are that many unique words.
'mary' appears at that position.
Binary search fails if the list is not sorted.
Now we sort.
Now we get a position in the alphabetical order.
Simple measuring of processor time
import time
start = time.process_time()
code
end = time.process_time()
print(end-start)
Use the time library
Take the processor time before running the code
Run the code
Take the processor time after
Compute the difference
Benchmarking
def benchmark(f,x,ns):
    start = time.process_time()
    n = 1000
    for i in range(n):
        f(x,ns)
    end = time.process_time()
    print((end-start)/n)
We run the function f with arguments x and ns.
We run it 1000 times to get better precision. Running it only once might be too fast for accurate time measurement.
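A self-contained version of the benchmark (here returning the average time instead of printing it, and timing a hypothetical membership test as the workload):

```python
import time

def benchmark(f, x, ns):
    start = time.process_time()
    n = 1000
    for i in range(n):
        f(x, ns)
    end = time.process_time()
    return (end - start) / n  # average seconds per call

# Hypothetical workload: linear membership test in a list of 1000 numbers.
avg = benchmark(lambda x, ns: x in ns, 999, list(range(1000)))
print(avg >= 0.0)  # the average time is a small non-negative number
```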
Let’s compare the times now.
>>> ws = fileWords('bible.txt')
>>> sws=sorted(ws)
>>> benchmark(binsearch,'mary',sws)
1.8267639999999251e-06
>>> benchmark(linsearch,'mary',sws)
0.00041089155600000003
Read the words from King James bible.�Sort them
Benchmark the binary search.
Benchmark the linear search.
A graphical comparison of the algorithms
[Figure: linear search in red, binary search in blue; x-axis: millions of elements; y-axis: processing time in seconds.]
We see that the time for linear search grows linearly with the number of elements.
But why is binary search so much better?
Logarithmic complexity
Suppose that we search for x among 1000 elements in ns.
Step 1: if x == ns[500], we are done. Otherwise we continue in one of the halves, say the right-hand half.
Step 2: if x == ns[750], done. Otherwise …
Step 3: if x == ns[875], done. Otherwise …
Step 4: if x == ns[938], done. Otherwise …
…
Step 10: if x == ns[999], done.
In the worst case we need 10 ≈ log₂(1000) steps. We say that binary search has logarithmic complexity.
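The halving argument can be checked directly with the math module:

```python
import math

# Each step halves the search range, so 1000 elements need at most
# ceil(log2(1000)) comparisons in the worst case:
print(math.ceil(math.log2(1000)))  # → 10
```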
Take away message
"Everything should be made as simple as possible, but no simpler." (Albert Einstein)
Complexity
The Big O notation for complexity
constant: O(1)
logarithmic: O(log n)
linear: O(n)
log-linear: O(n log n)
quadratic: O(n²)
cubic: O(n³)
exponential: O(2ⁿ)
factorial: O(n!)
Examples of algorithms of different complexities
constant, O(1): compute an arithmetic formula
logarithmic, O(log n): binary search
linear, O(n): linear search
log-linear, O(n log n): sorting (with the best algorithms)
quadratic, O(n²): sorting (with some simpler algorithms)
cubic, O(n³): matrix multiplication (the naive method)
exponential, O(2ⁿ): iterate over all subsets of a set
factorial, O(n!): permutations; determinants
Some more concepts
Time complexity: the number of steps or the time as a function of data size
Memory complexity: the memory size as a function of data size
Asymptotic complexity: the size order, i.e. the O-class
Worst case complexity: the maximum complexity
Average complexity: the average for all possible input-data
In theoretical computer science we usually consider the worst case complexity, while in practice the average complexity might be more important.
Sorting
How should we think?
This is something that we often do, for example sorting books on a bookshelf.
A naive method: selection sort.
Complexity: (n-1) + (n-2) + … + 1 = (n*(n-1))/2 = (n² - n)/2, therefore quadratic.
Selection Sort
Zelle, p. 478
Selection Sort step-by-step
[Figure: a list with a marker bottom. Everything to the left of bottom is sorted.]
bottom marks the first element from where the unsorted elements start.
Selection Sort step-by-step
[Figure: the unsorted part starts with x at position bottom; mp points at y, the smallest element found so far. Everything to the left of bottom is sorted.]
....
mp = bottom
for i in range(bottom+1,n):
    if nums[i] < nums[mp]:
        mp = i
....
With this loop we search for the position of the smallest element in the range bottom..n-1.
At the end of the loop mp points to the smallest element.
Selection Sort step-by-step
[Figure: after the swap, y is at position bottom and x at position mp. Everything to the left of bottom is sorted.]
nums[bottom],nums[mp] = nums[mp],nums[bottom]
At the end we swap the elements at positions bottom and mp.
On the next iteration of the outer loop bottom is incremented. Everything before bottom is again guaranteed to be sorted.
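The fragments above can be assembled into a complete selection sort. This is a sketch based on the slides (the name selSort and the test list are ours; Zelle, p. 478 has the full version):

```python
def selSort(nums):
    n = len(nums)
    for bottom in range(n - 1):
        # find the position mp of the smallest element in nums[bottom:]
        mp = bottom
        for i in range(bottom + 1, n):
            if nums[i] < nums[mp]:
                mp = i
        # swap it into place; nums[:bottom+1] is now sorted
        nums[bottom], nums[mp] = nums[mp], nums[bottom]

xs = [5, 2, 9, 1, 5]
selSort(xs)
print(xs)  # → [1, 2, 5, 5, 9]
```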
Complexity of Selection Sort
Two nested loops, each doing at most n iterations.
The complexity is n*n, i.e. O(n²).
A better sorting algorithm
Merge Sort (1)
def mergeSort(nums):
    n = len(nums)
    if n > 1:
        m = n // 2
        nums1, nums2 = nums[:m], nums[m:]
        mergeSort(nums1)
        mergeSort(nums2)
        merge(nums1, nums2, nums)
Do nothing if the list has fewer than two elements.
Divide the list into two sublists of (nearly) equal size.
Sort each sublist separately.
Merge the results.
Merge Sort (2)
def merge(lst1, lst2, lst3):
    i1,i2,i3 = 0,0,0
    n1,n2 = len(lst1),len(lst2)

    while i1 < n1 and i2 < n2:
        if lst1[i1] < lst2[i2]:
            lst3[i3] = lst1[i1]
            i1 = i1 + 1
        else:
            lst3[i3] = lst2[i2]
            i2 = i2 + 1
        i3 = i3 + 1

    while i1 < n1:
        lst3[i3] = lst1[i1]
        i1 = i1 + 1
        i3 = i3 + 1

    while i2 < n2:
        lst3[i3] = lst2[i2]
        i2 = i2 + 1
        i3 = i3 + 1
First loop: always pick the smaller of the two current elements and put it into lst3.
Second loop: if elements remain in the first list, copy them over to lst3.
Third loop: if elements remain in the second list, copy them over to lst3.
Merging step-by-step
[Slide animation: lst1 and lst2 with markers i1 and i2 at the first elements not yet copied, and lst3 with marker i3 at the next free position; everything before the markers has already been processed. If the current element x of lst1 is smaller than the current element y of lst2, x is copied to lst3 and i1 advances; when the next element z of lst1 is larger than y, y is copied instead and i2 advances. After every copy i3 advances.]
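A self-contained check of merge on two small sorted lists (the example lists are ours):

```python
def merge(lst1, lst2, lst3):
    i1, i2, i3 = 0, 0, 0
    n1, n2 = len(lst1), len(lst2)
    # pick the smaller current element while both lists have elements left
    while i1 < n1 and i2 < n2:
        if lst1[i1] < lst2[i2]:
            lst3[i3] = lst1[i1]
            i1 = i1 + 1
        else:
            lst3[i3] = lst2[i2]
            i2 = i2 + 1
        i3 = i3 + 1
    # copy whatever remains of lst1
    while i1 < n1:
        lst3[i3] = lst1[i1]
        i1 = i1 + 1
        i3 = i3 + 1
    # copy whatever remains of lst2
    while i2 < n2:
        lst3[i3] = lst2[i2]
        i2 = i2 + 1
        i3 = i3 + 1

out = [0] * 5
merge([1, 3, 5], [2, 4], out)
print(out)  # → [1, 2, 3, 4, 5]
```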
Complexity of Merge Sort
O(n log n) < O(n²): the list is halved log₂(n) times, and each level of merging does O(n) work, so merge sort runs in O(n log n) time.
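To see how big the difference is, compare the two growth rates for a list of a million elements (a rough step count, not measured time):

```python
import math

n = 1_000_000
log_linear = n * math.log2(n)  # merge sort: about 2 * 10**7 "steps"
quadratic = n * n              # selection sort: 10**12 "steps"
print(quadratic / log_linear)  # merge sort wins by a factor of about 50,000
```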
Recursive functions
Two ways to compute factorial: n! = 1*2*3*…*n
def factorial(n):
    r = 1
    for k in range(1,n+1):
        r = r * k
    return r

def factorial(n):
    if n <= 1:
        return 1
    else:
        return factorial(n-1) * n
For-loop: accumulate the product step by step.
Recursive function: reduce the problem to a smaller instance of itself.
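Both definitions compute the same values; a quick check (the suffixed names are ours, so that both fit in one file):

```python
def factorial_loop(n):
    r = 1
    for k in range(1, n + 1):
        r = r * k
    return r

def factorial_rec(n):
    if n <= 1:
        return 1
    else:
        return factorial_rec(n - 1) * n

print(factorial_loop(5), factorial_rec(5))  # → 120 120
```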
Factorial
n! = 1*2*3*…*n
(n+1)! = 1*2*3*…*n*(n+1) = n!*(n+1)
Substitution:
(n+1)! = n!*(n+1), or equivalently n! = (n-1)!*n
What is the base case? For factorial it is n <= 1, where the recursion stops and we return 1 directly.
Hanoi Towers
Move all discs from the tower on the left to the tower in the middle, while moving only one disc at a time and never placing a larger disc on top of a smaller one.
Recursive Strategy
To move n discs from the left-most to the right-most tower: first move the top n-1 discs to the middle tower, then move the largest disc to the right-most tower, and finally move the n-1 discs from the middle tower on top of it.
Hanoi Towers
def hanoi(fro, to, other, n):
    if n == 0:
        return
    hanoi(fro, other, to, n-1)
    move(fro,to)
    hanoi(other, to, fro, n-1)
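The slides do not define move; a minimal sketch that records the moves in a list (the tower names 'A', 'B', 'C' are our choice):

```python
moves = []

def move(fro, to):
    moves.append((fro, to))

def hanoi(fro, to, other, n):
    if n == 0:
        return
    hanoi(fro, other, to, n - 1)  # move n-1 discs out of the way
    move(fro, to)                 # move the largest disc
    hanoi(other, to, fro, n - 1)  # move the n-1 discs back on top

hanoi('A', 'B', 'C', 3)
print(len(moves))  # → 7 moves, i.e. 2**3 - 1
```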
Recursive data structures
A Tree
[Figure: a root node with branches tree1, tree2, tree3, …, treen, each of which is itself a tree.]
A tree has a node which is its root, and branches ("subtrees"), which themselves are trees.
The structure continues downwards until we reach a node without branches (a leaf).
Examples
Example: the Swedish royal family in 2016.
A class for trees
class Tree:
    def __init__(self,node,subtrees):
        self.root = node
        self.subtrees = subtrees

    def getParts(self):
        return self.root, self.subtrees

def depth(tree):
    (n,ts) = tree.getParts()
    if ts:
        return 1 + max([depth(t) for t in ts])
    else:
        return 1
Attributes: the root node and the list of subtrees.
A getter method to get back the root and the subtrees.
Example: compute the depth of the tree with a recursive function.
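For example, a tree with root 'a', a leaf 'b', and a subtree 'c' with one leaf 'd' has depth 3 (class and function repeated so the example runs on its own):

```python
class Tree:
    def __init__(self, node, subtrees):
        self.root = node
        self.subtrees = subtrees

    def getParts(self):
        return self.root, self.subtrees

def depth(tree):
    (n, ts) = tree.getParts()
    if ts:
        # 1 for the root, plus the deepest subtree
        return 1 + max([depth(t) for t in ts])
    else:
        return 1

t = Tree('a', [Tree('b', []), Tree('c', [Tree('d', [])])])
print(depth(t))  # → 3
```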
Computability and the halting problem
Computability
Are there algorithms for all (precisely formulated) problems?
Mathematical version: is there a program/machine which can solve all problems?
Answer - from Turing! - is no: there are problems which are undecidable.
One of these is the halting problem: decide, for an arbitrary program and input, whether the program eventually terminates.
It would have been very useful to have a program which can solve that problem!
The halting problem: simple proof
Zelle's book, pages 491-492: assume a function terminates() exists, and write a program turing.py which reads a program as input and then loops forever if terminates() says that the program halts on itself, and halts otherwise.
Would turing.py terminate with input turing.py? If it terminates, it must loop forever; if it loops forever, it must terminate.
This contradiction shows that terminates() is impossible to construct.
Other non-computable problems
To find a proof or a disproof of a formula in predicate calculus (a simple system for logic, which can be used to implement mathematical proofs on a computer).
AI-complete problems: an AI-problem which would require "unlimited intelligence" to be solved.
Example: translation of natural language.
The good news
There is a lot to do
And it will never be finished!
References
From the book: Zelle, Python Programming: An Introduction to Computer Science.