1 of 20

CSE 163

Performance + Profiling

Hunter Schafer

2 of 20

Method 2: Count steps

Assumption: A single simple line of code takes the same amount of time to run in Python.

Examples:

num = 5 + 10;

print(β€œhello);

x > 5 and x % 2 == 0

Rules

  • Blocks of code
    • Sum of steps of all the lines
  • For loops
    • Number of steps in body multiplied by number of times loop runs
  • Function call
    • The number of steps in its body

2

num = 5 + 10

print(β€œhello”)

num > 5 and num % 2 == 0

3 of 20

Growth rate

We often think of algorithms in terms of growth rate in terms of the input or input data size (n)

Think of the runtime of 5n3 + 4n2 + 3n + 7.

All terms seem significant when we look at smaller inputs.

However, what happens when n becomes extremely large? At that point the term with the largest power of n will dominate (n3).

We say it runs on the order of n3 or O(n3) (Big Oh of n cubed).

3

4 of 20

Complexity classes

4

5 of 20

1 min - Review

What are the Big-O run-times of each of these methods?

5

def max_diff1(nums):

max_diff = 0

for n1 in nums:

for n2 in nums:

diff = abs(n1 - n2)

if diff > max_diff:

max_diff = diff

return max_diff

def max_diff2(nums):

min_num = nums[0]

max_num = nums[0]

for num in nums:

if num < min_num:

min_num = num

elif num > max_num:

max_num = num

return max_num - min_num

def max_diff3(nums):

return max(nums) - min(nums)

Think

pollev.com/cse163

6 of 20

1.5 min - Review

What are the Big-O run-times of each of these methods?

What are the Big-O run-times of each of these methods?

6

def max_diff1(nums):

max_diff = 0

for n1 in nums:

for n2 in nums:

diff = abs(n1 - n2)

if diff > max_diff:

max_diff = diff

return max_diff

def max_diff2(nums):

min_num = nums[0]

max_num = nums[0]

for num in nums:

if num < min_num:

min_num = num

elif num > max_num:

max_num = num

return max_num - min_num

def max_diff3(nums):

return max(nums) - min(nums)

Pair

pollev.com/cse163

7 of 20

Big-O

Pros

  • Simple to describe how an algorithm will scale
  • Independent of programming language / computer
  • Easy to think of (with practice 😊)

Cons

  • Can be a bit overly simplistic (a very crude approximation)
    • Misses many important issues of performance that matter in real life
  • Constants can sometimes really matter

7

8 of 20

Matrix Multiplication

8

9 of 20

Big-O

  • In theory, Big-O is a great descriptor, but in practice we generally also include timing information to help us reason about our program efficiency
  • There is not one right tool, they both have their place!
    • Big-O
      • Helps us communicate how our algorithm scales
      • Easily eliminate algorithms that won’t scale
    • Timing
      • Verify our implementation is fast
      • Help us find bottlenecks in our algorithm

9

10 of 20

1 min

Rank the algorithms in term of their speed on my computer

β€œ1=2 < 3” means 1 and 2 are about the same, but faster than 3

10

def max_diff1(nums):

max_diff = 0

for n1 in nums:

for n2 in nums:

diff = abs(n1 - n2)

if diff > max_diff:

max_diff = diff

return max_diff

def max_diff2(nums):

min_num = nums[0]

max_num = nums[0]

for num in nums:

if num < min_num:

min_num = num

elif num > max_num:

max_num = num

return max_num - min_num

def max_diff3(nums):

return max(nums) - min(nums)

Think

pollev.com/cse163

11 of 20

1 min

Rank the algorithms in term of their speed on my computer

β€œ1=2 < 3” means 1 and 2 are about the same, but faster than 3

11

def max_diff1(nums):

max_diff = 0

for n1 in nums:

for n2 in nums:

diff = abs(n1 - n2)

if diff > max_diff:

max_diff = diff

return max_diff

def max_diff2(nums):

min_num = nums[0]

max_num = nums[0]

for num in nums:

if num < min_num:

min_num = num

elif num > max_num:

max_num = num

return max_num - min_num

def max_diff3(nums):

return max(nums) - min(nums)

Pair

pollev.com/cse163

12 of 20

Timing

  • Generally two ways to time programs
    • Time the whole program
    • Use a profiler to help you see the time spent on each line
  • If you are timing the whole program, you need to run it multiple times to get a good idea of the total run-time
  • If you use a profiler, you usually don’t care about the raw times, but rather relative times

  • For profiling, I usually use the line_profiler package (kernprof)
  • In your terminal

12

conda install line_profiler

kernprof -v -l file.py

13 of 20

Why is Python Slow?

Interpreter

  • We have mentioned before that when we are using Python, we are using an interpreter rather than a compiler
  • This means it has to figure how to translate your code while it is running
  • This can be overly slow in β€œhot” loops

Dynamically types

  • We don’t have to write down the types ahead of time, which means Python spends a fair bit of time figuring out which type each object is

13

14 of 20

Example: Arithmetic

Consider this C program

This roughly maps to the machine instructions

14

/* C code */

int a = 1;

int b = 2;

int c = a + b;

Assign <int> 1 to a

Assign <int> 2 to b

call add<int, int>(a, b)

Assign <int> the result to c

15 of 20

Example: Arithmetic

Consider this Python program

This roughly maps to the machine instructions

15

# A Python program

a = 1

b = 2

c = a + b

16 of 20

Example: Arithmetic

16

1. Assign 1 to a

1a. Set a->PyObject_HEAD->typecode to integer

1b. Set a->val = 1

2. Assign 2 to b

2a. Set b->PyObject_HEAD->typecode to integer

2b. Set b->val = 2

3. call add(a, b)

3a. find typecode in a->PyObject_HEAD

3b. a is an integer; value is a->val

3c. find typecode in b->PyObject_HEAD

3d. b is an integer; value is b->val

3e. call add<int, int>(a->val, b->val)

3f. result of this is result, and is an integer.

4. Create a Python object c

4a. set c->PyObject_HEAD->typecode to integer

4b. set c->val to result

17 of 20

Slow Python

  • If Python is so slow, why does anyone use it?
    • It is WAY easier to program with than a language like C
      • Developer time costs $$$, computers are cheap
    • Python provides incredible support for plugging into pre-compiled libraries
      • Examples: pandas, sklearn, numpy, scipy
  • Pretty easy to get around this
    • Use a profiler to figure out where your bottlenecks are
    • If your program is slow, look to any β€œhot” loops that might be replaced with library calls.

17

18 of 20

Group Finding

18

19 of 20

19

20 of 20

Activity

TAs will give instructions

Share

  • Your name
  • A broad category you would want to investigate. Examples:
    • Gun Control
    • Smoking
    • City Traffic

Walk Around

  • Introduce yourself
  • Ask what you want to work on

20