1 of 43

Fall 2016

INFO I427:

Pythonic Thinking

2 of 43

To be Explicit

To Choose Simple over Complex

To Maximize Readability

PYTHONIC STYLE

3 of 43

>>> import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one-- and preferably only one --obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than *right* now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

4 of 43

Whitespace

  • Use spaces instead of tabs for indentation.
  • Use four spaces for each level of syntactically significant indenting.
  • In a file, functions and classes should be separated by two blank lines.
  • Don’t put spaces around list indexes, function calls, or keyword argument assignments.
  • Put one—and only one—space before and after variable assignments.
  • ...

Naming

Expressions and statements

PEP 8

5 of 43

Naming

  • Functions, variables, and attributes should be in lowercase_underscore format.
  • Protected instance attributes should be in _leading_underscore format.
  • Private instance attributes should be in __double_leading_underscore format.
  • Classes and exceptions should be in CapitalizedWord format.
  • Module-level constants should be in ALL_CAPS format.
  • Instance methods in classes should use self as the name of the first parameter.
  • Class methods should use cls as the name of the first parameter.

Expressions and statements

PEP 8

6 of 43

Expressions and statements

  • Use if a is not b instead of if not a is b.
  • Use if not somelist to check for empty values instead of if len(somelist) == 0.
  • Avoid single-line if statements, for and while loops, and except compound statements. Spread these over multiple lines for clarity.
  • Always put import statements at the top of a file.
  • Always use absolute names for modules when importing them, not names relative to the current module’s own path. For example, to import the foo module from the bar package, you should do from bar import foo, not just import foo.
  • If you must do relative imports, use the explicit syntax from . import foo.
  • Imports should be in sections in the following order: standard library modules, third-party modules, your own modules. Each subsection should have imports in alphabetical order.
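The first two guidelines above can be sketched in a few lines. This is only an illustration; the names a, b, somelist, different_objects and status are made up for the example and do not come from PEP 8.

```python
# Hypothetical values, used only to illustrate the guidelines.
a = [1, 2, 3]
b = a           # b is another name for the same list object
somelist = []

# Inline negation ("is not") reads more naturally than "not a is b".
different_objects = a is not b

# An empty list is falsy, so no len() comparison is needed.
if not somelist:
    status = 'empty'
else:
    status = 'has items'
```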

PEP 8

7 of 43

Pylint tool:

Pylint provides automated enforcement of the PEP 8 style guide and detects many other types of common errors in Python programs.

PEP 8

8 of 43

from urllib.parse import parse_qs

my_values = parse_qs('red=5&blue=0&green=',
                     keep_blank_values=True)
print(repr(my_values))

>>>
{'red': ['5'], 'green': [''], 'blue': ['0']}

WRITE A HELPER FUNCTION INSTEAD OF A COMPLEX EXPRESSION

9 of 43

WRITE A HELPER FUNCTION INSTEAD OF A COMPLEX EXPRESSION

print('Red:', my_values.get('red'))
print('Green:', my_values.get('green'))

>>>
Red: ['5']
Green: ['']

10 of 43

WRITE A HELPER FUNCTION INSTEAD OF A COMPLEX EXPRESSION

red = int(my_values.get('red', [''])[0] or 0)

11 of 43

WRITE A HELPER FUNCTION INSTEAD OF A COMPLEX EXPRESSION

green = int(my_values.get('green', [''])[0] or 0)

The trick here is that the empty string, the empty list, and zero all evaluate to False implicitly.
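To see the fallback at work, the sketch below runs the same expression for a key with a value, a blank key, and a missing key. The 'opacity' key is a hypothetical name added for illustration; it is not part of the slide's query string.

```python
from urllib.parse import parse_qs

my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)

# Present with a value: ['5'][0] is '5' (truthy), so int('5') is used.
red = int(my_values.get('red', [''])[0] or 0)

# Present but blank: [''][0] is '' (falsy), so the expression falls back to 0.
green = int(my_values.get('green', [''])[0] or 0)

# Missing key (hypothetical 'opacity'): get() returns the default [''],
# which then behaves exactly like the blank case.
opacity = int(my_values.get('opacity', [''])[0] or 0)
```

All three cases work, but the reader has to unpack the default list, the index, the or, and the int call in one glance, which is the readability cost this slide is pointing at.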

12 of 43

WRITE A HELPER FUNCTION INSTEAD OF A COMPLEX EXPRESSION

def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found
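A short usage sketch of the helper, assuming the same query string as the earlier slides. The 'opacity' key and the default of 100 are hypothetical values chosen only to show the default parameter.

```python
from urllib.parse import parse_qs

def get_first_int(values, key, default=0):
    found = values.get(key, [''])
    if found[0]:
        found = int(found[0])
    else:
        found = default
    return found

my_values = parse_qs('red=5&blue=0&green=', keep_blank_values=True)

red = get_first_int(my_values, 'red')        # value present
green = get_first_int(my_values, 'green')    # blank value, default 0
# Hypothetical missing key with a caller-supplied default.
opacity = get_first_int(my_values, 'opacity', default=100)
```

Each call site now reads as a single intent, and the default is visible by name instead of being hidden behind an or.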

13 of 43

Slicing lets you access a subset of a sequence’s items with minimal effort.

a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

  • First four: a[:4]
    • Not a[0:4]
  • Last four: a[-4:]
  • Middle two: a[3:-3]
  • Slicing to the end: a[4:]
    • Not a[4:len(a)]

somelist[-0:] or somelist[:] results in a copy of the original list.
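The slices listed above can be checked directly. This is a small sketch using the same list; the variable names on the left are made up for the example.

```python
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

first_four = a[:4]     # the leading 0 is implied
last_four = a[-4:]     # negative indexes count from the end
middle_two = a[3:-3]
to_the_end = a[4:]     # the trailing len(a) is implied

# Slicing with no bounds builds a new list holding the same items.
copy_of_a = a[:]
```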

HOW TO SLICE A SEQUENCE

14 of 43

Python provides compact syntax for deriving one list from another: the list comprehension.

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

squares = [x**2 for x in a]

squares = map(lambda x: x ** 2, a)

LIST COMPREHENSION instead of MAP and FILTER

15 of 43

Suppose you only want to compute the squares of the numbers that are divisible by 2.

even_squares = [x**2 for x in a if x % 2 == 0]

alt = map(lambda x: x**2, filter(lambda x: x % 2 == 0, a))
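The two forms can be compared side by side. One detail worth noting in this sketch: in Python 3, map and filter return lazy iterators, so the result is wrapped in list() here before comparing it with the comprehension.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# The filter condition sits inside the comprehension itself.
even_squares = [x**2 for x in a if x % 2 == 0]

# The map/filter version needs two lambdas, and list() to materialize
# the lazy map object into an actual list.
alt = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, a)))
```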

LIST COMPREHENSION instead of MAP and FILTER

16 of 43

The problem with list comprehensions:

  • They work fine for small inputs but consume a lot of memory for large inputs, because the whole result list is built at once.

value = [len(x) for x in open('/tmp/my_file.txt')]
print(value)

>>>
[100, 57, 15, 1, 12, 75, 5, 86, 89, 11]

CONSIDER GENERATOR EXPRESSION FOR LARGE COMPREHENSIONS

17 of 43

The solution: generator expressions

  • Generator expressions don’t materialize the whole output sequence when they’re run. Instead, generator expressions evaluate to an iterator that yields one item at a time from the expression.

it = (len(x) for x in open('/tmp/my_file.txt'))
print(it)

>>>
<generator object <genexpr> at 0x101b81480>

print(next(it))
print(next(it))

>>>
100
57
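The same behavior can be tried without a file on disk. The sketch below substitutes an in-memory io.StringIO object for the file; that substitution and the sample text are assumptions made only so the example runs anywhere.

```python
import io

# Stand-in for a large file (an assumption for this sketch).
fake_file = io.StringIO('alpha\nbe\ngamma\n')

# Nothing is read yet: the generator expression only describes the work.
it = (len(line) for line in fake_file)

first = next(it)       # reads and measures only the first line
second = next(it)      # then only the second
remaining = list(it)   # drains whatever is left
```

Line lengths include the trailing newline, which is why 'alpha\n' counts as 6 characters.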

CONSIDER GENERATOR EXPRESSION FOR LARGE COMPREHENSIONS

18 of 43

Generator expressions can be composed together.

# it is the generator from the previous slide, already advanced past 100 and 57
roots = ((x, x**0.5) for x in it)
print(next(roots))

>>>
(15, 3.872983346207417)
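A self-contained sketch of the same chaining, using a short hypothetical word list instead of a file. It also shows that advancing the outer generator advances the inner one.

```python
# Hypothetical input, chosen so the square root is exact.
lengths = (len(word) for word in ['four', 'score', 'seven'])
roots = ((n, n ** 0.5) for n in lengths)

# Pulling one item from roots pulls exactly one item from lengths.
first = next(roots)

# The inner generator has moved on, so only two lengths remain.
rest_of_lengths = list(lengths)
```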

CONSIDER GENERATOR EXPRESSION FOR LARGE COMPREHENSIONS

19 of 43

flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

for i in range(len(flavor_list)):
    flavor = flavor_list[i]
    print('%d: %s' % (i + 1, flavor))

PREFER ENUMERATE OVER RANGE

20 of 43

enumerate wraps any iterator with a lazy generator. This generator yields pairs of the loop index and the next value from the iterator.

for i, flavor in enumerate(flavor_list):
    print('%d: %s' % (i + 1, flavor))

PREFER ENUMERATE OVER RANGE

21 of 43

for i, flavor in enumerate(flavor_list, 1):
    print('%d: %s' % (i, flavor))

PREFER ENUMERATE OVER RANGE

enumerate provides concise syntax for looping over an iterator and getting the index of each item from the iterator as you go.
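A brief sketch of both points: enumerate is lazy, and its second argument sets where counting starts. The names pairs and numbered are made up for the example.

```python
flavor_list = ['vanilla', 'chocolate', 'pecan', 'strawberry']

# enumerate returns a lazy iterator; next() pulls one (index, value) pair.
pairs = enumerate(flavor_list)
first_pair = next(pairs)

# Starting the count at 1 removes the need for i + 1 in the loop body.
numbered = ['%d: %s' % (i, flavor)
            for i, flavor in enumerate(flavor_list, 1)]
```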

22 of 43

names = ['Cecilia', 'Lise', 'Marie']
letters = [len(n) for n in names]

longest_name = None
max_letters = 0

for i in range(len(names)):
    count = letters[i]
    if count > max_letters:
        longest_name = names[i]
        max_letters = count

print(longest_name)

>>>
Cecilia

USE ZIP TO PROCESS ITERATORS IN PARALLEL

23 of 43

for i, name in enumerate(names):
    count = letters[i]
    if count > max_letters:
        longest_name = name
        max_letters = count

USE ZIP TO PROCESS ITERATORS IN PARALLEL

24 of 43

In Python 3, zip wraps two or more iterators with a lazy generator. The zip generator yields tuples containing the next value from each iterator.

for name, count in zip(names, letters):
    if count > max_letters:
        longest_name = name
        max_letters = count

The zip_longest function from the itertools built-in module lets you iterate over multiple iterators in parallel regardless of their lengths.
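The difference matters when the inputs have different lengths. In the sketch below, the extra name 'Rosalind' is a hypothetical addition that makes the two lists unequal.

```python
from itertools import zip_longest

# Hypothetical data: one more name than there are letter counts.
names = ['Cecilia', 'Lise', 'Marie', 'Rosalind']
letters = [7, 4, 5]

# zip stops at the shortest input, silently dropping 'Rosalind'.
paired = list(zip(names, letters))

# zip_longest keeps going and fills the gap with None by default.
padded = list(zip_longest(names, letters))
```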

USE ZIP TO PROCESS ITERATORS IN PARALLEL

25 of 43

FUNCTIONS

26 of 43

  • Returning None from a function is error-prone.

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

x, y = 0, 5
result = divide(x, y)
if not result:
    print('Invalid inputs')  # This is wrong! 0 / 5 is a valid result, but 0.0 is falsy.

PREFER EXCEPTION to RETURN NONE

27 of 43

Solution:

def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

x, y = 5, 2
try:
    result = divide(x, y)
except ValueError:
    print('Invalid inputs')
else:
    print('Result is %.1f' % result)

>>>
Result is 2.5
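The two cases that the None-returning version confused can now be told apart. In this sketch, describe is a hypothetical wrapper added only so the two outcomes can be compared as strings.

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

def describe(a, b):
    # Hypothetical helper: turns the outcome into a message.
    try:
        result = divide(a, b)
    except ValueError:
        return 'Invalid inputs'
    else:
        return 'Result is %.1f' % result

zero_numerator = describe(0, 5)     # a legitimate result of 0.0
zero_denominator = describe(5, 0)   # a genuine error
```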

PREFER EXCEPTION to RETURN NONE

28 of 43

  • How can we sort a list of numbers but prioritize one group of numbers to come first?

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

>>>
[2, 3, 5, 7, 1, 4, 6, 8]

KNOW HOW CLOSURES INTERACT WITH VARIABLE SCOPE

29 of 43

def sort_priority(values, group):
    def helper(x):
        if x in group:
            return (0, x)
        return (1, x)
    values.sort(key=helper)

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)

>>>
[2, 3, 5, 7, 1, 4, 6, 8]

KNOW HOW CLOSURES INTERACT WITH VARIABLE SCOPE

30 of 43

KNOW HOW CLOSURES INTERACT WITH VARIABLE SCOPE

Python supports closures: functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument to sort_priority.
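A minimal sketch of a closure outside the sorting example. The names make_labeler, label and invoice are hypothetical; the point is only that the inner function keeps access to prefix after the outer function has returned.

```python
def make_labeler(prefix):
    # prefix belongs to the enclosing scope of label.
    def label(n):
        # label reads prefix even though it is not one of its parameters.
        return '%s-%d' % (prefix, n)
    return label

invoice = make_labeler('INV')   # make_labeler has returned by now
labelled = invoice(7)           # yet label still sees prefix
```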

31 of 43

KNOW HOW CLOSURES INTERACT WITH VARIABLE SCOPE

Functions are first-class objects in Python: you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, etc. This is how the sort method can accept a closure function as the key argument.
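A small sketch of a function being treated as a value. The function by_length and the word list are hypothetical.

```python
def by_length(word):
    return len(word)

# A function can be assigned to a variable like any other object.
key_func = by_length

# ...and passed as an argument; sort calls it once per item.
words = ['pecan', 'fig', 'vanilla']
words.sort(key=key_func)
```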

32 of 43

KNOW HOW CLOSURES INTERACT WITH VARIABLE SCOPE

Python has specific rules for comparing tuples: it first compares items in index zero, then index one, then index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.
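The comparison rule can be checked on the kind of tuples that helper returns. The specific tuples below are picked for illustration.

```python
# Index zero decides first: any (0, x) sorts before any (1, y).
group_first = (0, 7) < (1, 1)

# Only when index zero ties does index one break the tie.
tie_broken = (0, 2) < (0, 3)

# Sorting the keys therefore yields the priority group, then the rest.
keys = sorted([(1, 8), (0, 3), (1, 1), (0, 2)])
```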

33 of 43

You want to find the index of every word in a string.

def index_words(text):
    result = []
    if text:
        result.append(0)
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)
    return result

address = 'Four score and seven years ago...'
result = index_words(address)
print(result[:3])

>>>
[0, 5, 11]

CONSIDER GENERATORS INSTEAD OF RETURNING LIST

35 of 43

Generators are functions that use yield expressions.

When called, generator functions do not actually run but instead immediately return an iterator.

With each call to the next built-in function, the iterator will advance the generator to its next yield expression.

Each value passed to yield by the generator will be returned by the iterator to the caller.
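The deferred execution described above can be observed directly. In this sketch, countdown and the events list are hypothetical; events records when the function body actually starts running.

```python
events = []

def countdown(start):
    # This line runs on the first next() call, not when countdown() is called.
    events.append('started')
    while start > 0:
        yield start
        start -= 1

it = countdown(3)
ran_on_call = bool(events)   # still empty: the body has not run yet

first = next(it)             # runs the body up to the first yield
rest = list(it)              # advances through the remaining yields
```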

CONSIDER GENERATORS INSTEAD OF RETURNING LIST

36 of 43

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

result = list(index_words_iter(address))

CONSIDER GENERATORS INSTEAD OF RETURNING LIST

37 of 43

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

CONSIDER GENERATORS INSTEAD OF RETURNING LIST

from itertools import islice

with open('/tmp/address.txt', 'r') as f:
    it = index_file(f)
    results = islice(it, 0, 3)
    print(list(results))

>>>
[0, 5, 11]
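The same call can be tried without creating a file. The sketch below assumes the file holds the address text from the earlier slides and substitutes an in-memory io.StringIO handle for it.

```python
import io
from itertools import islice

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

# In-memory stand-in for /tmp/address.txt (an assumption for this sketch).
handle = io.StringIO('Four score and seven years ago...\n')

# islice stops after three items, so the rest of the input is never scanned.
first_three = list(islice(index_file(handle), 0, 3))
```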

38 of 43

def normalize(numbers):
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

visits = [15, 35, 80]
percentages = normalize(visits)
print(percentages)

>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS

39 of 43

def read_visits(data_path):
    with open(data_path) as f:
        for line in f:
            yield int(line)

it = read_visits('/tmp/my_numbers.txt')
percentages = normalize(it)
print(percentages)

>>>
[]

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS

40 of 43

The cause of this behavior is that an iterator only produces its results a single time. If you iterate over an iterator or generator that has already raised a StopIteration exception, you won’t get any results the second time around.

it = read_visits('/tmp/my_numbers.txt')
print(list(it))
print(list(it))  # Already exhausted

>>>
[15, 35, 80]
[]

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS

41 of 43

What’s confusing is that you also won’t get any errors when you iterate over an already exhausted iterator. for loops, the list constructor, and many other functions throughout the Python standard library expect the StopIteration exception to be raised during normal operation. These functions can’t tell the difference between an iterator that has no output and an iterator that had output and is now exhausted.
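The ambiguity can be reproduced in a few lines. This sketch uses a variant of read_visits that takes in-memory lines instead of a file path; that change is an assumption made so the example needs no file.

```python
def read_visits(lines):
    # Variant for illustration: iterates over given lines, not a file.
    for line in lines:
        yield int(line)

it = read_visits(['15', '35', '80'])
first_pass = list(it)
second_pass = list(it)         # no exception, just an empty list

# A generator that never had any data looks exactly the same.
never_had_data = list(read_visits([]))
```

From the caller's side, second_pass and never_had_data are indistinguishable, which is why normalize returned an empty list instead of failing.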

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS

42 of 43

def normalize_copy(numbers):
    numbers = list(numbers)  # Copy the iterator
    total = sum(numbers)
    result = []
    for value in numbers:
        percent = 100 * value / total
        result.append(percent)
    return result

it = read_visits('/tmp/my_numbers.txt')
percentages = normalize_copy(it)
print(percentages)

>>>
[11.538461538461538, 26.923076923076923, 61.53846153846154]

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS

43 of 43

CAN WE DO BETTER?
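One possible answer, offered as a sketch and not necessarily the one intended here: pass an iterable container whose __iter__ method opens the data afresh on every pass, so nothing is copied into memory. The ReadVisits class name and the temporary file are assumptions made for this illustration.

```python
import os
import tempfile

class ReadVisits:
    """Hypothetical container: each iter() call re-reads the file."""
    def __init__(self, data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

def normalize(numbers):
    total = sum(numbers)   # first pass: sum() calls __iter__
    # Second pass: the comprehension calls __iter__ again.
    return [100 * value / total for value in numbers]

# Temporary file standing in for the visits data (assumption).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('15\n35\n80\n')

percentages = normalize(ReadVisits(tmp.name))
os.remove(tmp.name)
```

The trade-off is that the file is read twice, in exchange for never holding all of its values in a list.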

BE DEFENSIVE WHEN ITERATING OVER ARGUMENTS