1 of 49

Python Concurrency

… and parallelism.

2 of 49

Who am I?

Hi! My name is Santiago Basulto

  • From Argentina.
  • 10+ years of experience.
  • Co-founder RMOTR (acquired by INE.com, 2019).
  • I’m working on this: parallel

3 of 49

What we’ll see during this tutorial

  1. Intro and why we need concurrency/parallelism
  2. Computer Architecture
  3. Understanding the role of the Operating System
  4. Threads: Conceptually and with Python
  5. The Python GIL 😱
  6. Multiprocessing
  7. The `concurrent.futures` module
  8. An intro to `parallel`

4 of 49

What this tutorial is NOT about

  • `asyncio`, `trio`, `curio`, etc.
  • Low level pthreads programming
  • Distributed architectures (job queues SQS, RabbitMQ, Celery, etc)
  • Pipelining, clustering and distributed computing (see Dask, Spark, etc)
  • GPU Parallelism (https://rapids.ai/)

5 of 49

Martelli Model of Scalability

  • 1 core: Single thread and single process

  • 2-8 cores: Multiple threads and multiple processes (THIS TALK)

  • 9+ cores: Distributed processing

6 of 49

Why do we need to learn concurrent programming?

7 of 49

8 of 49

Computer Architecture

Back to the basics

9 of 49

The von Neumann architecture

10 of 49

Example of code accessing CPU, RAM or I/O

# Data stored in memory
x = 1

# Calculation done in CPU
x += 3

# I/O (write to a file)
with open('res.txt', 'w') as fp:
    fp.write(f"The value is {x}")

# I/O (print to screen)
print(x)

11 of 49

Access time of different resources (expressed in human-relative times):

  CPU cycle                         1 second
  RAM access                        4 minutes
  Generic SSD access                1.5–4 days
  Hard drive access                 1–9 months
  Network request SF -> NYC         5 years
  Network request SF -> Hong Kong   11 years

12 of 49

Access time of different resources:

13 of 49

The Operating System

The guardian of the system

14 of 49

The OS is the one protecting the hardware

x = 1
x += 3

with open('res.txt', 'w') as fp:
    fp.write(f"The value is {x}")

print(x)

(diagram: the OS sits between the code and the hardware, mediating every I/O request)


16 of 49

The Process

How computers run programs

17 of 49

What you write:

# Data
x = 1

# Calculation
x += 3

# I/O
with open('res.txt', 'w') as fp:
    fp.write(f"The value is {x}")

# I/O
print(x)

18 of 49

What actually happens:

Your OS wraps your code in a special structure called a Process.

A process holds the state of execution of that given “instance” of the program running. That’s why we can have several instances of the same program running at the same time.

Your code:

x = 1
x += 3

with open('res.txt', 'w') as fp:
    fp.write(f"The value is {x}")

print(x)

(diagram: the Process also holds the allocated RAM and local variables — e.g. x = 4, fp = <file open #1893> — and more)

19 of 49

We can run several instances of the same program

(at the same time)

20 of 49

Which results in multiple processes created:

21 of 49

Which results in multiple processes created:

Each process has a different Process ID.
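You can see this from Python itself: each running instance of a script gets its own process ID from the OS. A minimal sketch using the standard library's `os` module:

```python
import os

# Every running instance of this script reports a different PID,
# assigned by the operating system when the process was created.
print(f"My process ID is {os.getpid()}")
print(f"My parent's process ID is {os.getppid()}")
```

Run the script in two terminals at the same time and each prints a different PID.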

22 of 49

Process Concurrency

Running multiple processes “at the same time”

23 of 49

Let’s start by assuming that we have only 1 CPU

How many processes can we run

at the same time?

24 of 49

# 1 #

Just 1! A single CPU can take care of only 1 process at a time.

That’s the max.

25 of 49

But even with 1-CPU computers, it felt like there were multiple things happening at the same time.

How was it possible?

26 of 49

Time Slicing

Every OS includes a scheduler, which is in charge of administering CPU time for running processes.

27 of 49

The OS Scheduler

(diagram: timeline showing the scheduler interleaving CPU time slices among processes P1, P2 and P3)

28 of 49

This is what we call:

Concurrency

Several processes are “running” but they share the same CPU.

29 of 49

So, when can we talk about

Parallelism?

Only in multi-core systems, when there are two “things” actually running at the same time.
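A quick way to check how many cores your machine has (and therefore how many things can truly run in parallel) is the standard library's `os.cpu_count`:

```python
import os

# How many cores can actually run code at the same time on this machine?
# os.cpu_count() may return None on exotic platforms, so default to 1.
cores = os.cpu_count() or 1
print(f"This system has {cores} CPU cores")
# With 1 core we only ever get concurrency; with 2+ cores, true parallelism.
```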

30 of 49

In a 2-core system

(diagram: processes P1, P2 and P3 scheduled across CPU 1 and CPU 2 over time)

31 of 49

In a 2-core system

(diagram: time slices where two processes run simultaneously on the two cores — parallelism)

32 of 49

In a 2-core system

(diagram: same timeline, with a slice where only one process runs and CPU 1 is idle)

33 of 49

Wait a minute..

How does the OS Scheduler decide which processes get CPU time?

34 of 49

Let’s look again at our first time slicing example.

35 of 49

There’s something strange with P2:

It just gets short bursts of CPU. But why?

(diagram: P2 receives only short bursts of CPU between long idle gaps)

36 of 49

P2 is probably a process that we call:

I/O Bound

37 of 49

Remember? Access time of different resources:

  CPU cycle                         1 second
  RAM access                        4 minutes
  Generic SSD access                1.5–4 days
  Hard drive access                 1–9 months
  Network request SF -> NYC         5 years
  Network request SF -> Hong Kong   11 years

The last four are I/O tasks (they’re very slow).

38 of 49

When a process requests access to I/O, the OS will “unschedule” it, giving another process CPU time.

This is because I/O is slow. Instead of leaving the CPU idle until the I/O finishes, the OS can run other work in the meantime.

39 of 49

Making our programs concurrent.

We need just one more concept in order to jump into Python concurrency.

“Intra-program” concurrency.

40 of 49

How can you make your own program concurrent?

Let’s say we’re dealing with an I/O bound program.

Example:

41 of 49

Our program starts by pulling data from 3 different websites.

Once it has fetched the data, it does some simple computation with all of it.

# program start
d1 = get_website_1()  # 2 secs
d2 = get_website_2()  # 2 secs
d3 = get_website_3()  # 2 secs

combine(d1, d2, d3)   # very fast

42 of 49

The problem is that each network request blocks for 2 seconds.

The total time is >= 6 seconds.

There was a lot of idle time.

# program start
d1 = get_website_1()  # 2 secs
d2 = get_website_2()  # 2 secs
d3 = get_website_3()  # 2 secs

combine(d1, d2, d3)   # very fast

Total execution time >= 6 seconds!
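You can verify the 6-second floor yourself. The `get_website_*` functions above are stand-ins, so in this sketch `time.sleep(2)` simulates each 2-second blocking network request:

```python
import time

# Hypothetical stand-ins for the three network requests;
# time.sleep simulates the 2-second blocking I/O of each call.
def get_website_1():
    time.sleep(2)
    return "data-1"

def get_website_2():
    time.sleep(2)
    return "data-2"

def get_website_3():
    time.sleep(2)
    return "data-3"

start = time.time()
d1 = get_website_1()
d2 = get_website_2()
d3 = get_website_3()
elapsed = time.time() - start
print(f"Sequential total: {elapsed:.1f} secs")  # roughly 6 seconds
```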

43 of 49

A visual example: sequential execution

(diagram: timeline where www1, www2 and www3 each add 2 secs one after another, followed by very fast processing — total >= 6 secs)

44 of 49

Multithreading

A mechanism to make our own programs concurrent (and potentially parallel).

45 of 49

A multithreaded example

(diagram: www1, www2 and www3 run at the same time, each taking 2 secs, followed by very fast processing — total ~2 secs)

46 of 49

What would this program look like in code?

Something like this:

47 of 49

We’re now “spawning” the 3 network requests at the same time.

And blocking until all of them are done.

We then proceed with the calculations.

# program start
d1, d2, d3 = run_concurrently(
    get_website_1,
    get_website_2,
    get_website_3)  # 2 secs

combine(d1, d2, d3)  # very fast
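`run_concurrently` is not a real library function; one way to sketch it is with the standard library's `concurrent.futures` (covered later in this tutorial). The `get_website_*` fetchers are again hypothetical stand-ins that sleep for 2 seconds:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fetchers; time.sleep stands in for the 2-second network wait.
def get_website_1():
    time.sleep(2)
    return "data-1"

def get_website_2():
    time.sleep(2)
    return "data-2"

def get_website_3():
    time.sleep(2)
    return "data-3"

def run_concurrently(*funcs):
    # Submit every function to a thread pool and block until all finish.
    with ThreadPoolExecutor(max_workers=len(funcs)) as pool:
        futures = [pool.submit(f) for f in funcs]
        return [f.result() for f in futures]

start = time.time()
d1, d2, d3 = run_concurrently(get_website_1, get_website_2, get_website_3)
elapsed = time.time() - start
print(f"Concurrent total: {elapsed:.1f} secs")  # ~2 secs instead of 6
```

Because the three sleeps (like real network waits) overlap, the total time is close to the longest single request rather than the sum of all three.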

48 of 49

Python Threads

Python includes a built-in threading library: the `threading` module.

49 of 49

Example: 1. Thread Basics.ipynb
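As a taste of what the notebook covers, a minimal sketch of the `threading` basics — creating threads, starting them, and joining to wait for them:

```python
import threading

def greet(name):
    # Runs in its own thread of execution, inside the same process.
    print(f"Hello from {name} (thread: {threading.current_thread().name})")

# Create three threads, each running greet() with a different argument.
threads = [
    threading.Thread(target=greet, args=(f"worker-{i}",))
    for i in range(3)
]

for t in threads:
    t.start()   # begin executing the target function concurrently

for t in threads:
    t.join()    # block until that thread has finished

print("All threads done")
```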