1 of 34

Reaching a Per-Interpreter GIL

Eric Snow

Python Language Summit

27 April 2022

https://bit.ly/37LMQ6K

2 of 34

  1. Multiple Interpreters in a Process
  2. Interpreter Isolation and Global State
  3. A Per-Interpreter GIL (and Other Motivations)
  4. Biggest Obstacles

3 of 34

  • Multiple Interpreters in a Process
  • Interpreter Isolation and Global State
  • A Per-Interpreter GIL (and Other Motivations)
  • Biggest Obstacles

PEP 684, "A Per-Interpreter GIL"

PEP 554, "Multiple Interpreters in the Stdlib"

5 of 34

Multiple Interpreters in a Process

  • PyInterpreterState & PyThreadState added in 1997 (Python 1.5)
  • PyInterpreterState enabled multiple interpreters in a process
  • growing in popularity
  • isolated interpreters

6 of 34

commit a027efa5bfa7911b5c4b522b6a0698749a6f2e4a
Author: Guido van Rossum <guido@python.org>
Date: Mon May 5 20:56:21 1997 +0000

Massive changes for separate thread state management.
All per-thread globals are moved into a struct which is manipulated
separately.

10 of 34

Consolidating Runtime-Global State

  • historically, many global variables held CPython runtime state
  • (2017) _PyRuntimeState added to encapsulate that global runtime state
  • long-running effort to move all runtime state out of globals
  • overlaps with effort to improve runtime init/fini (PEP 432, PEP 587)
  • Tools/c-analyzer/check-c-globals.py - helps us avoid adding more globals
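
As a rough illustration of what a globals checker does, the sketch below flags `static` variable definitions in a C source string unless they are on an allowlist. It is a toy, line-based regex scan, not how Tools/c-analyzer works; the names `ALLOWED`, `STATIC_VAR`, `find_static_globals`, and `SAMPLE` are hypothetical, and declarations with arrays or function pointers are deliberately not handled.

```python
import re

# Hypothetical allowlist: globals that were reviewed and are considered safe.
ALLOWED = {"version_string"}

# Matches one-line definitions such as "static int counter = 0;".
# Function definitions and prototypes do not match, because "(" cannot
# appear between the type and the ";" unless it comes after an "=".
STATIC_VAR = re.compile(
    r"^\s*static\s+(?P<type>[\w\s\*]+?)\b(?P<name>\w+)\s*(=.*)?;\s*$"
)


def find_static_globals(c_source, allowed=ALLOWED):
    """Return names of static variables in c_source that are not allowlisted."""
    found = []
    for line in c_source.splitlines():
        match = STATIC_VAR.match(line)
        if match and match.group("name") not in allowed:
            found.append(match.group("name"))
    return found


SAMPLE = '''\
static int counter = 0;
static const char *version_string = "1.0";
static PyObject *cache;

static int
helper(void)
{
    static int calls;
    return calls++;
}
'''

print(find_static_globals(SAMPLE))  # ['counter', 'cache', 'calls']
```

Function-local statics such as `calls` are reported too, since they also hold process-wide state.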

12 of 34

commit 76d5abc8684bac4f2fc7cccfe2cd940923357351
Author: Eric Snow <ericsnowcurrently@gmail.com>
Date: Tue Sep 5 18:26:16 2017 -0700

bpo-30860: Consolidate stateful runtime globals. (#2594)

* group the (stateful) runtime globals into various topical structs
* consolidate the topical structs under a single top-level _PyRuntimeState struct
* add a check-c-globals.py script that helps identify runtime globals

Other globals are excluded (see globals.txt and check-c-globals.py).

16 of 34

Consolidating Runtime-Global State

  • ~1220 total global variables remaining
    • core
      • ~250 objects exposed by C-API (static types, exception types, singletons)
      • ~100 other objects
      • ~100 non-objects
    • builtin modules
      • ~150 objects
      • ~20 non-objects
    • extension modules
      • ~550 objects
      • ~50 non-objects
  • including:
    • ~225 _PyArg_Parser
    • ~170 _Py_IDENTIFIER
    • ~380 static types
    • ~90 modules
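
A quick arithmetic check of the breakdown above: the per-area subtotals (450, 170, 600) add up to the ~1220 total. The dictionary layout and key names are only an illustrative way to hold the slide's approximate counts.

```python
# Approximate counts copied from the slide.
REMAINING = {
    "core": {"C-API objects": 250, "other objects": 100, "non-objects": 100},
    "builtin modules": {"objects": 150, "non-objects": 20},
    "extension modules": {"objects": 550, "non-objects": 50},
}

# Sum each area, then sum the areas.
subtotals = {area: sum(kinds.values()) for area, kinds in REMAINING.items()}
total = sum(subtotals.values())

for area, count in subtotals.items():
    print(f"{area:18} ~{count}")
print(f"{'total':18} ~{total}")
```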

17 of 34

Interpreter Isolation

  • interpreters are currently only mostly isolated from each other
  • most global state will eventually move to PyInterpreterState
  • related effort to isolate modules per-interpreter (PEPs 384, 3121, 489, 573, 630, 687)

19 of 34

A Per-Interpreter GIL

  • allows true multi-core parallelism for code running in different interpreters
  • requires maximum interpreter isolation, especially objects
  • extensions must opt in to loading outside the main interpreter
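
The difference between a shared GIL and a per-interpreter GIL can be modeled with ordinary locks. This is a conceptual sketch only: `Interpreter` and `can_run_together` are hypothetical names, `threading.Lock` stands in for the GIL, and the check is simulated from a single thread rather than measuring real parallelism.

```python
import threading


class Interpreter:
    """Hypothetical stand-in for an interpreter; not CPython's real structure."""

    def __init__(self, name, gil):
        self.name = name
        self.gil = gil  # the lock a thread must hold to run code here


def can_run_together(a, b):
    """Return True if code in `a` and code in `b` could hold their GILs at once."""
    a.gil.acquire()
    try:
        # Non-blocking attempt: fails if `b` needs the lock `a` already holds.
        got_it = b.gil.acquire(blocking=False)
        if got_it:
            b.gil.release()
        return got_it
    finally:
        a.gil.release()


# Shared GIL: every interpreter in the process uses the same lock.
shared = threading.Lock()
with_shared_gil = can_run_together(
    Interpreter("main", shared), Interpreter("sub", shared)
)

# Per-interpreter GIL: each interpreter owns its lock.
with_own_gil = can_run_together(
    Interpreter("main", threading.Lock()), Interpreter("sub", threading.Lock())
)

print(with_shared_gil, with_own_gil)  # False True
```

The model also hints at why isolation matters: once the locks are separate, nothing protects state that both interpreters can still reach.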

22 of 34

Reaching a Per-Interpreter GIL

  1. consolidate all the mutable global state (one piece at a time)
  2. move nearly all of it down to PyInterpreterState (mostly all-at-once)
  3. deal with incompatible extension modules (opt-in with module def slots)
  4. move the GIL
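
Step 3 can be pictured as a check at import time. The sketch below is a simplified model, not CPython's import machinery: `ModuleDef`, its `supports_multiple_interpreters` flag (standing in for a module def slot), and `load_extension` are all hypothetical.

```python
class ModuleDef:
    """Hypothetical, simplified stand-in for an extension's module definition."""

    def __init__(self, name, supports_multiple_interpreters=False):
        self.name = name
        # Stands in for a module def slot. Extensions that never set it are
        # treated as unsafe outside the main interpreter.
        self.supports_multiple_interpreters = supports_multiple_interpreters


def load_extension(module_def, in_main_interpreter):
    """Load a module, refusing non-opted-in modules outside the main interpreter."""
    if not in_main_interpreter and not module_def.supports_multiple_interpreters:
        raise ImportError(
            f"{module_def.name} has not opted in to use outside the main interpreter"
        )
    return f"<module {module_def.name}>"


legacy = ModuleDef("legacy_ext")
isolated = ModuleDef("isolated_ext", supports_multiple_interpreters=True)

print(load_extension(legacy, in_main_interpreter=True))     # still works as before
print(load_extension(isolated, in_main_interpreter=False))  # opted in, so allowed
try:
    load_extension(legacy, in_main_interpreter=False)
except ImportError as exc:
    print("refused:", exc)
```

Making the default "not supported" means existing extensions keep working in the main interpreter without changes.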

23 of 34

Biggest Obstacles

  • what to do about the allocators?
  • impact on extension module maintainers
  • dealing with objects exposed by the C-API (solved with immortal objects)

24 of 34

Obstacle: Memory Allocators

  • globals in Objects/obmalloc.c (raw, mem, object, debug)
  • make per-interpreter?
  • keep global?
    • mimalloc?
    • implied promise of protection by GIL

27 of 34

Obstacle: Burden(?) on Extension Module Maintainers

  • many extensions store state in global variables
  • extensions must opt in to supporting multiple interpreters, after fixing their global state
  • for some large extensions this will be a lot of work
  • users will keep asking for multi-interpreter support
  • how to help such extensions to add support?
    • how many?
    • if few enough, work with them directly?
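
To see why global state is the problem, here is a toy model of one extension written two ways. Closures stand in for module instances and each "import" stands in for a separate interpreter loading the extension; all names are hypothetical.

```python
# Process-wide state: one copy, however many interpreters import the module.
_GLOBAL_STATE = {"calls": 0}


def import_ext_with_global_state():
    """Hypothetical extension whose state lives in a C-style global."""
    def bump():
        _GLOBAL_STATE["calls"] += 1
        return _GLOBAL_STATE["calls"]
    return bump


def import_ext_with_module_state():
    """Hypothetical extension that keeps its state on the module instance."""
    state = {"calls": 0}  # fresh copy for every import, i.e. per interpreter

    def bump():
        state["calls"] += 1
        return state["calls"]
    return bump


# Two interpreters each "import" the extension and call it once.
leaky = [import_ext_with_global_state()() for _ in range(2)]
isolated = [import_ext_with_module_state()() for _ in range(2)]

print(leaky)     # [1, 2]: the second interpreter sees the first one's call
print(isolated)  # [1, 1]: each interpreter has its own counter
```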

28 of 34

Obstacle: Objects Exposed by the C-API

  • ~250 objects exposed by C-API (static types, exception types, singletons)
  • challenge: they are exposed to stable ABI extensions
  • share some "immutable" objects?
    • "immortal objects" solve this (but acceptance of PEP 683 is not guaranteed)
  • make *all* objects per-interpreter?
    • lookup functions + macro hackery
    • restrict stable ABI extensions to the main interpreter?
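
The appeal of immortal objects is that their reference count is never written, so sharing them between interpreters does not race on the refcount. The sketch below models that idea with a manual reference count; `IMMORTAL` is a hypothetical sentinel rather than the value CPython would use, and `Obj`, `incref`, and `decref` are illustrative names.

```python
IMMORTAL = -1  # hypothetical sentinel marking "never changes, never freed"


class Obj:
    """Toy object with a manual reference count."""

    def __init__(self, name, refcnt=1):
        self.name = name
        self.refcnt = refcnt


def incref(obj):
    """Add a reference, leaving immortal objects untouched."""
    if obj.refcnt != IMMORTAL:
        obj.refcnt += 1


def decref(obj):
    """Drop one reference; return True if the object should now be freed."""
    if obj.refcnt == IMMORTAL:
        return False  # no write happens, so no lock is needed to share it
    obj.refcnt -= 1
    return obj.refcnt == 0


none_like = Obj("None", IMMORTAL)
incref(none_like)
decref(none_like)
print(none_like.refcnt == IMMORTAL)  # True

ordinary = Obj("a list")
incref(ordinary)
print(decref(ordinary), decref(ordinary))  # False True
```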

29 of 34

Biggest Obstacles

  • allocators
    • keep global (mimalloc?) vs. per-interpreter
    • custom allocators (promise of thread-safety)
  • impact on extension module maintainers
    • how many affected? help directly?
  • objects exposed by the C-API
    • share some "immutable" objects (immortal) vs. making *all* objects per-interpreter

30 of 34

Discussion!

33 of 34

For the Python community, CPython's multi-core story has been murky, at best, due to the GIL. Since 2014, I've been slowly working on a solution that centers on no longer sharing the GIL between multiple interpreters in the same process. That goal is within reach for the 3.12 release.

In this talk we will look at where things stand, focusing on the biggest remaining obstacles. These include the impact of stable ABI extensions (especially older ones), what to do about the allocators, and how to deal with objects exposed by the C-API. Your expertise and feedback will be invaluable as we work to find the best solutions.

34 of 34

My multi-core Python project is getting close to completion (of the first phase). There are a number of technical challenges that, while not major, would benefit from feedback from the group. This includes:

  • immortal objects (PEP 683)
    • why they are so helpful for interpreter isolation
    • how realistic is the remaining blocker (possible crash with older 32-bit stable ABI extensions)?
    • how to deal with it?
  • per-interpreter GIL (PEP 684)
    • are we okay with getting rid of (almost) all mutable static variables?
    • are we okay with using tooling (Tools/c-analyzer) to prevent the addition of new globals?
    • how to deal with the allocators (e.g. per-interpreter, mimalloc)?
    • sharing some global objects eliminates a number of obstacles (objects exposed in C-API, static types, etc.) – is that better (enough) than making *all* objects per-interpreter?
    • is our solution for the per-interpreter part of static types okay?
    • how to mitigate the impact on maintainers of large extension modules (e.g. numpy)?
  • exposing multiple interpreters to Python code (PEP 554)
    • is the proposed approach (minimal API) good enough for now?
    • should the relationship with async be developed sooner rather than later?

That's a lot for 10+20 minutes. Realistically, the presentation will be more like this:

  1. (3 min) a short status report on the overall project, including the 3 PEPs
  2. (6 min) a detailed-ish explanation of the biggest challenges and open questions
  3. (1 min) a prioritized list of the things on which I'd like feedback (selected from above)
  4. (20 min) discussion driven by that list
