CPython 3.13 Bytecode Optimization Proposal
Ken Jin & Jules Poon
NUS TEST Lab
Background
Proposed Pipeline
Tier 1 bytecode
Tier 2 bytecode + watchers
Jump target calculation and setup
Cleanup
Partial eval + Type prop
Jump relocation
Undecided:
Use-def/liveness analysis for dead store elimination. Might be too expensive to compute for a Tier 2 interpreter.
Tier 1 Bytecode & Conversion to Tier 2 + Watchers
For watchers:
Jump target calculation and setup
Problem: jumps and their targets will move after optimization passes, as code size shrinks.
Solution: Single pass to replace all jump instructions and their target instructions’ opcodes with negative index. Map the index to the original instruction in an array. This will allow us to restore them after we move them around during optimization passes.
Already done.
Partial Evaluation + Type prop
Inlining heuristics:
Cleanup
Jump relocation
The Abstract Interpretation Algorithm
Partial evaluation
The general idea:
Evaluating a program with respect to its “staticness” produces a residual program that should be faster.
Simplified version, adapted for CPython from https://pages.cs.wisc.edu/~horwitz/CS704-NOTES/9.PARTIAL-EVALUATION.html
Partial evaluation via abstract interpretation
The partitioning abstract interpreter
Partition view
l1 | l2 | l3 | l4 | l5 |
s5 |
s4 |
s3 |
s2 |
s1 |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 5
type: PyLong_Type
Locals:
Stack:
Node operations
OVERWRITE(src, dst): Make dst belong to src’s partition, and remove it from its old partition. There are some special cases that are handled by our algorithm and documented in the paper.
SET(src, dst): Union dst and src partition. The overall partition root node is src’s root node.
SWAP(src, dst): Swap src and dst between their respective partitions. This is just to handle SWAP in CPython bytecode.
A demo of abstract interpretation + partial evaluation
| | |
|
|
|
|
|
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
<Abstract locals and stack>
A demo of abstract interpretation
| | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
A demo of abstract interpretation
ptr | | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
A demo of abstract interpretation
ptr | | |
|
|
|
|
|
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
A demo of abstract interpretation
ptr | | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
A demo of abstract interpretation
ptr | | |
|
|
|
ptr |
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 2
A demo of abstract interpretation
ptr | | |
|
|
|
ptr |
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 2
A demo of abstract interpretation
ptr | | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 3
A demo of abstract interpretation
ptr | | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 3
A demo of abstract interpretation
ptr | | |
|
|
|
|
ptr |
LOAD_CONST 1 |
STORE_FAST x |
LOAD_FAST x |
LOAD_CONST 2 |
ADD |
RETURN_VALUE |
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 1
Emitted:
LOAD_CONST 3
SAVE_IP
RETURN_VALUE
RootNode
Refcount: 1
static_or_dynamic: static
const_val: 3
Stack accessors
In short:
Stack accessors
DSL Changes