Deoptimization in V8
Intro - how V8 optimizes
Generally, V8 optimizes code for the cases it has seen in the past.
Past cases are gathered in the type feedback vector (by the interpreter).
Optimized code contains dynamic checks that validate the assumptions taken from the feedback.
If a dynamic check fails, the deoptimizer deoptimizes the code:
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
First, we run unoptimized code, gathering feedback.
Feedback
...
2: { x : Smi }
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({x : 3});
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
First, we run unoptimized code, gathering feedback.
When hot, we decide to optimize based on feedback. (Simulated here by the runtime call.)
Feedback
...
2: { x : Smi }
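Note (not from the original slides): %OptimizeFunctionOnNextCall is a runtime intrinsic, so reproducing this in d8 requires the --allow-natives-syntax flag. A minimal standalone sketch, assuming a file name of example.js:
// Run with: d8 --allow-natives-syntax example.js
function f(o) {
  return o.x;
}
f({x : 1}); f({x : 2});          // run unoptimized, gather feedback
%OptimizeFunctionOnNextCall(f);  // request optimization of f
f({x : 3});                      // this call runs the optimized code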
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({x : 3});
Optimized code
...
movq rax,[rbp+0x10]      ;; Load the argument o from the stack
test al,0x1              ;; Smi check: a clear low tag bit means o is a Smi...
jz 85                    ;; ...so bail out to deopt 0
movq rbx,0x2eb96db8c391  ;; Expected map (from the feedback)
cmpq [rax-0x1],rbx       ;; Map check: compare o's map against it...
jnz 90                   ;; ...and bail out to deopt 1 on mismatch
movq rax,[rax+0x17]      ;; Load the field x into the return register
movq rsp,rbp             ;; Epilogue: tear down the frame
pop rbp
ret 0x10                 ;; Pop receiver + argument (16 bytes) and return
…
85: call 0xbee9d404000 ;; deopt 0
90: call 0xbee9d40400a ;; deopt 1
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
First, we run unoptimized code, gathering feedback.
When hot, we decide to optimize based on feedback.
Optimized code explained:
Feedback
...
2: { x : Smi }
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({x : 3});
f({y : 1, x : 3});
Optimized code
...
movq rax,[rbp+0x10]
test al,0x1
jz 85
movq rbx,0x2eb96db8c391
cmpq [rax-0x1],rbx
jnz 90
movq rax,[rax+0x17]
movq rsp,rbp
pop rbp
ret 0x10
…
85: call 0xbee9d404000 ;; deopt 0
90: call 0xbee9d40400a ;; deopt 1
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
What happens if the map check fails?
Feedback
...
2: { x : Smi }
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({x : 3});
f({y : 1, x : 3});
Optimized code
...
movq rax,[rbp+0x10]
test al,0x1
jz 85
movq rbx,0x2eb96db8c391
cmpq [rax-0x1],rbx
jnz 90
movq rax,[rax+0x17]
movq rsp,rbp
pop rbp
ret 0x10
…
85: call 0xbee9d404000 ;; deopt 0
90: call 0xbee9d40400a ;; deopt 1
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
What happens if the map check fails?
The deoptimizer is called…
This is called eager deoptimization because the code deoptimizes immediately.
Feedback
...
2: { x : Smi }
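Aside (a sketch, not from the slides): after the eager deopt, f keeps running in the interpreter, and the feedback slot records the second map as well, so a later reoptimization can handle both object shapes:
function f(o) {
  return o.x;
}
f({x : 1});         // feedback: monomorphic, map {x}, value is a Smi
f({y : 1, x : 3});  // optimized code deopts on this map; in the interpreter
                    // the feedback becomes polymorphic, covering {x} and {y, x}
// If f is optimized again, the new code can accept both maps.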
What does the deoptimizer do?
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({y : 1, x : 3});
d8 --print-opt-code --print-code-verbose --code-comments
…
90: call 0xbee9d40400a ;; deopt 1
Deoptimization Input Data
index ast id commands
...
1 0 BEGIN: #frames: 0
INTERPRETED_FRAME: fun f, offset 0
STACK_SLOT: 3
STACK_SLOT: -2
REGISTER : rax
STACK_SLOT: 2
LITERAL <optimized_out>
...
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
Printing deoptimization translations associated with optimized code:
d8 --print-opt-code \
--print-code-verbose
V8 will print the deoptimization table for a function when the function compilation finishes.
Example
function f(o) {
return o.x;
}
f({x : 1}); f({x : 2});
%OptimizeFunctionOnNextCall(f);
f({y : 1, x : 3});
d8 --print-opt-code --print-code-verbose --code-comments
…
90: call 0xbee9d40400a ;; deopt 1
Deoptimization Input Data
index ast id commands
...
1 0 BEGIN: #frames: 0
INTERPRETED_FRAME: fun f, offset 0
STACK_SLOT: 3 Function
STACK_SLOT: -2 Receiver
REGISTER : rax Arg #0
STACK_SLOT: 2 Context
LITERAL <optimized_out> Accumulator
...
Bytecode
0 : StackCheck
1 : Nop
2 : LdaNamedProperty a0, [0], [2]
6 : Return
Printing deoptimization translations associated with optimized code:
d8 --print-opt-code \
--print-code-verbose
V8 will print the deoptimization table for a function when the function compilation finishes.
The table is a bit cryptic.
Example
Deoptimization Input Data (for function f) --print-opt-code
index ast id commands --print-code-verbose
1 0 BEGIN: #frames: 0
INTERPRETED_FRAME: fun f, offset 0
STACK_SLOT: 3 Function
STACK_SLOT: -2 Receiver
REGISTER : rax Arg #0
STACK_SLOT: 2 Context
LITERAL <optimized_out> Accumulator
Printing deoptimizations at runtime:
d8 --trace-deopt
V8 will print the deoptimization trace when deoptimizing.
[deoptimizing (eager): begin <f> @1, ...] --trace-deopt
reading input frame f => bytecode_offset=0, args=2, inputs:
0: 0x319ceb12a529 ; [fp - 16] 0x319ceb12a529 <f>
1: 0x3a4973803239 ; [fp + 24] 0x3a4973803239 <Global obj>
2: 0x3a49738090a1 ; rax 0x3a49738090a1 <an Object ...>
3: 0x319ceb103bf9 ; [fp - 8] 0x... <FixedArray[244]>
4: 0x4069be84c21 ; (literal 1) 0x... <optimized_out>
translating interpreted frame f => bytecode_offset=0, height=8
0x7fff5928da88: [top + 72] <- 0x... ; <Global obj.> (input #1)
0x7fff5928da80: [top + 64] <- 0x... ; <Object ...> (input #2)
-------------------------
0x7fff5928da78: [top + 56] <- 0x18ee3425b89e ; caller's pc
0x7fff5928da70: [top + 48] <- 0x7fff5928daf8 ; caller's fp
0x7fff5928da68: [top + 40] <- 0x... ; context (input #3)
0x7fff5928da60: [top + 32] <- 0x... ; fun <f> (input #0)
0x7fff5928da58: [top + 24] <- 0x... ; new_target (input #0)
0x7fff5928da50: [top + 16] <- 0x... ; bytecode array (input #0)
0x7fff5928da48: [top + 8] <- 0x35...; bc offset 53 (input #0)
-------------------------
0x7fff5928da40: [top + 0] <- 0x...; accumulator <optimized_out> (input #4)
The deoptimization table is produced at compilation time; the --trace-deopt output is produced at run/deopt time.
Lazy deoptimization motivating example
;; Global scope
var o = { x : 1 };
function f() {
return o.x;
}
f();
… Prologue and stack check …
movq rax,0x3415e188941 ;; Get o
movq rax,[rax+0x17] ;; Load o.x
movq rsp,rbp
pop rbp
ret 0x8
No check that o still holds this object, and no map check for o!
Instead, we register the optimized code for f to be thrown away whenever the property cell for o changes.
But what if f is already executing when o changes?
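Before turning to that question, a sketch of the simpler case, where o changes between two calls to f (assumption: run with d8 --allow-natives-syntax --trace-deopt; exact output varies by version):
var o = { x : 1 };
function f() {
  return o.x;
}
f(); f();                      // gather feedback
%OptimizeFunctionOnNextCall(f);
f();                           // optimized code relies on the cell for o
o = { y : 0, x : 1 };          // the registered dependency is violated:
                               // f's optimized code is thrown away
f();                           // runs unoptimized code again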
Lazy deoptimization motivating example
function change_o() {
o = { y : 0, x : 1 };
}
var o = { x : 1 };
function f() {
change_o();
return o.x;
}
f();
…
48 call [rdi+0x37] ; Call change_o
;; Assumption violated, cannot continue!
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8
Lazy deoptimization motivating example
function change_o() {
o = { y : 0, x : 1 };
}
var o = { x : 1 };
function f() {
change_o();
return o.x;
}
f();
…
48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8
At the moment we patch all positions after calls with a call to the deoptimizer!
Example
function change_o() {
o = { y : 0, x : 1 };
}
var o = { x : 1 };
function f() {
change_o();
return o.x;
}
f();
…
48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8
index ast id pc commands
0 9 51 BEGIN
INTERPRETED_FRAME <f>, offset 9
STACK_SLOT: 3
STACK_SLOT: -1
STACK_SLOT: 2
LITERAL <optimized_out>
LITERAL <optimized_out>
REGISTER : rax
Example
function change_o() {
o = { y : 0, x : 1 };
}
var o = { x : 1 };
function f() {
change_o();
return o.x;
}
f();
…
48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 call 0x18ee3400400a ;; Patched with a call to
;; the deoptimizer!
;; Potentially garbage here...
index ast id pc commands
0 9 51 BEGIN
INTERPRETED_FRAME <f>, offset 9
STACK_SLOT: 3
STACK_SLOT: -1
STACK_SLOT: 2
LITERAL <optimized_out>
LITERAL <optimized_out>
REGISTER : rax
The deoptimizer will first patch all positions after calls with a call to the deoptimizer.
Later, when execution resumes at function f, it immediately deoptimizes.
Trace for lazy deoptimization (--trace-deopt)
[marking dependent code 0x... for deoptimization, reason: property-cell-changed]
Assumption violated
Search for dependent code
Trace for lazy deoptimization (--trace-deopt)
[marking dependent code 0x... for deoptimization, reason: property-cell-changed]
[deoptimize marked code in all contexts]
[deoptimizer unlinked: f / 3b518b4aa901]
[deoptimizer found activation of function: f / 3b518b4aa901]
Assumption violated
Search for dependent code
Patch for lazy deopt
Trace for lazy deoptimization (--trace-deopt)
[marking dependent code 0x... for deoptimization, reason: property-cell-changed]
[deoptimize marked code in all contexts]
[deoptimizer unlinked: f / 3b518b4aa901]
[deoptimizer found activation of function: f / 3b518b4aa901]
[deoptimizing (DEOPT lazy): begin <f>]
reading input frame f => bytecode_offset=9, args=1, height=3; inputs:
0: 0x3b518b4aa901 ; [fp - 16] 0x... <f>
1: 0xf3267683239 ; [fp + 16] 0xf3267683239 <JS Global Object>
2: 0x3b518b483bf9 ; [fp - 8] 0x3b518b483bf9 <FixedArray[252]>
3: 0x473fda84989 ; (literal 1) 0x473fda84989 <Odd Oddball: optimized_out>
4: 0x473fda84989 ; (literal 1) 0x473fda84989 <Odd Oddball: optimized_out>
5: 0x473fda82311 ; rax 0x473fda82311 <undefined>
translating interpreted frame f => bytecode_offset=9, height=24
...
[deoptimizing (lazy): end 0x3b518b4aa901 <f> ... took 0.510 ms]
Assumption violated
Search for dependent code
Patch for lazy deopt
Actual lazy deoptimization after execution resumes at f and hits the patch point.
Deoptimization in the TurboFan compiler
We have separate mechanisms for eager and lazy deoptimization.
Eager deoptimization: Checkpoint nodes in the effect chain carry the frame state to deoptimize to.
Lazy deoptimization: frame states are attached directly to the nodes (e.g., calls) that can deoptimize lazily.
Lots of deoptimization-related nodes when building the graph.
Example graph
[Graph figure: JSLoadNamed and Checkpoint nodes on the effect chain, with a FrameState (BC# 8), a FrameState (Push) (BC# 9), StateValues nodes, Parameter #0, Parameter #1, and a JSAdd.]
JSLoadNamed (the property load) can deoptimize both lazily and eagerly: the eager case uses the frame state of the preceding Checkpoint (BC# 8), the lazy case the FrameState (Push) attached to the node itself (BC# 9).
Semantics of eager deoptimization node
Every potentially deoptimizing node must be in the effect chain.
If a node deoptimizes, it will deoptimize to the state in the last checkpoint in the effect chain.
… or any previous state if there is no observable side-effect in between.
Nodes do not have to refer to the (eager) deoptimization state explicitly.
Lazy deoptimization is very much explicit!
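To make the "any previous state" point concrete, an illustration (not from the slides): pure work between a checkpoint and a deoptimizing check can simply be re-executed by the interpreter, so the compiler may deoptimize to an earlier state:
function f(o, i) {
  var j = i + 1;    // assuming i is a number, this has no observable side effect
  return o.x + j;   // if the map check on o fails, V8 can resume in the
                    // interpreter at a state from before the addition
                    // and just redo i + 1
}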
Eliminating checkpoints
We eliminate a redundant checkpoint if there is a previous checkpoint in the effect chain with no (observable) side-effect in between.
The previous checkpoint must be reachable without effect merges.
[Graph figure (before): an effect chain containing three Checkpoint nodes, each with its own FrameState, plus a LoadField, a CheckMaps, and an EffectPhi.]
Eliminating checkpoints
We eliminate a redundant checkpoint if there is a previous checkpoint in the effect chain with no (observable) side-effect in between.
The previous checkpoint must be reachable without effect merges.
[Graph figure (after): the redundant Checkpoint and its FrameState are gone; two Checkpoints with their FrameStates, the LoadField, the CheckMaps, and the EffectPhi remain.]
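In JavaScript terms, a hedged illustration of the rule: two property loads with nothing effectful between them can share one checkpoint, whereas a call in between keeps the later checkpoint alive:
function g(a) {
  return a.x + a.y;  // no observable side effect between the two loads:
                     // the second checkpoint is redundant and can be removed
}
function h(a, cb) {
  var t = a.x;
  cb();              // an arbitrary call is an observable side effect, so the
                     // checkpoint guarding the load of a.y is not redundant
  return t + a.y;
}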
Lowering checkpoints to deoptimization
Checkpoints are completely eliminated during lowering to low-level machine operations (effect control linearization).
Every operation that might eagerly deopt is assigned the frame state from the last checkpoint.
We check that there is no side-effect between the operation and the last checkpoint.
After the lowering, there are no deoptimizing nodes without frame states!
Example: lowering checkpoints
[Graph figure: before lowering, a Checkpoint (with its FrameState), a LoadField, and a CheckedInt32Add on the effect chain; after lowering, the CheckedInt32Add has become an Int32AddWithOverflow whose Projection 1 (the overflow flag) feeds a DeoptimizeIf that now takes the FrameState directly, while Projection 0 is the result value.]
Emitting eager deoptimizations
Instruction selector adds all frame state inputs to the DeoptimizeIf instruction.
[Graph figure: a DeoptimizeIf node taking a FrameState (BC #9) whose StateValues cover Parameter #0, Parameter #1, undefined, and 1.]
Instruction (simplified a bit):
DeoptimizeIf(vreg 2, Imm 9, vreg 0, vreg 1, Imm 1, Const undefined)
After register allocation:
DeoptimizeIf(rax, Imm 9, rbx, slot -1, Imm 1, Const undefined)
No code is emitted for FrameState and StateValues nodes (the states are folded into the deoptimizing instructions).
Emitting lazy deoptimizations
Lazy deoptimization states are just added to the corresponding call instruction.
Code generation (eager deopt, slightly simplified)
DeoptimizeIf(rax, Imm 9, rbx, slot -1, Imm 1, Const undefined)
call 0xbee9d40400a ;; deopt 1
...
Deoptimization Input Data
index ast id commands
1 9 BEGIN: #frames: 0
INTERPRETED_FRAME: fun f, offset 9
REGISTER: rbx
STACK_SLOT: -1
LITERAL 1
LITERAL <undefined>
Conveniently omitted
Materialization for deoptimization after escape analysis.
Representation selection for deoptimization values.
Environment liveness analysis (on bytecode).
Inlining (functions, accessors, constructors, arguments adaptor).
Throwing into lazy-deoptimized code objects.
Future work
We are generally happy with deoptimization in TurboFan.
We might (again) experiment with differential encoding of frame states.
Computed translations? (E.g., deoptimizing before i++ could recover the original value of i by decrementing; JSC seems to be doing that.)
It would be nice not to patch code for lazy deoptimization.
None of this is high priority.
Questions