Deoptimization in V8

Jaroslav Sevcik, 2016/12/09

jarin@google.com

Intro - how V8 optimizes

Generally, V8 tries to optimize code for cases seen in the past.

Past cases gathered in type feedback vector (by the interpreter).

Optimized code has dynamic checks to validate assumptions from feedback.

If a dynamic check fails, the deoptimizer deoptimizes the code:

  • Discards the optimized code.
  • Builds the interpreter frame from the optimized stack/registers.
  • Jumps to the interpreter.

Example

function f(o) {
  return o.x;
}

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

First, we run unoptimized code, gathering feedback.

Feedback

...

2: { x : Smi }

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({x : 3});

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

First, we run unoptimized code, gathering feedback.

When hot, we decide to optimize based on feedback. (Simulated here by the runtime call.)
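
Note that %OptimizeFunctionOnNextCall is a runtime-call intrinsic which d8 only parses with natives syntax enabled. A minimal way to follow along (file name hypothetical):

// example.js
function f(o) {
  return o.x;
}
f({x : 1}); f({x : 2});          // run in the interpreter, gather feedback
%OptimizeFunctionOnNextCall(f);  // request optimization on the next call
f({x : 3});                      // runs the optimized code
// Run with:
//   d8 --allow-natives-syntax example.js
// (add the flags shown on later slides, e.g. --trace-deopt, to inspect what happens)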

Feedback

...

2: { x : Smi }

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({x : 3});

Optimized code

...

movq rax,[rbp+0x10]

test al,0x1

jz 85

movq rbx,0x2eb96db8c391

cmpq [rax-0x1],rbx

jnz 90

movq rax,[rax+0x17]

movq rsp,rbp

pop rbp

ret 0x10

85: call 0xbee9d404000 ;; deopt 0

90: call 0xbee9d40400a ;; deopt 1

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

First, we run unoptimized code, gathering feedback.

When hot, we decide to optimize based on feedback.

Optimized code explained:

  • If o is a Smi, deopt (see the example call below).
  • If o does not have the right map, deopt.
  • Load o.x.
  • Tear down frame, return the result.
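
For instance, a hypothetical extra call that passes a Smi would fail the very first check:

f(42);   // 42 is a Smi (tag bit 0 is clear), so the test al,0x1 / jz pair
         // takes the deopt 0 path and the interpreter handles the load generically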

Feedback

...

2: { x : Smi }

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({x : 3});

f({y : 1, x : 3});

Optimized code

...

movq rax,[rbp+0x10]

test al,0x1

jz 85

movq rbx,0x2eb96db8c391

cmpq [rax-0x1],rbx

jnz 90

movq rax,[rax+0x17]

movq rsp,rbp

pop rbp

ret 0x10

85: call 0xbee9d404000 ;; deopt 0

90: call 0xbee9d40400a ;; deopt 1

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

What happens if the map check fails?

Feedback

...

2: { x : Smi }

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({x : 3});

f({y : 1, x : 3});

Optimized code

...

movq rax,[rbp+0x10]

test al,0x1

jz 85

movq rbx,0x2eb96db8c391

cmpq [rax-0x1],rbx

jnz 90

movq rax,[rax+0x17]

movq rsp,rbp

pop rbp

ret 0x10

85: call 0xbee9d404000 ;; deopt 0

90: call 0xbee9d40400a ;; deopt 1

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

What happens if the map check fails?

The deoptimizer is called…

This is called eager deoptimization because the code deoptimizes immediately.

Feedback

...

2: { x : Smi }

What does the deoptimizer do?

  • Reads out current stack frame and HW register file into a buffer.
  • Builds interpreter frame(s) from the (optimized) stack frame and registers.
    • Using information emitted by the compiler (aka DeoptimizationInputData).
    • The information contains:
      • Shared function info and bytecode offset to resume at.
      • Locations of parameters in hardware registers/stack slots.
      • Locations of interpreter registers (in hardware registers/stack slots).
      • … for each inlined frame (see the sketch below).
  • Materializes objects that have been compiled away by escape analysis.
  • Jumps to the interpreter dispatch.
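
As a sketch of the "for each inlined frame" point (hypothetical example; whether inner is actually inlined depends on the inlining heuristics):

function inner(o) { return o.x; }
function outer(o) { return inner(o) + 1; }

outer({x : 1}); outer({x : 2});    // gather feedback
%OptimizeFunctionOnNextCall(outer);
outer({x : 3});                    // optimized outer, with inner likely inlined
outer({y : 1, x : 3});             // the map check in the inlined load fails:
                                   // the deoptimizer rebuilds two interpreter
                                   // frames, one for inner and one for outer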

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({y : 1, x : 3});

d8 --print-opt-code --print-code-verbose --code-comments

90: call 0xbee9d40400a ;; deopt 1

Deoptimization Input Data

index ast id commands

...

1 0 BEGIN: #frames: 0

INTERPRETED_FRAME: fun f, offset 0

STACK_SLOT: 3

STACK_SLOT: -2

REGISTER : rax

STACK_SLOT: 2

LITERAL <optimized_out>

...

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

Printing deoptimization translations associated with optimized code:

d8 --print-opt-code \

--print-code-verbose

V8 will print the deoptimization table for a function when the function compilation finishes.

Example

function f(o) {
  return o.x;
}

f({x : 1}); f({x : 2});

%OptimizeFunctionOnNextCall(f);

f({y : 1, x : 3});

d8 --print-opt-code --print-code-verbose --code-comments

90: call 0xbee9d40400a ;; deopt 1

Deoptimization Input Data

index ast id commands

...

1 0 BEGIN: #frames: 0

INTERPRETED_FRAME: fun f, offset 0

STACK_SLOT: 3 Function

STACK_SLOT: -2 Receiver

REGISTER : rax Arg #0

STACK_SLOT: 2 Context

LITERAL <optimized_out> Accumulator

...

Bytecode

0 : StackCheck

1 : Nop

2 : LdaNamedProperty a0, [0], [2]

6 : Return

Printing deoptimization translations associated with optimized code:

d8 --print-opt-code \

--print-code-verbose

V8 will print the deoptimization table for a function when the function compilation finishes.

The table is a bit cryptic.

Example

Deoptimization Input Data (for function f) --print-opt-code

index ast id commands --print-code-verbose

1 0 BEGIN: #frames: 0

INTERPRETED_FRAME: fun f, offset 0

STACK_SLOT: 3 Function

STACK_SLOT: -2 Receiver

REGISTER : rax Arg #0

STACK_SLOT: 2 Context

LITERAL <optimized_out> Accumulator

Printing deoptimizations at runtime:

d8 --trace-deopt

V8 will print the deoptimization trace when deoptimizing.

[deoptimizing (eager): begin <f> @1, ...] --trace-deopt

reading input frame f => bytecode_offset=0, args=2, inputs:

0: 0x319ceb12a529 ; [fp - 16] 0x319ceb12a529 <f>

1: 0x3a4973803239 ; [fp + 24] 0x3a4973803239 <Global obj>

2: 0x3a49738090a1 ; rax 0x3a49738090a1 <an Object ...>

3: 0x319ceb103bf9 ; [fp - 8] 0x... <FixedArray[244]>

4: 0x4069be84c21 ; (literal 1) 0x... <optimized_out>

translating interpreted frame f => bytecode_offset=0, height=8

0x7fff5928da88: [top + 72] <- 0x... ; <Global obj.> (input #1)

0x7fff5928da80: [top + 64] <- 0x... ; <Object ...> (input #2)

-------------------------

0x7fff5928da78: [top + 56] <- 0x18ee3425b89e ; caller's pc

0x7fff5928da70: [top + 48] <- 0x7fff5928daf8 ; caller's fp

0x7fff5928da68: [top + 40] <- 0x... ; context (input #3)

0x7fff5928da60: [top + 32] <- 0x... ; fun <f> (input #0)

0x7fff5928da58: [top + 24] <- 0x... ; new_target (input #0)

0x7fff5928da50: [top + 16] <- 0x... ; bytecode array (input #0)

0x7fff5928da48: [top + 8] <- 0x35...; bc offset 53 (input #0)

-------------------------

0x7fff5928da40: [top + 0] <- 0x...; accumulator <optimized_out> (input #4)

(The deoptimization input data above is produced at compilation time; the --trace-deopt output is produced at run/deopt time.)


Lazy deoptimization motivating example


// Global scope
var o = { x : 1 };
function f() {
  return o.x;
}
f();

… Prologue and stack check …
movq rax,0x3415e188941 ;; Get o
movq rax,[rax+0x17] ;; Load o.x
movq rsp,rbp
pop rbp
ret 0x8

There is no check that o is still the same global_obj.o, and no map check for o!

Instead, we register the optimized code for f to be thrown away whenever the property cell for o changes.

But what if f is already executing when o changes?

Lazy deoptimization motivating example

function change_o() {
  o = { y : 0, x : 1 };
}

var o = { x : 1 };
function f() {
  change_o();
  return o.x;
}
f();


48 call [rdi+0x37] ; Call change_o
;; Assumption violated, cannot continue!
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8

  • We need to protect against changes to the global variable o.
  • The compiler registers a dependency of f's optimized code on changes to o.
  • Whoever changes o is responsible for deoptimizing all the code that depends on it!

Lazy deoptimization motivating example

function change_o() {
  o = { y : 0, x : 1 };
}

var o = { x : 1 };
function f() {
  change_o();
  return o.x;
}
f();


48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8

  • In this case, change_o will trigger deoptimization of f.
  • We cannot deoptimize right away because f is not at the top of the stack.
  • We will lazily deoptimize when we return into f.

At the moment, we patch all positions after calls with calls to the deoptimizer!

Example

function change_o() {
  o = { y : 0, x : 1 };
}

var o = { x : 1 };
function f() {
  change_o();
  return o.x;
}
f();


48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 movq rax,0x3415e188941 ;; Get o
61 movq rax,[rax+0x17] ;; Load o.x
65 movq rsp,rbp
68 pop rbp
69 ret 0x8

index ast id pc commands
0 9 51 BEGIN
INTERPRETED_FRAME <f>, offset 9
STACK_SLOT: 3
STACK_SLOT: -1
STACK_SLOT: 2
LITERAL <optimized_out>
LITERAL <optimized_out>
REGISTER : rax


Example

function change_o() {
  o = { y : 0, x : 1 };
}

var o = { x : 1 };
function f() {
  change_o();
  return o.x;
}
f();


48 call [rdi+0x37] ; Call change_o
;; Lazy deopt point here.
51 call 0x18ee3400400a ;; Patched with a call to
;; the deoptimizer!
;; Potentially garbage here...

index ast id pc commands
0 9 51 BEGIN
INTERPRETED_FRAME <f>, offset 9
STACK_SLOT: 3
STACK_SLOT: -1
STACK_SLOT: 2
LITERAL <optimized_out>
LITERAL <optimized_out>
REGISTER : rax


The deoptimizer will first patch all positions after calls with calls to the deoptimizer.

Later, when execution resumes at function f, it immediately deoptimizes.

Trace for lazy deoptimization (--trace-deopt)

[marking dependent code 0x... for deoptimization, reason: property-cell-changed]


  • Assumption violated.
  • Search for dependent code.

Trace for lazy deoptimization (--trace-deopt)

[marking dependent code 0x... for deoptimization, reason: property-cell-changed]
[deoptimize marked code in all contexts]
[deoptimizer unlinked: f / 3b518b4aa901]
[deoptimizer found activation of function: f / 3b518b4aa901]

  • Assumption violated.
  • Search for dependent code.
  • Patch for lazy deopt.

Trace for lazy deoptimization (--trace-deopt)

[marking dependent code 0x... for deoptimization, reason: property-cell-changed]
[deoptimize marked code in all contexts]
[deoptimizer unlinked: f / 3b518b4aa901]
[deoptimizer found activation of function: f / 3b518b4aa901]
[deoptimizing (DEOPT lazy): begin <f>]
reading input frame f => bytecode_offset=9, args=1, height=3; inputs:
0: 0x3b518b4aa901 ; [fp - 16] 0x... <f>
1: 0xf3267683239 ; [fp + 16] 0xf3267683239 <JS Global Object>
2: 0x3b518b483bf9 ; [fp - 8] 0x3b518b483bf9 <FixedArray[252]>
3: 0x473fda84989 ; (literal 1) 0x473fda84989 <Odd Oddball: optimized_out>
4: 0x473fda84989 ; (literal 1) 0x473fda84989 <Odd Oddball: optimized_out>
5: 0x473fda82311 ; rax 0x473fda82311 <undefined>
translating interpreted frame f => bytecode_offset=9, height=24
...
[deoptimizing (lazy): end 0x3b518b4aa901 <f> ... took 0.510 ms]

  • Assumption violated.
  • Search for dependent code.
  • Patch for lazy deopt.
  • Actual lazy deoptimization after execution resumes at f and hits the patch point.

Deoptimization in the Turbofan compiler

We have separate mechanisms for eager and lazy deoptimization.

Eager deoptimization:

  • Insert a checkpoint node into the effect chain before every potentially deoptimizing node. (Essentially before processing each bytecode.)

Lazy deoptimization:

  • Attach deoptimization state to every potentially calling node (see the sketch below).

Lots of deoptimization-related nodes are created when building the graph.

  • We try to reuse parts of previous frame states and represent states as trees.
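
For intuition about where these states attach, consider a hypothetical function with both a guarded load and a call:

function h(o, g) {
  var a = o.x;   // the load is guarded by checks that may deopt eagerly,
                 // using the frame state of the preceding Checkpoint
  var b = g();   // the call node carries a lazy deoptimization state
                 // describing the frame just after the call
  return a + b;
}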

Example graph

[Graph sketch: a JSLoadNamed node with a Checkpoint on the effect chain; the Checkpoint refers to a Framestate (BC# 8) and the JSLoadNamed to a lazy Framestate (Push) (BC# 9); both frame states are built from StateValues nodes over Parameter #0 and Parameter #1; a JSAdd consuming the result also appears.]

JSLoadNamed can deoptimize both lazily and eagerly.

  • Checkpoint refers to the frame state before the load.
  • Lazy frame state refers to the position and state after the load.
  • State values refer to the state of parameters, locals and accumulator.
  • We use the graph builder’s environment tracking to build the deopt states.


Semantics of eager deoptimization node

Every potentially deoptimizing node must be in the effect chain.

If a node deoptimizes, it will deoptimize to the state in the last checkpoint in the effect chain.

… or to any previous state, if there is no observable side effect in between.

Nodes do not have to refer to the (eager) deoptimization state explicitly.

Lazy deoptimization is very much explicit!
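
For example (hypothetical, and assuming plain data properties so the loads have no observable side effects), a failing check for the second load may reuse a state recorded before the first load; the interpreter then simply re-executes the side-effect-free load of o.x:

function k(o) {
  var a = o.x;   // no observable side effect
  var b = o.y;   // a deopt here can target a frame state taken before the
                 // load of o.x, because nothing observable happened in between
  return a + b;
}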

Eliminating checkpoints

We eliminate a redundant checkpoint if there is a previous checkpoint in the effect chain with no (observable) side effect in between.

The previous checkpoint must be reachable without effect merges.

[Graph sketch (before): an effect chain with three Checkpoints, each carrying its own Framestate, interleaved with a LoadField, a CheckMaps and an EffectPhi.]

Eliminating checkpoints

We eliminate a redundant checkpoint if there is a previous checkpoint in the effect chain with no (observable) side effect in between.

The previous checkpoint must be reachable without effect merges.

[Graph sketch (after): the redundant Checkpoint and its Framestate have been removed; two Checkpoints with their Framestates remain around the LoadField, CheckMaps and EffectPhi.]

Lowering checkpoints to deoptimization

Checkpoints are completely eliminated during lowering to low-level machine operations (effect control linearization).

Every operation that might eagerly deopt is assigned the frame state from the last checkpoint.

We check that there is no side effect between the operation and the last checkpoint.

After the lowering, there are no deoptimizing nodes without frame states!

Example: lowering checkpoints

[Graph sketch: before lowering, a CheckedInt32Add and a LoadField sit after a Checkpoint (with its FrameState) on the effect chain; after lowering, the CheckedInt32Add becomes an Int32AddWithOverflow, Projection 1 (the overflow bit) feeds a DeoptimizeIf node that takes the FrameState directly, and Projection 0 carries the value.]
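
A function that could give rise to such a CheckedInt32Add (hypothetical example; assumes the addition has Smi feedback):

function add1(x) {
  return x + 1;            // with Smi feedback this becomes a CheckedInt32Add
}
add1(1); add1(2);          // gather Smi feedback
%OptimizeFunctionOnNextCall(add1);
add1(3);                   // optimized: Int32AddWithOverflow + DeoptimizeIf
add1(2147483647);          // the add overflows int32, DeoptimizeIf fires, and
                           // the interpreter computes 2147483648 instead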

Emitting eager deoptimizations

Instruction selector adds all frame state inputs to the DeoptimizeIf instruction.

[Graph sketch: a DeoptimizeIf node with a FrameState (BC #9) input; the FrameState is built from StateValues nodes over Parameter #0, Parameter #1, the constant undefined and the constant 1.]

Instruction (simplified a bit):

DeoptimizeIf(vreg 2, Imm 9, vreg 0, vreg 1, Imm 1, Const undefined)

After register allocation:

DeoptimizeIf(rax, Imm 9, rbx, slot -1, Imm 1, Const undefined)


No code emitted for FrameState and StateValues (the states are folded into the deoptimizing instructions).


Emitting deoptimizations

Lazy deoptimization states are just added to the corresponding call instruction.

Code generation (eager deopt, slightly simplified)

DeoptimizeIf(rax, Imm 9, rbx, slot -1, Imm 1, Const undefined)

Code generation emits:

call 0xbee9d40400a ;; deopt 1

...

Deoptimization Input Data

index ast id commands

1 9 BEGIN: #frames: 0

INTERPRETED_FRAME: fun f, offset 9

REGISTER: rbx

STACK_SLOT: -1

LITERAL 1

LITERAL <undefined>

Conveniently omitted

Materialization for deoptimization after escape analysis.

Representation selection for deoptimization values.

Environment liveness analysis (on bytecode).

  • Compression of dead/optimized-out values.

Inlining (functions, accessors, constructors, arguments adaptor).

Throwing into lazy-deoptimized code objects.

Future work

General happiness with deoptimization in Turbofan.

We might (again) experiment with differential encoding of frame states.

  • Similar to Crankshaft, but explicit link to parent state.

Computed translations? (E.g., deoptimizing before i++ could recover original value of i by decrementing; JSC seems to be doing that.)
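
An illustrative sketch of that idea (hypothetical):

function sum(a) {
  var s = 0;
  for (var i = 0; i < a.length; i++) {
    s += a[i];   // a deopt around here could, in principle, recover the
  }              // pre-increment value of i from a value that already holds
  return s;      // i + 1 by decrementing it, rather than keeping both alive
}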

It would be nice not to patch code for lazy deoptimization.

  • Instead, we could patch the return address on the stack.
  • Or even always deoptimize eagerly, and store deoptimized stacks on heap!

None of this is high priority.

Questions