Crankshafting from the ground up

by marja@google.com (Blink bindings team)

We’ll cover: Crankshafting basics, HIR, optimizing and deoptimizing (type assumptions), reading properties from objects, monomorphic / polymorphic / megamorphic access sites, bindings callbacks, loops.

Prerequisite information

Hidden classes, inline caching (IC) and IC stubs.

Two compilers

V8 has 2 compilers, full-codegen and Crankshaft.

Full-codegen

Fast
Produces slow, unoptimized code
Initially, all code is compiled with full-codegen (lazily)
Doesn’t have an internal representation (IR); directly creates machine language (1-register stack machine) based on the abstract syntax tree (AST)

Crankshaft

Slow
Produces fast, optimized code
Only some functions are crankshafted (i.e., the unoptimized code generated by full-codegen is replaced with the optimized code generated by crankshaft) when V8 notices the functions are hot

it’s possible to crankshaft a function while executing it (e.g., inside a loop), this is called “on stack replacement”

Crankshafted code relies on assumptions on the execution (e.g., which type of an object gets passed to a function). If those are violated, we bail out and replace the crankshafted function with unoptimized code

this also happens on the fly, i.e., when we’re already executing the function!

2 internal representations, HIR and LIR

Full-codegen

Full-codegen traverses the AST and simply generates stack machine code for each node.

FullCodeGenerator is defined in full-codegen.h and the implementations are platform-specific.

For AST visiting, see e.g., FullCodeGenerator::VisitAssignment in full-codegen-ia32.cc.

Full-codegen uses the IC stubs (see FullCodeGenerator::CallIC).

Hydrogen and Lithium

Crankshaft consists of a high-level phase (Hydrogen) and low-level phase (Lithium). They used different internal representations (HIR and LIR, respectively) and do different optimizations.

You can examine the optimization phases with C1Visualizer.

example.js: Demostrating HIR

function f(x) {
var o = new Object();
if (x < 30)
x = x - 5;
else
x = 23;
o.myprop = x;
}

for (var i = 0; i < 1000; ++i) {
f(i);
}

Running with d8 (standalone V8):

v8$ out/ia32.debug/d8 example.js --trace_hydrogen --trace_opt
Parallel recompilation has been disabled for tracing.
[marking 0x5a411a15 <JS Function valueOf (SharedFunctionInfo 0x2c91aca1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x5a4119e1 <JS Function toString (SharedFunctionInfo 0x2c91c54d)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x5a4126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x2c925915)> for recompilation, reason: small function, ICs with typeinfo: 0/0 (100%)]
-----------------------------------------------------------
Compiling method valueOf using hydrogen
[optimizing 0x5a411a15 <JS Function valueOf (SharedFunctionInfo 0x2c91aca1)> - took 0.094, 8.958, 0.756 ms]
-----------------------------------------------------------
Compiling method IsPrimitive using hydrogen
[optimizing 0x5a4126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x2c925915)> - took 0.054, 22.095, 1.693 ms]
-----------------------------------------------------------
Compiling method toString using hydrogen
[optimizing 0x5a4119e1 <JS Function toString (SharedFunctionInfo 0x2c91c54d)> - took 0.102, 10.625, 0.829 ms]
[marking 0x5a4175e1 <JS Function f (SharedFunctionInfo 0x5a4174c1)> for recompilation, reason: small function, ICs with typeinfo: 4/4 (100%)]
-----------------------------------------------------------
Compiling method f using hydrogen

This creates a file called hydrogen.cfg (CFG = control flow graph) which can be inspected with C1Visualizer. The optimization phases prefixed by “H_” are Hydrogen and the phases prefixed by “L_” are Lithium.

Hydrogen IR

The Hydrogen IR describes the basic blocks, the control flow between the basic blocks. Each basic block consist of instructions written in SSA (static single assignment) form.

The basic blocks for the example:

Next we list the instructions for the basic blocks. v0, v1, t0, t1, s0, s1, etc. are variables used in the SSA representation.

Detail: The meaning of the letters is as follows: s = known to be SMI (small integer), t = tagged representation (can either be a pointer or SMI), v = no representation chosen.

B0:

B0 -> B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v0 BlockEntry
0 5 t1 Context
0 2 t2 Constant 0x2c908091 <undefined>
0 0 t16 Constant 0x2c9080a1 <the hole>
0 2 t3 Parameter 0
0 4 t4 Parameter 1
0 0 t5 ArgumentsObject t3 t4
0 0 v6 Simulate id=2 var[3] = t2, var[2] = t1, var[1] = t4, var[0] = t3
0 0 v7 Goto B1

The BlockEntry and Goto instructions are related to the basic block graph structure.

t3 represents the implicit receiver parameter

v4 represents x

The “Simulate” instructions are for keeping track of what the stack machine state would be, in case we need to bail out and start using unoptimized code. They don’t generate any actual machine instructions.

In the Lithium phase, Crankshaft generates a translation which allow us to map from optimized stack frames to unoptimized ones. The translation is generated based on the HSimulate instructions (see LCodeGen::WriteTranslation in lithium-codegen-ia32.cc). When we detect that we need to bail out from optimized code (in run time), we construct the stack machine state based on the translation, generate the stack machine code, and continue executing it from the right point.

B1 HIR:

_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v8 BlockEntry
0 0 v9 Simulate id=3
0 0 v10 StackCheck t1 changes[NewSpacePromotion]
0 2 t11 Constant 0x5a408211 <JS Function Object (SharedFunctionInfo 0x2c927209)> type:object
0 0 t12 PushArgument t11
0 7 t13 CallNew t1 t11 #1 changes[*]
0 0 v14 Simulate id=10 push t13
0 0 v15 EnvironmentMarker bind var[3]
0 1 s17 Constant 30
0 0 v18 CompareNumericAndBranch LT t4 s17 goto (B4, B2)

The coloured parts correspond to the following JavaScript statements:

var o = new Object(); (and t13 represents o)

if (x < 30)

B2 HIR:

B2 <- B1 -> B3 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v22 BlockEntry
0 0 v23 Simulate id=45 pop 1 / var[3] = t13
0 0 v24 Goto B3

(Nothing interesting in this block.)

B3 HIR:

B3 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v29 BlockEntry
0 2 s30 Constant 23
0 0 v33 Simulate id=43 var[1] = s30
0 0 v34 Goto B6

s30 represents the constant 23

B4 HIR:

B4 <- B1 -> B5 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v19 BlockEntry
0 0 v20 Simulate id=44 pop 1 / var[3] = t13
0 0 v21 Goto B5

(Nothing interesting in this block either.)

B5 HIR:

B5 <- B4 -> B6 dom B4 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v25 BlockEntry
0 1 s26 Constant 5
0 3 v27 Sub t4 s26 ! changes[*]
0 0 v28 Simulate id=30 push v27
0 0 v31 Simulate id=43 pop 1 / var[1] = v27
0 0 v32 Goto B6

v27 represents x - 5

Both x - 5 and 23 are assigned to the variable x, but since this is SSA, we cannot have multiple assignments to the same temporary variable. So we’ll use two different temporary variables, and they’ll be merged later.

B6 state:

B6 <- B5,B3 dom B1 [-1, -1]
Locals size 1 [None]
1 v35 [v27,s30,uses:1_0s_0i_0d_0t]

v35 merges v27 (x - 5) and s30 (23)

This is called Phi operator.

The Phi operator says that v35, v27 and s30 are really the same thing. For example, if we decide to store v35 in a register, we might use the same register for v27 and s30, too, so there is really one physical location for them. Or if it’s not possible to use the same register (e.g., if the basic blocks for computing v27 and s30 need it for something else), the data from v27 or s30 will be moved to it at this point.

B6 HIR:

_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v36 BlockEntry
0 0 v37 EnvironmentMarker lookup var[3]
0 0 t38 CheckHeapObject t13 type:non-primitive
0 0 t39 CheckMaps t13 [0x2d20cf19]
0 1 t40 Constant 0x2d210411 <Map(elements=3)>
0 0 v41 StoreNamedField t13.myprop[backing-store]@8 = v35 (transition map 0x2d210411) changes[Maps,BackingStoreFields]
0 0 v42 Simulate id=55
0 1 s43 Constant 1
0 0 v44 Return t2 (pop s43 values)

o.myprop = x;

The instructions correspond to HInstructions, e.g., the subtraction is represented by HSub. Note that every instruction is also a value (HInstruction subclasses HValue).

The data flow graph describes how the data flows between the temporary variables:

Building the Hydrogen graph

The Hydrogen graph builder builds the Hydrogen graph. It is hydrogen.cc.

A partial view of the HInstruction class structure:

Data loading instructions for monomorphic, polymorphic and megamorphic cases

Full-codegen uses IC stubs for accessing member variables efficiently. When crankshafting is done (remember that this happens after V8 has detected that the function is hot), Crankshaft retrieves type information from IC stubs and builds the code based on type assumptions. If these assumptions fail when running crankshafted code, we bail out to the non-optimized version.

Next we’ll have a look at various Load* instructions for retrieving data from objects. There is a corresponding set of Store* instructions for storing data.

Monomorphic sites and deoptimizing if type assumptions don’t hold

example2.js: Demonstrating bailing out and deoptimizing if a type assumption is violated.

function xgetter(o) {
return o.x;
}

function Point(x, y) {
this.x = x;
this.y = y;
}

function NotPoint() {
this.x = 0;
}

var sum = 0;
for (var i = 0; i < 5000; ++i) {
var p = new Point(i, i+1);
sum += xgetter(p);
}

var np = new NotPoint();
xgetter(np);

Running with d8 (note --trace_deopt):

v8$ out/ia32.debug/d8 example2.js --trace_hydrogen --trace_opt --trace_deopt
Parallel recompilation has been disabled for tracing.
[marking 0x27f11a15 <JS Function valueOf (SharedFunctionInfo 0x4b81aca1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x27f119e1 <JS Function toString (SharedFunctionInfo 0x4b81c54d)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x27f126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x4b825915)> for recompilation, reason: small function, ICs with typeinfo: 0/0 (100%)]
-----------------------------------------------------------
Compiling method valueOf using hydrogen
[optimizing 0x27f11a15 <JS Function valueOf (SharedFunctionInfo 0x4b81aca1)> - took 0.093, 8.979, 0.757 ms]
-----------------------------------------------------------
Compiling method IsPrimitive using hydrogen
[optimizing 0x27f126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x4b825915)> - took 0.054, 20.539, 1.659 ms]
-----------------------------------------------------------
Compiling method toString using hydrogen
[optimizing 0x27f119e1 <JS Function toString (SharedFunctionInfo 0x4b81c54d)> - took 0.081, 8.991, 0.754 ms]
[marking 0x27f176d9 <JS Function xgetter (SharedFunctionInfo 0x27f174c1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
-----------------------------------------------------------
Compiling method xgetter using hydrogen
[optimizing 0x27f176d9 <JS Function xgetter (SharedFunctionInfo 0x27f174c1)> - took 0.045, 8.434, 0.776 ms]
[marking 0x27f1770d <JS Function Point (SharedFunctionInfo 0x27f1751d)> for recompilation, reason: small function, ICs with typeinfo: 2/2 (100%)]
-----------------------------------------------------------
Compiling method Point using hydrogen
[optimizing 0x27f1770d <JS Function Point (SharedFunctionInfo 0x27f1751d)> - took 0.115, 10.259, 0.965 ms]
[deoptimizing (DEOPT eager): begin 0x27f176d9 xgetter @2, FP to SP delta: 12]
translating xgetter => node=3, height=0
0xffab5c80: [top + 20] <- 0x27f15189 ; [sp + 24] 0x27f15189 <JS Global Object>
0xffab5c7c: [top + 16] <- 0x3d738dd9 ; eax 0x3d738dd9 <a NotPoint with map 0x46a10551>
0xffab5c78: [top + 12] <- 0x5e6350aa ; caller's pc
0xffab5c74: [top + 8] <- 0xffab5c90 ; caller's fp
0xffab5c70: [top + 4] <- 0x27f08081; context
0xffab5c6c: [top + 0] <- 0x27f176d9; function
[deoptimizing (eager): end 0x27f176d9 xgetter @2 => node=3, pc=0x5e6353e5, state=NO_REGISTERS, alignment=no padding, took 0.118 ms]
[removing optimized code for: xgetter]
[evicting entry from optimizing code map (notify deoptimized) for 0x27f174c1 <SharedFunctionInfo xgetter>]

Inside xgetter:

B1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v8 BlockEntry
0 0 v9 Simulate id=3
0 0 v10 StackCheck t1 changes[NewSpacePromotion]
0 0 t12 CheckHeapObject t4 type:non-primitive
0 1 t13 CheckMaps t4 [0x534104b1]
0 1 s14 LoadNamedField t4.[in-object]@12 t13
0 1 s15 Constant 1
0 0 v16 Return s14 (pop s15 values)

Check that the type assumption for the parameter o is met, and if so, use the efficient LoadNamedField instruction for accessing the property x.

Polymorphic sites

example3.js: Demonstrating HIR for a polymorphic access site:

p = new NotPoint();
sum += ygetter_polymorphic(p);
}

This polymorphic access site generates a more complicated control flow graph:

B1:

B1 <- B0 -> B5,B2 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v8 BlockEntry
0 0 v9 Simulate id=3
0 0 v10 StackCheck t1 changes[NewSpacePromotion]
0 0 t12 CheckHeapObject t4 type:non-primitive
0 1 v13 CompareMap t4 (0x52f10551) goto (B5, B2)

Branching based on the type of the object.

B2:

B2 <- B1 -> B4,B3 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v18 BlockEntry
0 1 v19 CompareMap t4 (0x52f10489) goto (B4, B3)

Branching some more.

B3:

B3 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v26 BlockEntry
0 0 v27 Deoptimize
0 0 v28 Simulate id=8 push s25
0 0 v29 Goto B6

The type was something unexpected, so bail out and deoptimize the function.

B4:

B4 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v20 BlockEntry
0 2 s21 LoadNamedField t4.[in-object]@12 v19
0 0 v22 Simulate id=8 push s21
0 0 v23 Goto B6

B5:

B5 <- B1 -> B6 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v14 BlockEntry
0 2 s15 LoadNamedField t4.[in-object]@16 v13
0 0 v16 Simulate id=8 push s15
0 0 v17 Goto B6

These instructions load the data from the two known types of objects.

Megamorphic sites

If the access site is megamorphic instead of polymorphic (i.e., more than 4 types of objects have been seen), we create a CFG where we check for 4 different possible types, and if the type is none of them, we do a generic load (LoadNamedGeneric).

B1 to B4 check for a specific type of an object like in the previous example. B5 is the generic access:

B5 HIR:

B5 <- B4 -> B10 dom B4 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v37 BlockEntry
0 2 t38 LoadNamedGeneric t4.y changes[*]
0 0 v39 Simulate id=8 push t38
0 0 v40 Goto B10

In addition to the instructions mentioned here, LoadKeyed and LoadKeyedGeneric are used for representing indexed access, e.g., “arr[n]”, where the index cannot be resolved statically

Calling bindings callbacks via IC stubs

DOM objects are managed by Blink, so for accessing data from them, V8 needs to call Blink. When JavaScript calls functions on DOM objects or accesses their properties, C++ callbacks inside the V8/Blink bindings are called.

Let’s switch from d8 to content_shell for a moment:

example5.html

<html><head><script>
function do_stuff() {
for (var i = 0; i < 1000; ++i) domaccess();
}

function domaccess() {
return document.firstChild;
}
</script></head>
<body onload="do_stuff()"></body>
</html>

Running with content_shell (at the root of a Blink checkout):

$ out/Debug/content_shell --dump-render-tree --js-flags="--trace_opt --trace_hydrogen --trace_deopt" --no-sandbox example5.html

B1 HIR:

1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v7 BlockEntry
0 0 v8 Simulate id=3
0 0 v9 StackCheck t1 changes[NewSpacePromotion]
0 1 t10 LoadGlobalCell [0x2fdeaae089c1] (read-only)
0 2 t11 LoadNamedGeneric t10.firstChild changes[*]
0 0 v12 Simulate id=8 push t11
0 1 i13 Constant 0
0 0 v14 Return t11 (pop i13 values)

LoadNamedGeneric uses the IC stub which then calls the corresponding callback in Blink bindings (firstChildAttributeGetterForMainWorld in the generated V8Node.cpp).

Accessing properties which don’t exist (or aren’t known)

LoadNamedGeneric is also used when we have a monomorphic access site, and try to access a property which doesn’t exist for the known type.

example5.js:

function MyObject() {
this.myprop = 0;
}

function do_stuff() {
var o = new MyObject();
for (var i = 0; i < 1000; ++i) getprop(o);
}

function getprop(o) {
return o.foobarbaz;
}

B1 HIR:

B1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v8 BlockEntry
0 0 v9 Simulate id=3
0 0 v10 StackCheck t1 changes[NewSpacePromotion]
0 2 t12 LoadNamedGeneric t4.foobarbaz changes[*]
0 0 v13 Simulate id=8 push t12
0 1 i14 Constant 1
0 0 v15 Return t12 (pop i14 values)

Loops

When the JavaScript contains a loop, the CFG is not a DAG but contains a cycle. Moreover, the decision on whether to continue the loop is affected by variables computed inside the loop.

example6.js: Demonstrating loops

function looper() {
var i = 0;
while (i < 10) {
++i;
}
}

for (var i = 0; i < 5000; ++i) {
looper();
}

The control flow graph contains a cycle:

B0 HIR:

B0 -> B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v0 BlockEntry
0 3 t1 Context
0 2 t2 Constant 0x20708091 <undefined>
0 1 s32 Constant 1
0 0 t19 Constant 0x207080a1 <the hole>
0 3 t3 Parameter 0
0 0 t4 ArgumentsObject t3
0 0 v5 Simulate id=2 var[2] = t2, var[1] = t1, var[0] = t3
0 0 v6 Goto B1

This is the constant 1 used later in the loop body.

B1 HIR:

B1 <- B0 -> B2 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v7 BlockEntry
0 0 v8 Simulate id=3
0 0 v9 StackCheck t1 changes[NewSpacePromotion]
0 2 s10 Constant 0
0 0 v11 EnvironmentMarker bind var[2]
0 0 v15 Simulate id=15 var[2] = s10
0 0 v16 Goto B2

This is the constant 0 i is initialized with.

B2 state:

B2 <- B1,B4 -> B3,B5 dom B1 [-1, -1] dom-loop-succ (loop 0 depth 1)
Locals size 3 [None]
0 v12 [t3,v12,uses:1_0s_0i_0d_0t]
1 v13 [t1,v13,uses:4_0s_0i_0d_0t]
2 v14 [s10,v33,uses:2_0s_0i_0d_0t]

Merging data back from the bottom of the loop. s10 is the initial value (0) and v33 is the value after incrementing i.

B2 HIR:

B2 <- B1,B4 -> B3,B5 dom B1 [-1, -1] dom-loop-succ (loop 0 depth 1)
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v17 BlockEntry
0 0 v18 EnvironmentMarker lookup var[2]
0 1 s20 Constant 10
0 0 v21 CompareNumericAndBranch LT v14 s20 goto (B3, B5)

i < 10 ?

B4 HIR:

B4 <- B3 -> B2 dom B3 [-1, -1] dom-loop-succ (loop 0 depth 1)
_p__bci__use__tid__instruction__________________________________________________ (HIR)
0 0 v28 BlockEntry
0 0 v29 Simulate id=18
0 0 v30 StackCheck v13 changes[NewSpacePromotion]
0 0 v31 EnvironmentMarker lookup var[2]
0 2 v33 Add v14 s32 !
0 0 v34 EnvironmentMarker bind var[2]
0 0 v35 Simulate id=15 var[2] = v33
0 0 v36 Goto B2

++i

The data flow graph shows the circular relationship between v14 and v33.

Deciding whether to crankshaft

The unit for crankshafting is one function (plus whatever other functions are inlined in it).

Some instructions, e.g., try-catch are not crankshaftable. Thus, some nodes in the AST are marked as non-optimizable (see DONT_OPTIMIZE_NODE in ast.cc). If a function contains some of these non-optimizable nodes, the corresponding SharedFunctionInfo::set_dont_optimize() is set to true.

Crankshaft can also bail out while processing the Hydrogen graph, if it decides it cannot optimize (see HOptimizedGraphBuilder::Bailout in hydrogen.cc).

More information

Blog post about Hydrogen