Crankshafting from the ground up   

by marja@google.com (Blink bindings team)

We’ll cover: Crankshafting basics, HIR, optimizing and deoptimizing (type assumptions), reading properties from objects, monomorphic / polymorphic / megamorphic access sites, bindings callbacks, loops.

Prerequisite information

Hidden classes, inline caching (IC) and IC stubs.

Two compilers

V8 has 2 compilers, full-codegen and Crankshaft.

Full-codegen

Crankshaft

Full-codegen

Full-codegen traverses the AST and simply generates stack machine code for each node.

FullCodeGenerator is defined in full-codegen.h and the implementations are platform-specific.

For AST visiting, see e.g., FullCodeGenerator::VisitAssignment in full-codegen-ia32.cc.

Full-codegen uses the IC stubs (see FullCodeGenerator::CallIC).

Hydrogen and Lithium

Crankshaft consists of a high-level phase (Hydrogen) and low-level phase (Lithium). They used different internal representations (HIR and LIR, respectively) and do different optimizations.

You can examine the optimization phases with C1Visualizer.

example.js: Demostrating HIR

function f(x) {
 var o = new Object();
 if (x < 30)
   x = x - 5;
 else
   x = 23;
 o.myprop = x;
}

for (var i = 0; i < 1000; ++i) {
 f(i);
}

Running with d8 (standalone V8):

v8$ out/ia32.debug/d8 example.js --trace_hydrogen --trace_opt
Parallel recompilation has been disabled for tracing.
[marking 0x5a411a15 <JS Function valueOf (SharedFunctionInfo 0x2c91aca1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x5a4119e1 <JS Function toString (SharedFunctionInfo 0x2c91c54d)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x5a4126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x2c925915)> for recompilation, reason: small function, ICs with typeinfo: 0/0 (100%)]
-----------------------------------------------------------
Compiling method valueOf using hydrogen
[optimizing 0x5a411a15 <JS Function valueOf (SharedFunctionInfo 0x2c91aca1)> - took 0.094, 8.958, 0.756 ms]
-----------------------------------------------------------
Compiling method IsPrimitive using hydrogen
[optimizing 0x5a4126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x2c925915)> - took 0.054, 22.095, 1.693 ms]
-----------------------------------------------------------
Compiling method toString using hydrogen
[optimizing 0x5a4119e1 <JS Function toString (SharedFunctionInfo 0x2c91c54d)> - took 0.102, 10.625, 0.829 ms]
[marking 0x5a4175e1 <JS Function f (SharedFunctionInfo 0x5a4174c1)> for recompilation, reason: small function, ICs with typeinfo: 4/4 (100%)]
-----------------------------------------------------------
Compiling method f using hydrogen


This creates a file called hydrogen.cfg (CFG = control flow graph) which can be inspected with C1Visualizer. The optimization phases prefixed by “H_” are Hydrogen and the phases prefixed by “L_” are Lithium.

Hydrogen IR

The Hydrogen IR describes the basic blocks, the control flow between the basic blocks. Each basic block consist of instructions written in SSA (static single assignment) form.

The basic blocks for the example:

Next we list the instructions for the basic blocks. v0, v1, t0, t1, s0, s1, etc. are variables used in the SSA representation.

Detail: The meaning of the letters is as follows: s = known to be SMI (small integer), t = tagged representation (can either be a pointer or SMI), v = no representation chosen.

B0:

B0 -> B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v0   BlockEntry
   0    5    t1   Context
   0    2    t2   Constant 0x2c908091 <undefined>
   0    0    t16  Constant 0x2c9080a1 <the hole>
   0    2    
t3   Parameter 0                                     
   0    4    
t4   Parameter 1
   0    0    t5   ArgumentsObject t3 t4
   0    0    v6  
Simulate id=2 var[3] = t2, var[2] = t1, var[1] = t4, var[0] = t3
   0    0    
v7   Goto B1

The BlockEntry and Goto instructions are related to the basic block graph structure.

t3 represents the implicit receiver parameter

v4 represents x

The “Simulate” instructions are for keeping track of what the stack machine state would be, in case we need to bail out and start using unoptimized code. They don’t generate any actual machine instructions.

In the Lithium phase, Crankshaft generates a translation which allow us to map from optimized stack frames to unoptimized ones. The translation is generated based on the HSimulate instructions (see LCodeGen::WriteTranslation in lithium-codegen-ia32.cc). When we detect that we need to bail out from optimized code (in run time), we construct the stack machine state based on the translation, generate the stack machine code, and continue executing it from the right point.

B1 HIR:

_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v8   BlockEntry
   0    0    v9  
Simulate id=3
   0    0    v10  StackCheck t1 changes[NewSpacePromotion]
   0    2    
t11  Constant 0x5a408211 <JS Function Object (SharedFunctionInfo 0x2c927209)> type:object
   0    0    
t12  PushArgument t11
   0    7    
t13  CallNew t1 t11 #1 changes[*]
   0    0    v14  
Simulate id=10 push t13
   0    0    v15  EnvironmentMarker bind var[3]
   0    1    
s17  Constant 30
   0    0    
v18  CompareNumericAndBranch LT t4 s17 goto (B4, B2)

The coloured parts correspond to the following JavaScript statements:

var o = new Object(); (and t13 represents o)

if (x < 30)

B2 HIR:

B2 <- B1 -> B3 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v22  BlockEntry
   0    0  
 v23  Simulate id=45 pop 1 / var[3] = t13
   0    0    
v24  Goto B3

(Nothing interesting in this block.)

B3 HIR:

B3 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v29  BlockEntry
   0    2    
s30  Constant 23
   0    0    
v33  Simulate id=43 var[1] = s30
   0    0    
v34  Goto B6

s30 represents the constant 23

B4 HIR:

B4 <- B1 -> B5 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v19  BlockEntry
   0    0    
v20  Simulate id=44 pop 1 / var[3] = t13
   0    0    
v21  Goto B5

(Nothing interesting in this block either.)

B5 HIR:

B5 <- B4 -> B6 dom B4 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v25  BlockEntry
   0    1    
s26  Constant 5
   0    3    
v27  Sub t4 s26 ! changes[*]
   0    0    
v28  Simulate id=30 push v27
   0    0  
 v31  Simulate id=43 pop 1 / var[1] = v27
   0    0    
v32  Goto B6

v27 represents x - 5

Both x - 5 and 23 are assigned to the variable x, but since this is SSA, we cannot have multiple assignments to the same temporary variable. So we’ll use two different temporary variables, and they’ll be merged later.

B6 state:

B6 <- B5,B3 dom B1 [-1, -1]
 Locals size 1 [None]
   1    
v35  [v27,s30,uses:1_0s_0i_0d_0t]

v35 merges v27 (x - 5) and s30 (23)

This is called Phi operator.

The Phi operator says that v35, v27 and s30 are really the same thing. For example, if we decide to store v35 in a register, we might use the same register for v27 and s30, too, so there is really one physical location for them. Or if it’s not possible to use the same register (e.g., if the basic blocks for computing v27 and s30 need it for something else), the data from v27 or s30 will be moved to it at this point.

B6 HIR:

_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    
v36  BlockEntry
   0    0    v37  EnvironmentMarker lookup var[3]
   0    0    
t38  CheckHeapObject t13 type:non-primitive
   0    0  
 t39  CheckMaps t13 [0x2d20cf19]
   0    1    t40  Constant 0x2d210411 <Map(elements=3)>
   0    0    
v41  StoreNamedField t13.myprop[backing-store]@8 = v35 (transition map 0x2d210411) changes[Maps,BackingStoreFields]
   0    0    
v42  Simulate id=55
   0    1    s43  Constant 1
   0    0    v44  Return t2 (pop s43 values)

o.myprop = x;

The instructions correspond to HInstructions, e.g., the subtraction is represented by HSub. Note that every instruction is also a value (HInstruction subclasses HValue).

The data flow graph describes how the data flows between the temporary variables:

Building the Hydrogen graph

The Hydrogen graph builder builds the Hydrogen graph. It is hydrogen.cc.

A partial view of the HInstruction class structure:

Data loading instructions for monomorphic, polymorphic and megamorphic cases

Full-codegen uses IC stubs for accessing member variables efficiently. When crankshafting is done (remember that this happens after V8 has detected that the function is hot), Crankshaft retrieves type information from IC stubs and builds the code based on type assumptions. If these assumptions fail when running crankshafted code, we bail out to the non-optimized version.

Next we’ll have a look at various Load* instructions for retrieving data from objects. There is a corresponding set of Store* instructions for storing data.

Monomorphic sites and deoptimizing if type assumptions don’t hold

example2.js: Demonstrating bailing out and deoptimizing if a type assumption is violated.

function xgetter(o) {
 return o.x;
}

function Point(x, y) {
 this.x = x;
 this.y = y;
}

function NotPoint() {
 this.x = 0;
}

var sum = 0;
for (var i = 0; i < 5000; ++i) {
 var p = new Point(i, i+1);
 sum += xgetter(p);
}

var np = new NotPoint();
xgetter(np);

Running with d8 (note --trace_deopt):

v8$ out/ia32.debug/d8 example2.js --trace_hydrogen --trace_opt --trace_deopt
Parallel recompilation has been disabled for tracing.
[marking 0x27f11a15 <JS Function valueOf (SharedFunctionInfo 0x4b81aca1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x27f119e1 <JS Function toString (SharedFunctionInfo 0x4b81c54d)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
[marking 0x27f126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x4b825915)> for recompilation, reason: small function, ICs with typeinfo: 0/0 (100%)]
-----------------------------------------------------------
Compiling method valueOf using hydrogen
[optimizing 0x27f11a15 <JS Function valueOf (SharedFunctionInfo 0x4b81aca1)> - took 0.093, 8.979, 0.757 ms]
-----------------------------------------------------------
Compiling method IsPrimitive using hydrogen
[optimizing 0x27f126a9 <JS Function IsPrimitive (SharedFunctionInfo 0x4b825915)> - took 0.054, 20.539, 1.659 ms]
-----------------------------------------------------------
Compiling method toString using hydrogen
[optimizing 0x27f119e1 <JS Function toString (SharedFunctionInfo 0x4b81c54d)> - took 0.081, 8.991, 0.754 ms]
[marking 0x27f176d9 <JS Function xgetter (SharedFunctionInfo 0x27f174c1)> for recompilation, reason: small function, ICs with typeinfo: 1/1 (100%)]
-----------------------------------------------------------
Compiling method xgetter using hydrogen
[optimizing 0x27f176d9 <JS Function xgetter (SharedFunctionInfo 0x27f174c1)> - took 0.045, 8.434, 0.776 ms]
[marking 0x27f1770d <JS Function Point (SharedFunctionInfo 0x27f1751d)> for recompilation, reason: small function, ICs with typeinfo: 2/2 (100%)]
-----------------------------------------------------------
Compiling method Point using hydrogen
[optimizing 0x27f1770d <JS Function Point (SharedFunctionInfo 0x27f1751d)> - took 0.115, 10.259, 0.965 ms]
[deoptimizing (DEOPT eager): begin 0x27f176d9 xgetter @2, FP to SP delta: 12]
 translating xgetter => node=3, height=0
   0xffab5c80: [top + 20] <- 0x27f15189 ; [sp + 24] 0x27f15189 <JS Global Object>
   0xffab5c7c: [top + 16] <- 0x3d738dd9 ; eax 0x3d738dd9 <a NotPoint with map 0x46a10551>
   0xffab5c78: [top + 12] <- 0x5e6350aa ; caller's pc
   0xffab5c74: [top + 8] <- 0xffab5c90 ; caller's fp
   0xffab5c70: [top + 4] <- 0x27f08081; context
   0xffab5c6c: [top + 0] <- 0x27f176d9; function
[deoptimizing (eager): end 0x27f176d9 xgetter @2 => node=3, pc=0x5e6353e5, state=NO_REGISTERS, alignment=no padding, took 0.118 ms]
[removing optimized code for: xgetter]
[evicting entry from optimizing code map (notify deoptimized) for 0x27f174c1 <SharedFunctionInfo xgetter>]

Inside xgetter:

B1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v8   BlockEntry
   0    0    v9   Simulate id=3
   0    0    v10  StackCheck t1 changes[NewSpacePromotion]
   0    0    
t12  CheckHeapObject t4 type:non-primitive
   0    1    
t13  CheckMaps t4 [0x534104b1]
   0    1    
s14  LoadNamedField t4.[in-object]@12 t13
   0    1    s15  Constant 1
   0    0    v16  Return s14 (pop s15 values)

Check that the type assumption for the parameter o is met, and if so, use the efficient LoadNamedField instruction for accessing the property x.

Polymorphic sites

example3.js: Demonstrating HIR for a polymorphic access site:

function ygetter_polymorphic(o) {
 return o.y;
}

function Point(x, y) {
 this.x = x;
 this.y = y;
}

function NotPoint() {
 this.y = 0;
}

var sum = 0;
for (var i = 0; i < 5000; ++i) {
 var p = new Point(i, i+1);
 sum += ygetter_polymorphic(p);

  p = new NotPoint();
 sum += ygetter_polymorphic(p);
}

This polymorphic access site generates a more complicated control flow graph:

B1:

B1 <- B0 -> B5,B2 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v8   BlockEntry
   0    0    v9   Simulate id=3
   0    0    v10  StackCheck t1 changes[NewSpacePromotion]
   0    0    t12  CheckHeapObject t4 type:non-primitive
   0    1    
v13  CompareMap t4 (0x52f10551) goto (B5, B2)

Branching based on the type of the object.

B2:

B2 <- B1 -> B4,B3 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v18  BlockEntry
   0    1    
v19  CompareMap t4 (0x52f10489) goto (B4, B3)

Branching some more.

B3:

B3 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v26  BlockEntry
   0    0    
v27  Deoptimize
   0    0    v28  Simulate id=8 push s25
   0    0    v29  Goto B6

The type was something unexpected, so bail out and deoptimize the function.

B4:

B4 <- B2 -> B6 dom B2 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v20  BlockEntry
   0    2    
s21  LoadNamedField t4.[in-object]@12 v19
   0    0    v22  Simulate id=8 push s21
   0    0    v23  Goto B6

B5:

B5 <- B1 -> B6 dom B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v14  BlockEntry
   0    2    
s15  LoadNamedField t4.[in-object]@16 v13
   0    0    v16  Simulate id=8 push s15
   0    0    v17  Goto B6

These instructions load the data from the two known types of objects.

Megamorphic sites

If the access site is megamorphic instead of polymorphic (i.e., more than 4 types of objects have been seen), we create a CFG where we check for 4 different possible types, and if the type is none of them, we do a generic load (LoadNamedGeneric).

function ygetter_megamorphic(o) {
 return o.y;
}

function Point(x, y) {
 this.x = x; this.y = y;
}

function NotPoint() {
 this.y = 0;
}

function NotAPointEither() {
 this.a = 0; this.b = 0; this.y = 3;
}

function DefinitelyNotAPoint() {
 this.a = 0; this.b = 0; this.c = 0; this.y = 4;
}

function SomethingCompletelyDifferent() {
 this.a = 0; this.b = 0; this.c = 0; this.d = 0; this.y = 5;
}

function DontAssumeIAmAPoint() {
 this.a = 0; this.b = 0; this.c = 0; this.d = 0; this.e = 0; this.y = 5;
}

var sum = 0;
for (var i = 0; i < 5000; ++i) {
 var p = new Point(i, i+1);
 sum += ygetter_megamorphic(p);
 p = new NotPoint();
 sum += ygetter_megamorphic(p);
 p = new NotAPointEither();
 sum += ygetter_megamorphic(p);
 p = new DefinitelyNotAPoint();
 sum += ygetter_megamorphic(p);
 p = new SomethingCompletelyDifferent();
 sum += ygetter_megamorphic(p);
 p = new DontAssumeIAmAPoint();
 sum += ygetter_megamorphic(p);
}

B1 to B4 check for a specific type of an object like in the previous example. B5 is the generic access:

B5 HIR:

B5 <- B4 -> B10 dom B4 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v37  BlockEntry
   0    2    
t38  LoadNamedGeneric t4.y changes[*]
   0    0    v39  Simulate id=8 push t38
   0    0    v40  Goto B10

In addition to the instructions mentioned here, LoadKeyed and LoadKeyedGeneric are used for representing indexed access, e.g., “arr[n]”, where the index cannot be resolved statically

Calling bindings callbacks via IC stubs

DOM objects are managed by Blink, so for accessing data from them, V8 needs to call Blink. When JavaScript calls functions on DOM objects or accesses their properties, C++ callbacks inside the V8/Blink bindings are called.

Let’s switch from d8 to content_shell for a moment:

example5.html

<html><head><script>
function do_stuff() {
 for (var i = 0; i < 1000; ++i) domaccess();
}

function domaccess() {
 return document.firstChild;
}
</script></head>
<body onload="do_stuff()"></body>
</html>

Running with content_shell (at the root of a Blink checkout):

$ out/Debug/content_shell --dump-render-tree --js-flags="--trace_opt --trace_hydrogen --trace_deopt" --no-sandbox  example5.html

B1 HIR:

1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v7   BlockEntry
   0    0    v8   Simulate id=3
   0    0    v9   StackCheck t1 changes[NewSpacePromotion]
   0    1    
t10  LoadGlobalCell [0x2fdeaae089c1] (read-only)
   0    2    
t11  LoadNamedGeneric t10.firstChild changes[*]
   0    0    v12  Simulate id=8 push t11
   0    1    i13  Constant 0
   0    0    v14  Return t11 (pop i13 values)

LoadNamedGeneric uses the IC stub which then calls the corresponding callback in Blink bindings (firstChildAttributeGetterForMainWorld in the generated V8Node.cpp).

Accessing properties which don’t exist (or aren’t known)

LoadNamedGeneric is also used when we have a monomorphic access site, and try to access a property which doesn’t exist for the known type.

example5.js:

function MyObject() {
 this.myprop = 0;
}

function do_stuff() {
 var o = new MyObject();
 for (var i = 0; i < 1000; ++i) getprop(o);
}

function getprop(o) {
 return o.foobarbaz;
}

B1 HIR:

B1 <- B0 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v8   BlockEntry
   0    0    v9   Simulate id=3
   0    0    v10  StackCheck t1 changes[NewSpacePromotion]
   0    2    
t12  LoadNamedGeneric t4.foobarbaz changes[*]
   0    0    v13  Simulate id=8 push t12
   0    1    i14  Constant 1
   0    0    v15  Return t12 (pop i14 values)

Loops

When the JavaScript contains a loop, the CFG is not a DAG but contains a cycle. Moreover, the decision on whether to continue the loop is affected by variables computed inside the loop.

example6.js: Demonstrating loops

function looper() {
 var i = 0;
 while (i < 10) {
   ++i;
 }
}

for (var i = 0; i < 5000; ++i) {
 looper();
}

The control flow graph contains a cycle:

B0 HIR:

B0 -> B1 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v0   BlockEntry
   0    3    t1   Context
   0    2    t2   Constant 0x20708091 <undefined>
   0    1    
s32  Constant 1
   0    0    t19  Constant 0x207080a1 <the hole>
   0    3    t3   Parameter 0
   0    0    t4   ArgumentsObject t3
   0    0    v5   Simulate id=2 var[2] = t2, var[1] = t1, var[0] = t3
   0    0    v6   Goto B1

This is the constant 1 used later in the loop body.

B1 HIR:

B1 <- B0 -> B2 dom B0 [-1, -1]
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v7   BlockEntry
   0    0    v8   Simulate id=3
   0    0    v9   StackCheck t1 changes[NewSpacePromotion]
   0    2    
s10  Constant 0
   0    0    v11  EnvironmentMarker bind var[2]
   0    0    v15  Simulate id=15 var[2] = s10
   0    0    v16  Goto B2

This is the constant 0 i is initialized with.

B2 state:

B2 <- B1,B4 -> B3,B5 dom B1 [-1, -1] dom-loop-succ (loop 0 depth 1)
 Locals size 3 [None]
   0    v12  [t3,v12,uses:1_0s_0i_0d_0t]
   1    v13  [t1,v13,uses:4_0s_0i_0d_0t]
   2    
v14  [s10,v33,uses:2_0s_0i_0d_0t]

Merging data back from the bottom of the loop. s10 is the initial value (0) and v33 is the value after incrementing i.

B2 HIR:

B2 <- B1,B4 -> B3,B5 dom B1 [-1, -1] dom-loop-succ (loop 0 depth 1)
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v17  BlockEntry
   0    0    v18  EnvironmentMarker lookup var[2]
   0    1    
s20  Constant 10
   0    0    
v21  CompareNumericAndBranch LT v14 s20 goto (B3, B5)

i < 10 ?

B4 HIR:

B4 <- B3 -> B2 dom B3 [-1, -1] dom-loop-succ (loop 0 depth 1)
_p__bci__use__tid__instruction__________________________________________________ (HIR)
   0    0    v28  BlockEntry
   0    0    v29  Simulate id=18
   0    0    v30  StackCheck v13 changes[NewSpacePromotion]
   0    0    v31  EnvironmentMarker lookup var[2]
   0    2    
v33  Add v14 s32 !
   0    0    v34  EnvironmentMarker bind var[2]
   0    0    v35  Simulate id=15 var[2] = v33
   0    0    v36  Goto B2

++i

The data flow graph shows the circular relationship between v14 and v33.

Deciding whether to crankshaft

The unit for crankshafting is one function (plus whatever other functions are inlined in it).

Some instructions, e.g., try-catch are not crankshaftable. Thus, some nodes in the AST are marked as non-optimizable (see DONT_OPTIMIZE_NODE in ast.cc). If a function contains some of these non-optimizable nodes, the corresponding SharedFunctionInfo::set_dont_optimize() is set to true.

Crankshaft can also bail out while processing the Hydrogen graph, if it decides it cannot optimize (see HOptimizedGraphBuilder::Bailout in hydrogen.cc).

More information

Blog post about Hydrogen