1 of 60

TurboFan: A new code generation architecture for V8

Benedikt Meurer

Google Munich

@bmeurer

Proprietary

2 of 60

Ignition and TurboFan enabled in Chrome Dev / Canary!

3 of 60

4 of 60

Why a new code generation architecture?

  • Improve baseline performance
  • Make performance predictable
  • Reduce page load time
  • Reduce memory usage
  • Reduce complexity

5 of 60

A bit of history...

6 of 60

Compiler pipeline (2008)

  • JavaScript Source Code → Parser → Abstract Syntax Tree → Codegen → Semi-optimized Code

7 of 60

Compiler pipeline (2010)

  • JavaScript Source Code → Parser → Abstract Syntax Tree
  • Baseline: Full-Codegen → Unoptimized Code
  • Optimized: Crankshaft → Optimized Code
  • Transitions: Optimize, Deoptimize

8 of 60

Compiler pipeline (2014)

  • JavaScript Source Code → Parser → Abstract Syntax Tree
  • Baseline: Full-Codegen → Unoptimized Code
  • Optimized: Crankshaft, TurboFan → Optimized Code
  • Transitions: Optimize, Deoptimize

9 of 60

Compiler pipeline (2016)

  • JavaScript Source Code → Parser → Abstract Syntax Tree
  • Interpreted: Ignition → Bytecode
  • Baseline: Full-Codegen → Unoptimized Code
  • Optimized: Crankshaft, TurboFan → Optimized Code
  • Transitions: Optimize, Deoptimize, Baseline (Low-end Android)

10 of 60

WTF?!!1

11 of 60

Compiler pipeline (2017)

  • JavaScript Source Code → Parser → Abstract Syntax Tree
  • Interpreted: Ignition → Bytecode
  • Optimized: TurboFan → Optimized Code
  • Transitions: Optimize, Deoptimize

12 of 60

Reduce complexity

13 of 60

Abstract Syntax Tree

Abstract Syntax Tree

Full-Codegen

Crankshaft

TurboFan

Ignition

14 of 60

Abstract Syntax Tree

Ignition

TurboFan

Abstract Syntax Tree

const f = (a, b) => a + b * 3;

Bytecode

0 : 87 StackCheck

1 : 03 03 LdaSmi [3]

3 : 2e 02 02 Mul a1, [2]

6 : 2c 03 03 Add a0, [3]

9 : 8b Return
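
To make the bytecode listing above easier to follow, here is a toy model of an accumulator machine running the same five instructions. It is only a sketch: the `run` function and the array encoding of the instructions are invented for illustration and are not V8's implementation. The bracketed operands on `Mul` and `Add` are assumed to be feedback slot indices and are simply dropped here.

```javascript
// Toy model of an accumulator machine; NOT V8's real interpreter.
// Every instruction reads or writes one implicit accumulator.
function run(bytecode, args) {
  let acc; // the accumulator
  for (const [op, operand] of bytecode) {
    switch (op) {
      case "StackCheck": break;                      // no-op in this model
      case "LdaSmi": acc = operand; break;           // load a small integer
      case "Mul": acc = args[operand] * acc; break;  // acc = register * acc
      case "Add": acc = args[operand] + acc; break;  // acc = register + acc
      case "Return": return acc;
      default: throw new Error(`unknown bytecode ${op}`);
    }
  }
}

// Hypothetical encoding of the listing above.
const bytecode = [
  ["StackCheck"],
  ["LdaSmi", 3],
  ["Mul", "a1"],
  ["Add", "a0"],
  ["Return"],
];

const f = (a, b) => a + b * 3;
const viaBytecode = run(bytecode, { a0: 2, a1: 5 });
console.log(viaBytecode === f(2, 5));
```

The multiplication runs before the addition, which matches the operator precedence in the source expression.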

15 of 60

Abstract Syntax Tree

16 of 60

Supported architectures

17 of 60

Supported architectures

  • V8 3.24.9: arm, ia32, x64, mips
  • V8 5.9.66: arm, ia32, x64, mips, x87, arm64, mips64, ppc, s390

18 of 60

Architecture-specific LOCs

19 of 60

29%

less architecture-specific code from 2013 to 2017

20 of 60

Unified code generation architecture

  • Clients: WASM Compiler, TurboFan Optimizing Compiler, Builtins, Data-driven inline caches, Code stubs, Ignition Interpreter (Bytecodes)
  • C++ DSL: CodeStubAssembler, CodeAssembler, RawMachineAssembler
  • TurboFan code generation architecture: “Sea of nodes” graph, Scheduler, Control flow graph, Instruction Selector, Register Allocator, Code Generator
  • Targets: Intel, ARM, PowerPC, MIPS

21 of 60

Unified code generation architecture

  • (not just an) Optimizing compiler
  • Interpreter bytecode handlers
  • Builtins (Object.create, Array.prototype.indexOf, etc.)
  • Code stubs / IC subsystem
  • WebAssembly code generation

22 of 60

Unified code generation architecture

  • More opportunities for performance improvements
  • Fewer bugs (no more register allocation by hand)
  • Easier to port to new architectures

23 of 60

Predictable Performance

24 of 60

Optimization Killers

25 of 60

Optimization Killers

  • Generators and async functions
  • for-of and destructuring
  • try-catch and try-finally
  • Compound let or const assignment
  • Object literals that contain __proto__ or get/set declarations
  • debugger or with statements
  • Literal calls to eval()
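
For readers who want to recognize the constructs listed above in everyday code, here is a small sketch that combines several of them: a generator, for-of, destructuring, try-finally, and a compound assignment to a `let` binding. The names `pairs` and `sumValues` are made up for this example. Running it with `node --trace-opt --trace-deopt` is one way to see how a given V8 version treats it; no particular outcome is implied here.

```javascript
// Hypothetical generator yielding [key, value] pairs of an object.
function* pairs(obj) {
  for (const key of Object.keys(obj)) yield [key, obj[key]];
}

// Hypothetical helper that touches several constructs from the list.
function sumValues(obj) {
  let total = 0;
  try {
    for (const [, value] of pairs(obj)) { // for-of plus destructuring
      total += value;                      // compound let assignment
    }
  } finally {
    // cleanup would go here; left empty in this sketch
  }
  return total;
}

const result = sumValues({ a: 1, b: 2, c: 3 });
console.log(result);
```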

26 of 60

Optimization Killers - arguments object

const arr = new Array(4000);
function mymax() {
  return Math.max.apply(undefined, arguments);
}
for (let i = 0; i < 2000; ++i) mymax(...arr);

$ node --trace-opt --trace-deopt apply.js
…
[deoptimizing (DEOPT eager): begin 0x34e770317ea1 <JS Function mymax (SharedFunctionInfo 0x2793cb1f4031)> (opt #5) @2, FP to SP delta: 24, caller sp: 0x7fff5fbf6378]
…
[disabled optimization for 0x2793cb1f4031 <SharedFunctionInfo mymax>, reason: Optimized too many times]
$

Too many arguments!
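
As a point of comparison, the same helper can be written with rest parameters and spread instead of the `arguments` object and `Function.prototype.apply`. This is a sketch, not a claim about how any specific V8 version optimizes it; the trace flags shown above are the way to check. The array is filled with numbers here (an assumption, since the slide uses an empty `new Array(4000)`) so that the result is meaningful.

```javascript
// Same helper using rest parameters and spread.
function mymax(...values) {
  return Math.max(...values);
}

// Assumption: numeric contents, so Math.max returns a number, not NaN.
const arr = Array.from({ length: 4000 }, (_, i) => i);

let last;
for (let i = 0; i < 2000; ++i) last = mymax(...arr);
console.log(last);
```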

27 of 60

Optimization Killers - arguments object

var callbacks = [
  function sloppy() {},
  function strict() { "use strict"; }
];

function dispatch() {
  for (var l = callbacks.length, i = 0; i < l; ++i) {
    callbacks[i].apply(null, arguments);
  }
}

for (var i = 0; i < 100000; ++i) dispatch(1, 2, 3, 4);

[disabled optimization for … <SharedFunctionInfo dispatch>, reason: Bad value context for arguments value]

28 of 60

Optimization Killers

29 of 60

Baseline Performance

30 of 60

Baseline Performance - Builtins

31 of 60

Baseline Performance - Builtins

  • Expectation mismatch: “builtins should be fast”
  • TurboFan’s CodeStubAssembler basis for fast builtins
  • Ideally no performance cliff

32 of 60

Baseline Performance - RegExp Builtins

33 of 60

Baseline Performance - Promises & async/await

34 of 60

Baseline Performance

  • Baseline performance matters
  • Optimizing too early is costly
    • Might deoptimize quickly because type feedback not stable yet
    • Optimization is expensive
  • Generated code quality depends on quality and stability of type feedback
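
The point about type feedback stability can be illustrated with a call site whose input types change over time. This is a sketch only: `add` is a hypothetical function, and whether and when a given V8 version optimizes or deoptimizes it is not asserted here. Running the file with `node --trace-opt --trace-deopt` is one way to observe what actually happens.

```javascript
// Hypothetical function whose type feedback changes between two phases.
function add(a, b) {
  return a + b;
}

// Phase 1: only small integers are seen, so the collected feedback is stable.
let sum = 0;
for (let i = 0; i < 100000; ++i) sum = add(sum, 1);

// Phase 2: strings arrive; code specialized for numbers would no longer fit.
const text = add("turbo", "fan");

console.log(sum, text);
```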

35 of 60

Improve Startup Time

36 of 60

37 of 60

Improve Startup Time

  • Bytecode faster to generate
  • Better suited to the smaller instruction caches of (low-end) mobile devices
  • Parse only once, optimize from bytecode
  • Optimize less aggressively - better baseline performance
  • Data-driven ICs reduce slow path cost

38 of 60

Reduce Memory Usage

39 of 60

Reduce Memory Usage

  • Memory is precious on mobile
  • Android devices with 512 MiB to 1 GiB of memory are common
  • Emerging markets
  • Ignition code is up to 8x smaller than Full-Codegen code (arm64)

40 of 60

Reduce Memory Usage

41 of 60

42 of 60

Chrome Dev / Canary

43 of 60

Node vee-eight-lkgr (Ubuntu/x86_64 build)

44 of 60

Node vee-eight-lkgr (source build)

git clone https://github.com/v8/node.git node-v8
cd node-v8
git checkout vee-eight-lkgr
./configure --prefix "$HOME/Applications/node-vee-eight-lkgr"
make install

45 of 60

Performance Results

46 of 60

Page Load Time (Top25 Websites)

47 of 60

Page Load Time (LinkedIn Feed)

48 of 60

Page Load Time (Script Execution)

Time HTML parser blocked on script execution

49 of 60

Framework Performance (Speedometer)

50 of 60

Framework Performance (Speedometer)

51 of 60

Framework Performance (Ember)

52 of 60

Node Server Workloads (AcmeAir)

Throughput on AcmeAir

53 of 60

UglifyJS2

UglifyJS is a JavaScript parser, minifier, compressor or beautifier toolkit.

github.com/mishoo/UglifyJS2

54 of 60

Cryptographic / Number Crunching Applications

Deterministic Prime Number Generation

55 of 60

Performance Advice

56 of 60

Performance Advice

“Premature optimization is the root of all evil”

57 of 60

Performance Advice

  • Write idiomatic, declarative JavaScript
    • Appropriate language features
    • Handle exceptions where necessary
    • Use proper collections (Map, Set, WeakMap, WeakSet, etc.)
  • Avoid (engine-specific) work-arounds
  • File bug reports when something is slow
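
As one illustration of "use proper collections", here is a minimal sketch of a word counter built on `Map` rather than on a plain object used as a dictionary. The function name `countWords` and the sample input are made up for this example. A `Map` has no inherited keys, so a word such as "constructor" needs no special handling.

```javascript
// Hypothetical word counter using Map as the dictionary.
function countWords(words) {
  const counts = new Map();
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

const counts = countWords(["constructor", "a", "b", "a", "constructor"]);
console.log(counts.get("a"), counts.get("constructor"), counts.has("toString"));
```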

58 of 60

Performance Advice - Declarative JavaScript

Declarative

if (obj !== undefined) { return obj.x; }

Obscure

if (obj) { return obj.x; }

Generated code, obscure version (excerpt):

    ...
     27 cmpq [r13-0x40],rax
     31 jz 128
     37 test al,0x1
     39 setzl bl
     42 movzxbl rbx,rbx
     45 cmpl rbx,0x0
     48 jnz 185
     54 cmpq [r13-0x38],rax
     58 jz 128
     64 movq rdx,[rax-0x1]
     68 testb [rdx+0xc],0x10
     72 jnz 128
     78 cmpq [r13+0x50],rdx
     82 jz 160
    ...
    160 vmovsd xmm0,[rax+0x7]
    165 movq [rbp-0x18],rbx
    169 vxorpd xmm1,xmm1,xmm1
    173 vucomisd xmm1,xmm0
    177 jz 128
    179 movq rbx,[rbp-0x18]
    183 jmp 88
    185 movq [rbp-0x18],rbx
    189 cmpq rax,0x0
    193 jz 128
    195 movq rbx,[rbp-0x18]
    199 jmp 88
    ...

Generated code, declarative version (excerpt):

    ...
     23 cmpq [r13-0x60],rax
     27 jz 72
    ...

~15% faster on avg!
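
The two checks are not interchangeable, which is part of why the truthiness test needs more machine code: it has to tell apart every falsy value, not just `undefined`. The sketch below lists values for which the two conditions disagree; the helper names `truthyCheck` and `undefinedCheck` are made up for this example.

```javascript
// Values that are falsy but are not `undefined`.
const candidates = [null, 0, -0, NaN, "", false];

// Hypothetical helpers mirroring the two conditions on the slide.
const truthyCheck = (obj) => (obj ? "taken" : "skipped");
const undefinedCheck = (obj) => (obj !== undefined ? "taken" : "skipped");

// Collect every candidate on which the two checks disagree.
const disagreements = candidates.filter(
  (value) => truthyCheck(value) !== undefinedCheck(value)
);

console.log(disagreements.length === candidates.length);
```

Swapping one check for the other is therefore only safe when `obj` is known to be either an object or `undefined`.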

59 of 60

Performance Advice - Declarative JavaScript

Declarative

function foo6(f, ...args) {
  return f(...args);
}

Obscure

function foo5(f) {
  switch (arguments.length) {
    case 1: return f();
    case 2: return f(arguments[1]);
    case 3: return f(arguments[1], arguments[2]);
    default: {
      var args = [];
      for (var i = 1; i < arguments.length; ++i) {
        args[i - 1] = arguments[i];
      }
      return f.apply(undefined, args);
    }
  }
}

~4.5x faster on avg!
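
A quick way to gain confidence before replacing the obscure version with the declarative one is to check that both forward arguments identically. The sketch below repeats both functions from the slide so it is self-contained; the `join` callback and the `inputs` list are assumptions chosen for illustration.

```javascript
// Declarative version from the slide.
function foo6(f, ...args) {
  return f(...args);
}

// Obscure version from the slide.
function foo5(f) {
  switch (arguments.length) {
    case 1: return f();
    case 2: return f(arguments[1]);
    case 3: return f(arguments[1], arguments[2]);
    default: {
      var args = [];
      for (var i = 1; i < arguments.length; ++i) {
        args[i - 1] = arguments[i];
      }
      return f.apply(undefined, args);
    }
  }
}

// Hypothetical callback that makes the forwarded arguments visible.
const join = (...parts) => parts.join("-");

// Cover each switch case plus the default branch.
const inputs = [[], [1], [1, 2], [1, 2, 3, 4]];
const same = inputs.every(
  (args) => foo5(join, ...args) === foo6(join, ...args)
);
console.log(same);
```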

60 of 60
