1 of 36

Fast arithmetic for dynamic languages

Jaroslav Sevcik, Benedikt Meurer, Georg Neis

jarin@google.com, bmeurer@google.com, neis@google.com

V8 compiler engineers

Confidential + Proprietary

2 of 36

Plan

Representations of numbers in VMs.
Semantics.
Baseline implementations of arithmetic operations.

Interpreter or non-optimizing compiler.

Type feedback, optimizing compiler.
Deoptimization and learning.
Truncation analysis.
Compiler challenges.
Language design discussion.

New numeric type proposals (Int64, BigNum, SIMD).

Confidential + Proprietary

3 of 36

Encoding of numbers in V8 (x64)

Small integers are encoded into “tagged pointers”.

Integers:

Reference:

E.g., integer 42 encoded as 0x0000002a00000000,� pointer 0x12345678 encoded as 0x12345679

Non-integer numbers live �on the heap.

0000:0000

int32

high 63-bits of pointer

Pointer to heap num

Float64 payload

Heap

Confidential + Proprietary

4 of 36

NaN tagging (JavaScriptCore x64)

Integers, doubles and pointers encoded in 64 bits:

Integers:

� Double

� Reference

In addition, false, true, undefined and null have special encodings (6, 7, 10, 2).

int32

FFFF:0000

48 bit pointer

0000

Payload

0001-FFFE

Double bit-pattern encoded by incrementing by 2^48, so that integer and references overlap with float64 NaN aspace.

Confidential + Proprietary

5 of 36

Addition in spec (semantics)

Confidential + Proprietary

6 of 36

Addition in spec (semantics)

Confidential + Proprietary

7 of 36

Addition in spec (semantics)

Confidential + Proprietary

8 of 36

Addition in spec (semantics)

Confidential + Proprietary

9 of 36

Addition in spec (semantics)

Confidential + Proprietary

10 of 36

Arithmetic in V8’s interpreter

Call to a hand-written assembly code (aka stub) for the fast cases:

Small integers.
Heap numbers.
Oddballs - undefined, null, true, false.
Strings (only the string x string -> string variety).

Other objects handled fall back to a generic C++ implementation.

Confidential + Proprietary

11 of 36

Type feedback

Binary operation stubs also collect type feedback and store it on functions.

function f(a, b){� var c = a + a;� return b + c;�}

Function info

Feedback vector

...

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

2: unitialized

3: unitialized

Confidential + Proprietary

12 of 36

Type feedback

Binary operation stubs also collect type feedback and store it on functions.

function f(a, b){� var c = a + a;� return b + c;�}

f(1, 2);

Function info

Feedback vector

...

2: int x int -> int

3: int x int -> int

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Confidential + Proprietary

13 of 36

Type feedback

Binary operation stubs also collect type feedback and store it on functions.

function f(a, b){� var c = a + a;� return b + c;�}

f(1, 2)�f(1, 0.1)

Function info

Feedback vector

...

2: int x int -> int

3: num x int -> num

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Confidential + Proprietary

14 of 36

Optimizing compiler

Uses type feedback to insert dynamic checks.

Generally, checks inserted early in the compiler pipeline.

After the check, compiler uses the type as the static type.

If the check fails, deoptimize.

Deoptimization builds corresponding interpreter frame(s) and jumps to the interpreter.

Confidential + Proprietary

15 of 36

Compile with types

Check inputs, add them up, check for overflow, and return the result.

function f(a, b){� var c = a + a;� return b + c;�}

f(1, 2);�f(1, 2);

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Optimized code

...

movq rax,[rbp+0x18]

test al,0x1

jnz 115

movq rbx,rax

addq rbx,rax

jo 120

movq rax,[rbp+0x10]

test al,0x1

jnz 125

addq rbx,rax

jo 130

movq rax,rbx

movq rsp,rbp

pop rbp

ret 0x18

...

Feedback vector

...

2: int x int -> int

3: int x int -> int

Confidential + Proprietary

16 of 36

Compile with types

Check inputs, add them up, check for overflow, and return the result.

function f(a, b){� var c = a + a;� return b + c;�}�// Optimize�f(1, 2);�f(1, 0.1); // Deopt!

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Optimized code

...

movq rax,[rbp+0x18]

test al,0x1

jnz 115

movq rbx,rax

addq rbx,rax

jo 120

movq rax,[rbp+0x10]

test al,0x1

jnz 125

addq rbx,rax

jo 130

movq rax,rbx

movq rsp,rbp

pop rbp

ret 0x18

...

Deoptimizer

Feedback vector

...

2: int x int -> int

3: int x int -> int

Confidential + Proprietary

17 of 36

Deoptimization

Check inputs, add them up, check for overflow, and return the result.

function f(a, b){� var c = a + a;� return b + c;�}�// Optimize�f(1, 2);�f(1, 0.1); // Deopt!

Feedback vector

...

2: int x int -> int

3: int x int -> int

-> StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Deoptimizer

Build unoptimized frame.

Resume after the last side effect.

Here, function start.

Confidential + Proprietary

18 of 36

Deoptimization

Check inputs, add them up, check for overflow, and return the result.

function f(a, b){� var c = a + a;� return b + c;�}�// Optimize�f(1, 2);�f(1, 0.1);

Feedback vector

...

2: int x int -> int

3: num x int -> num

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Deoptimizer

Build unoptimized frame.

Resume after last side effect.

Here, function start.

Type feedback is updated when executing addition again.

Confidential + Proprietary

19 of 36

Re-compile with types

Check inputs, add them up, convert, check for overflow, and return the result.

function f(a, b){� var c = a + a;� return b + c;�}

f(1, 0.1)�f(1, 0.1)

Feedback vector

...

2: int x int -> int

3: num x int -> num

StackCheck

Ldar a0

Add a0, [2]

Star r0

Ldar r0

Add a1, [3]

Return

Optimized code

...

movq rax,[rbp+0x18]�test al,0x1�jnz 179 (0x120096c04373)�shrq rax, 32�movq rbx,rax�addl rbx,rax�jo 332 (0x120096c0440c)�movq rax,[rbp+0x10]�test al,0x1�jz 91 (0x120096c0431b)�movq r10,[r13+0x50]�cmpq [rax-0x1],r10�vmovsd xmm0,[rax+0x7]�jnz 337 (0x120096c04411)�jmp 107 (0x120096c0432b)�movq r10,rax�shrq r10, 32�vxorpd xmm0,xmm0,xmm0�vcvtlsi2sd xmm0,xmm0,r10�vxorpd xmm1,xmm1,xmm1�vcvtlsi2sd xmm1,xmm1,rbx�vaddsd xmm0,xmm1,xmm0�movq rbx,[r13+0x7ff3f0]�movq rax,rbx�addq rax,0x10�cmpq rax,[r13+0x7ff3f8]�ja 254 (0x120096c043be)�movq [r13+0x7ff3f0],rax�incq rbx�movq r10,[r13+0x50]�movq [rbx-0x1],r10�vmovsd [rbx+0x7],xmm0�movq rax,rbx�movq rsp,rbp�pop rbp�ret 0x18�...

Deoptimizer

Confidential + Proprietary

20 of 36

Deoptimization pitfalls

Deoptimization is expensive!

Mostly because re-optimization is expensive.

�We expect to reach steady state.

Not always guaranteed, compiler must be careful.

If no feedback gets updated, we can run into deoptimization loop:

Function keeps getting deoptimized and optimized again until we reach deoptimization limit and the function is never optimized again.

Confidential + Proprietary

21 of 36

Deoptimization loop example

function f(x, y) { // First run with x=1, y = undefined.� if (x) y = x + 1; // Integer feedback for +� // Type feedback suggests y is integer here and used� // as integer later.� // Crankshaft will try to store y in int32… � // … and deoptimize if y is not int32.� if (x) y = y + 2; // Integer feedback for +� return y;�}

Confidential + Proprietary

22 of 36

Deoptimization loop example

function f(x, y) { // First run with x=1, then x=0.� if (x) y = x + 1; // Integer feedback for +� // Deoptimizes here if y not int32!� if (x) y = y + 2; // Integer feedback for +� return y;�}

�f(1, undefined); // Prime the feedback, then optimize.�// Crankshaft keeps deoptimizing, eventually disables opt.�for (var i = 0; i < 1000000; i++) f(0, undefined);

Confidential + Proprietary

23 of 36

Can we do better?

Javascript code (Octane crypto.js)� var l = a[i]&0x3fff, h = a[i++]>>14, m = xh*l+h*xl;� l = ((m&0x3fff)<<14 …

Asm.js code (Octane zlib.js)

e=...|0;K=...|0; ...�S=e+K+c+b+f+E+D+m+o+g+a+C+B+A+z+y+R|0

No overflow checks needed, bitwise ops only observe low 32 bits!

Confidential + Proprietary

24 of 36

Truncations

What is a truncation?

In (x + y)|0, we only care about the lowest 32 bits of the result of the sum. �If x, y are int52, we only care about lowest 32 bits of x and y.

Expression +a[i] does not distinguish between a[i] = undefined and a[i] = NaN.�Reading holes can produce NaN! (Instead of undefined.)

For c ? x : y, we do not distinguish between c = 1 and c = true.

Confidential + Proprietary

25 of 36

Truncation propagation

Propagate truncations from uses to definitions.

At definition, take least upper bound of all uses.

Choose input truncations based on the use truncations.

Possibly also using the upper bound type and type feedback.

Confidential + Proprietary

26 of 36

Example 1

var a = (x|0)+(y|0)*(z&0xff)|0

ToInt32

Confidential + Proprietary

27 of 36

Example 1 (type upper bounds)

var a = (x|0)+(y|0)*(z&0xff)|0

ToInt32

Type: ..-547608329985

Type: 0-255

Type: ..-549755813632

Type: I32

Confidential + Proprietary

28 of 36

Example 1 (truncation propagation)

var a = (x|0)+(y|0)*(z&0xff)|0

ToInt32

Type: ..-547608329985

Type: 0-255

Type: ..-549755813632

Word32

None

Type: I32

Confidential + Proprietary

29 of 36

Example 1 (truncation propagation)

var a = (x|0)+(y|0)*(z&0xff)|0

ToInt32

Type: ..-547608329985

Type: 0-255

Type: ..-549755813632

Word32

None

Type: I32

Confidential + Proprietary

30 of 36

Example 1 (truncation propagation)

var a = (x|0)+(y|0)*(z&0xff)|0

ToInt32

* I32

ToInt32

Type: ..-547608329985

Type: 0-255

Type: ..-549755813632

Word32

Type: I32

No overflow checks necessary, all operations in int32!

Confidential + Proprietary

31 of 36

Example 2

var a = (y|0)*(z&0xff); f(a); a = (x|0)+(a|0)|0

ToInt32

Call

Confidential + Proprietary

32 of 36

Example 2

var a = (y|0)*(z&0xff); f(a); a = (x|0)+(a|0)|0

ToInt32

* F64

ToInt32

Call

Word32

Any

Float64

Confidential + Proprietary

33 of 36

Example 2

var a = (y|0)*(z&0xff); f(a); a = (x|0)+(a|0)|0

ToInt32

* F64

ToInt32

Call

I32 -> F64

F64 -> W32

Confidential + Proprietary

34 of 36

Other uses of truncations

The truncation machinery can be used for other optimizations:

Eliminate minus zero checks for double->integer conversions.
Eliminate minus zero checks for multiplications.
Eliminate undefined checks for array reads.
Choose better representations of values.

Propagation only in V8’s Turbofan compiler, mostly tuned for asm.js-like workloads.

Still some challenges to balance type feedback and truncations.

Confidential + Proprietary

35 of 36

Challenges

Currently, truncation analysis runs first, type feedback does not affect the truncation decision.

Explore combining information from type feedback, truncation analysis and static typing for better results.
For example, consider (x + y|0) with integer feedback for x and y.

Ideally, use type feedback for x and y, then use wrap-around int32 addition.

Smarter choice of “worse” representations if profitable.

For example, a + b + 0.5 should be float64 even if a, b have integer feedback.
Hard to do properly without control flow.

Confidential + Proprietary

36 of 36

Future directions

It is possible that JavaScript could get arbitrary precision integers.

We can use the techniques described here to generate great code!

Is it preferable to have (U)Int64 or BigNum in the language?

Does SIMD make sense for JavaScript (or other dynamic languages)?

Confidential + Proprietary