[chromium] Explicit compile hints

marja@chromium.org - April 2023

This document is public and shared with the world!

Introduction

Knowing which JavaScript functions to parse and compile upfront (during the initial parsing and compilation of the script) can speed up web page loading.

When processing a script we are loading from the network, we have a choice for each function: either we parse and compile it right then (potentially on a background thread), or we don't. If the function is later called and has not been compiled yet, we need to parse and compile it right then, and that always happens on the main thread.

If the function ends up being called, doing the parsing & compile work upfront is beneficial, because:

During the initial parsing, we need to do at least a lightweight parse anyway to find the function end. In JavaScript, finding the function end requires parsing the full syntax (there are no shortcuts where we could count the curly braces - the grammar is too complex for them to work).
The initial parse might happen on a background thread instead of the main thread. When we need to compile the function because it's being called, it's too late to parallelize work.
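
To illustrate why counting curly braces is not enough to find the function end, here is a minimal sketch. The helper `naiveFunctionEnd` and the sample sources are hypothetical and only serve the illustration; this is not how V8's preparser works.

```javascript
// Naive approach (hypothetical helper): scan forward from the function's
// opening brace and return the index of the brace that balances it.
function naiveFunctionEnd(source, openBraceIndex) {
  let depth = 0;
  for (let i = openBraceIndex; i < source.length; i++) {
    if (source[i] === "{") {
      depth++;
    } else if (source[i] === "}") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1; // No balancing brace found.
}

// Brace counting happens to work when braces only appear as syntax.
const simple = "function f() { if (x) { y(); } }";
const simpleEnd = naiveFunctionEnd(simple, simple.indexOf("{"));

// It breaks as soon as a brace appears inside a string literal. The same
// problem exists for regular expression literals, template literals and
// comments, which is why a real parse is needed.
const tricky = 'function g() { return "}"; }';
const trickyEnd = naiveFunctionEnd(tricky, tricky.indexOf("{"));

console.log(simpleEnd === simple.length - 1); // correct end found
console.log(trickyEnd === tricky.length - 1); // wrong end: stopped inside the string
```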

Some example numbers from our experiments:

Data set of 4 big web pages, local cold runs: Parse + compile time decreases 4 - 13%, foreground parse + compile time decreases 59 - 86%.^[a]
Google web devs report 5% improvement in their page load time metrics w/ PIFEing all functions in core file.

Currently, V8 uses the PIFE (possibly-invoked function expression) heuristic to direct which functions to compile: a function expression wrapped in parentheses is compiled eagerly. This heuristic has existed for a long time and is well known to web developers. V8 and SpiderMonkey follow the PIFE heuristic, but JavaScriptCore doesn't.

Using PIFEs for transmitting information about which functions should be eager-compiled has downsides, though. Especially:

using it forces using function expressions instead of function declarations; this might regress in all browsers, and for browsers which don't follow the PIFE hint there's no upside
it cannot be applied to ES6 class methods
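
As a small sketch of these two downsides (all names below are made up for illustration): to hint a function today, a declaration has to be rewritten into a parenthesized function expression, and there is no equivalent rewrite for a class method.

```javascript
// A plain function declaration. Under the PIFE heuristic this carries no hint.
function lazyAdd(a, b) {
  return a + b;
}

// The same function rewritten as a PIFE: the parentheses around the function
// expression are what engines following the heuristic treat as the hint.
// Note that the declaration had to become an expression assigned to a variable.
const eagerAdd = (function (a, b) {
  return a + b;
});

class Counter {
  constructor() {
    this.count = 0;
  }

  // A method definition is not an expression, so it cannot be wrapped in
  // parentheses; the PIFE hint has no way to mark this method.
  increment() {
    this.count += 1;
    return this.count;
  }
}

console.log(lazyAdd(1, 2), eagerAdd(1, 2), new Counter().increment());
```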

Thus, we want to experiment with other ways to transmit compile hints^[b]^[c]^[d].

We don't want to cause extra work for browsers that don't want to support compile hints; we'll only explore options where ignoring the hints happens naturally.

We're currently starting early experiments with potential users, and will create the actual spec based on feedback.

Experimenting with compile hints

We plan to experiment with transmitting compile hints as magic comments.

Version 1: ability to mark all functions in a file as eager compiled.

//# experimentalChromiumCompileHints=all

Version 2: additionally, ability to mark individual functions.

//# experimentalChromiumCompileHintsData=TmV2ZXIgZ29ubmEgZ2l2ZSB5b3UgdXAKTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3du^[e]^[f]

Alternatives considered

We've discussed the following options for attaching compile hints:

PIFEs (status quo):

let foo = (function() { ... });

Top level "use eager" directive:

"use eager"; // All functions in this file will be eager

function foo() { ... }

function bar() { ... }

class C {

m() { ... }

}

Similar effect could be achieved by a magic comment without payload:

//# experimentalChromiumCompileHints=all

function foo() { ... }

function bar() { ... }

class C {

m() { ... }

}

Per function "use eager" directive:

function foo() { "use eager"; ... }

class C {

m() { "use eager"; ... }

}

Top level magic comment w/ payload:

//# experimentalChromiumCompileHintsData=TmV2ZXIgZ29ubmEgZ2l2ZSB5b3UgdXAKTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3du^[g]^[h]^[i]^[j]^[k]^[l]^[m]

// The payload encodes the positions of the functions to eager

// compile.

function foo() { ... }

class C {

m() { ... }

}

This metadata can also be transmitted in the script tag^[n]^[o]:

gdXAKTmV2ZXIgZ29ubmEgbGV0IHlvdSBkb3du">

</script>

Per-function magic comment:

/*compile-hint*/ function foo() { ... }

class C {

/*compile-hint*/ m() { ... }

}

All these options are allowed by the ECMAScript spec; other browsers would naturally ignore the unrecognized directives and magic comments.

Transmitting compile hints in a separate file is probably not feasible; we need the compile hints to be available when we start stream-parsing the JavaScript file.

Evaluation of the possible options

As is evident from the table below, all options have obvious downsides. We've been stuck in a local optimum with PIFEs for several years now. If we want to get out of that local optimum, we'll need to accept some other downsides from the other options.

The options are not mutually exclusive; in theory we could experiment with several of them simultaneously.

	PIFEs	Top level "use eager" directive or the corresponding magic comment	Per-function "use eager" directive	Top level magic comment w/ payload or metadata in the script tag	Per-function magic comment
Source code bloat	Some	No	Yes	Some (the more human readable the payload is, the more the source code is bloated)	Yes
Forces using function expressions (see below)	Yes	No	No	No	No
Allows per-function accuracy	Yes	No	Yes	Yes	Yes
Can apply to methods	No	Yes	Yes	Yes	Yes
Easy to implement	Yes (already exists)	Yes	No; awkward that the directive inside a function is seen after the params have been parsed	Medium	Yes
Human readable (although, hand-written JS is not very common anymore)	Yes	Yes	Yes	Probably not	Yes, but if the comment is outside the function, the code is messy
Easy to integrate into tools	Medium; need to change function declarations to function expressions	Yes	Yes	Medium; with metadata in the file, position computations are a bit awkward since the comment changes function positions. Metadata in the script tag doesn't have this problem, but it requires other tools to keep track of the metadata of various scripts.	Yes
Easy to misuse	Applying any compile hints too eagerly is not a good idea	Yes; eager compiling everything possible is bad for memory	Applying any compile hints too eagerly is not a good idea	Applying any compile hints too eagerly is not a good idea	Applying any compile hints too eagerly is not a good idea
Allows PGO	Yes	Only if functions are organized into files according to PGO	Yes	Yes	Yes

Other considerations

Mitigations against overusing compile hints

^[p]

There's a risk that web developers overuse compile hints in a way that increases memory consumption and regresses web page load time.

We should add usage counters and UMA histograms to monitor how web devs use compile hints.

We might try to detect at run time that a site overuses compile hints, crowdsource the information, and use it for scaling down compilation for such sites.

Binary compile hints format

We'll decide the exact format during experimentation + discussions w/ potential users.

The format should contain a version number: A script might include compile hints for different versions, and browsers can just ignore versions they don't understand.

Version 0 of the format: the format is a list of integers where each integer is varint encoded. The data is then base64 encoded. The list of ints is (version number, position for first function, delta, delta, delta...) where each delta describes the position of the next function, relative to the previous function.
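
A minimal sketch of how a tool might produce and read such a payload. Several details are assumptions, since the format is not yet specified: the varint flavor (little-endian base-128, as in LEB128), positions given as ascending non-negative integers that fit in 32 bits, and Node.js `Buffer` for base64. The function names are hypothetical.

```javascript
const FORMAT_VERSION = 0;

// Append `value` to `out` as a base-128 varint (assumed flavor: 7 data bits
// per byte, least significant group first, high bit set on all but the last).
function encodeVarint(value, out) {
  while (value >= 0x80) {
    out.push((value & 0x7f) | 0x80);
    value >>>= 7;
  }
  out.push(value);
}

// positions: ascending function positions. Returns the base64 payload
// encoding (version, first position, delta, delta, ...).
function encodeCompileHints(positions) {
  const bytes = [];
  encodeVarint(FORMAT_VERSION, bytes);
  let previous = 0;
  for (const position of positions) {
    encodeVarint(position - previous, bytes);
    previous = position;
  }
  return Buffer.from(bytes).toString("base64");
}

// Returns the list of positions, or null if the version is not understood
// (the payload is then simply ignored).
function decodeCompileHints(payload) {
  const values = [];
  let value = 0;
  let shift = 0;
  for (const byte of Buffer.from(payload, "base64")) {
    value += (byte & 0x7f) * 2 ** shift;
    shift += 7;
    if ((byte & 0x80) === 0) {
      values.push(value);
      value = 0;
      shift = 0;
    }
  }
  const [version, ...deltas] = values;
  if (version !== FORMAT_VERSION) return null;
  const positions = [];
  let position = 0;
  for (const delta of deltas) {
    position += delta;
    positions.push(position);
  }
  return positions;
}

const payload = encodeCompileHints([10, 300, 305, 20000]);
console.log(payload, decodeCompileHints(payload));
```

Delta encoding keeps most numbers small, so most functions cost a single byte before base64 expansion.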

The positions should exclude the length of the magic comment. They can either be relative to the end of the magic comment, or relative to the beginning of the file with the length of the magic comment subtracted from the actual position. This allows tooling to add the comment without recomputing positions.
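
A sketch of why this helps tooling, under assumptions that are not part of any spec yet: the magic comment is the first line of the file, its trailing newline counts as part of the comment, and positions are string indices of the `function` keyword. The helper names and the payload string are placeholders.

```javascript
const HINT_PREFIX = "//# experimentalChromiumCompileHintsData=";

// Prepend the magic comment as the first line of the script (assumed placement).
function addHintComment(source, payload) {
  return HINT_PREFIX + payload + "\n" + source;
}

// Convert a position that is relative to the end of the magic comment into an
// absolute position in the annotated file.
function toAbsolutePosition(annotatedSource, relativePosition) {
  const commentEnd = annotatedSource.indexOf("\n") + 1;
  return commentEnd + relativePosition;
}

const source = "function foo() {}\nfunction bar() {}\n";

// The tool computes positions on the original source, before the comment
// exists, so the payload can be generated in a single pass.
const barPosition = source.indexOf("function bar");

// Placeholder payload; a real one would encode barPosition.
const annotated = addHintComment(source, "PLACEHOLDER");

// The consumer recovers the absolute position without the tool ever having
// known how long the comment would be.
const absolute = toAbsolutePosition(annotated, barPosition);
console.log(annotated.startsWith("function bar", absolute));
```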

[a]Anything a bit more comprehensive? Seems like very initial data

[b]The problem with the PIFE heuristic for large dart2js apps is that it doesn't seem to work. Other ways of hinting might have the same problem.

I see compile pauses on calling the IIFEs functions. I suspect that it is due to GC between the parsing/compiling and the actual call. One example is that the whole program is a ~20MB IIFE that contains lots of variables and statements and few ~1MB IIFEs. These later inner IIFEs suffer from compilation pauses. This adds directly to program startup time.

If compile pauses are really due to GC reclaiming the bytecode, we might need different GC heuristic rather than a different eager compilation heuristic.

Perhaps another eager compilation heuristic would be better: for large nested IIFEs: background-compile inner IIFEs when the containing function starts executing.

[c]This is very interesting, thanks for bringing this up! We'll need to think about what to do in this space - maybe we can keep the explicitly hinted functions alive more aggressively until they're called the first time. (This would also apply to PIFE).

Are there some concrete pages we could have a look at? Or some examples of how the dartjs app code is structured?

[d]Google Ads has a suite of such apps written in Dart and compiled to JavaScript. Campaign Management is one of them.

[e]@marja@google.com Can you clarify what kind of encoding this is? Is this format documented somewhere?

_Assigned to marja@google.com_

[f][as discussed offline] This is not yet specified. It'll contain encoded function positions (maybe relative to the end of the magic comment, so that the comment itself doesn't shift function positions).

[g]I'm guessing this will be generated by tools. Is that correct? If so, how would the tools evaluate what should be eager and what shouldn't?

[h]With all per-function options, we expect the data to be generated by tools. E.g., by PGOing (add instrumentation to the functions to record when they're called, run some realistic payloads, extract the information from the PGO data and mark those functions as eager compiled).

[i]I like the PGO approach, but I think we should expect many tools to use fixed heuristics here (given that they already do so). Have you done outreach to tools authors? If not, maybe the TC39 Tools outreach call could be a good venue (it is not restricted to things within the scope of TC39).

[j](Note, that was from Daniel Ehrenberg; not sure why I'm showing up as anonymous)

[k]Sounds good, I'll get in touch with the TC39 Tools folks.

[l]Any feedback from the TC39 tooling folks?

[m]The sentiment was generally positive, but nobody has so far signed up to experiment with this.

[n]Payload inside the JS file is more likely to stay in sync with the JS itself, I think

[o]Agreed [from littledan]

[p]Another option could be to add a comment that's a hash for the entire file/function including its compile hints, like a "signature". That way, if the intended usage is something like a bundler/server-side tool, it can verify that the compile hints are not stale when new code is added to the file/function.