1 of 14

Nova Engine - Building a DOD JS Engine in Rust

Finland Rust-lang Group, January Meetup

January 13th, 2024

Aapo Alasuutari

2 of 14

Nova Engine - Yesterday

A bunch of Deno enthusiasts walk into a Discord channel
“okay, so when are we starting our own JS engine” -Andreu Botella, 24th of August, 2022
Few initial commits for a wasm engine, bike-shedding the name, very first commits on the JS engine code structure
Own parser started
A basic JS Value PR created

Enum with gc crate Gc pointers inside
dyn Object

Own parser removed in favour of oxc parser
Mostly inactive until summer 2023
I hear of DOD paradigm that piques my interest and makes me want to apply it to a JS engine

3 of 14

Nova Engine - Today

Stack based bytecode VM

No AST VM at all

Data-oriented heap design

We’ll talk more about this in a bit

Project structure oriented around ECMAScript specification

Generally, each specification section matches one folder in the project structure
Subsections are either subfolders or modules inside the parent folder
Individual abstract operations in a subsection are exported fns in the corresponding module

Intention is to be easy to understand, implement and update from a specification standpoint

4 of 14

JavaECMAScript

Dynamically typed language: In general, no value has a known type at parse time
Limited runtime type system:

Undefined
Null
String
Boolean
Number
BigInt
Symbol
Object

Functions are Objects, Arrays are Objects, ArrayBuffers are Objects, …
ECMAScript spec mostly talks of “Values”: An engine mostly deals in generic “Value”
How to implement Value?

Copyable Value on stack: All values are as big as the biggest. Requires an internal discriminant.
Reference to Value on heap: Requires an internal discriminant. Can be mitigated with tricks
Enum with reference to Data on heap: Does not require an internal discriminant?

Pointer, or something else?

5 of 14

Value as a heap reference

Naive implementation: Local<’a, Value> is a reference to a heap allocated value which contains a discriminant and all data for the type
V8: Local<’a, Value> is a tagged pointer: It either contains an i32 or a reference to a heap allocated value

Fairly simple, proven performant enough
Handles the most important special case, i32, on the stack
Only heap knows type of data

JSCore: Local<’a, Value> is a NaN-boxed value: It either contains a valid f64, or a NaN with upper bits containing a discriminant and lower bits containing a stack allocated value or a reference to a heap allocated value

Allows for more data than just i32 on the stack, if wanted
NaN tag can discriminate the data: Stack knows type of data, at least to a degree. (3 bits)

Downside: Messing with pointer bits is unsafe and iffy, depending on method may break with 5-level paging, ARM pointer cryptography, …

6 of 14

A Rusty solution: enum Value

Rust enum with payload

What is a reasonable amount of variants? ECMAScript spec defines around 50 distinct types

255 seems to be enough: #[repr(u8)] it is!

What about payload? Reference (pointer) would be obvious but makes Value 16 bytes in size

We know the type based on the enum variant: Can we place Value data of type T into its own arena and “compress” our pointers to a small enough value to fit it in to our enum?
A Vec<T> is a “memory arena” of type T: We do not need a memory offset in the Value, an index into the vector greatly compresses our “pointers”
With a compressed number scale, 32-bit unsigned integer should be plenty enough for the index

Value becomes a (u8, u32): Stack knows the type of an item and knows an index into a heap vector for the that type

7 of 14

Is that it? No!

We have ~200 discriminants left: Make a custom discriminant for i32
What else can we do? (u8, u32) has 3 bytes left over: (u8, [u8; 7])

i56
7-byte small string optimization (“length”, “data”, “value”, …)

Future consideration: We still have at least ~127 discriminants left: That’s 255 / 2

ThrownValue variants! Throwing a Value means generating a Value, setting the top bit and returning early
Requires Try trait Nightly feature to keep ? operator usage

Future consideration: How low can we go? 32 bit? 16 bit?
Future consideration: Value lacks lifetime

Have each Agent (Nova heap) generate a unique lifetime and brand its Values with it, ensuring no cross-pollination happens
Hopefully: Turn Heap into Heap<’gen> and have the borrow checker help with GC through lifetime transforms

8 of 14

So what’s this got to do with DOD?

Data-oriented design: Slippery fish and ill-definable but

Know your data, design around your data first and objects (logical wholes) second
As everything in programming, take in moderation

Data in ECMAScript is the Value variants’ heap data
Prototypal inheritance: No multiple inheritance, inheritance chain can change during runtime

Prototype does not define “internal methods” interface, object type does

Array is an Object, ArrayBuffer is an Object but not an Array, Function is an Object, …

Array behaves like Object but some of its spec-defined “methods” (get, set) are different
Function behaves like Object but has an extra method or two
OOP: Array inherits Object, Function inherits Object, … Object defines a class with virtual methods, Array inherits with some overrides, …
DOD: How is Array actually used? Consider that first, decide how to implement it second

9 of 14

Example: Array usage in ECMAScript

Special property “length”: Returns the current highest index + 1 of all indexed properties in object
Assigning an indexed property may change length, creates holes in the array if length grows by more than 1
Assigning “length” drops items if length shrinks or creates holes if length grows
Normal usage is to push/pop/shift/unshift items; possibly create with Array(N) and assign at index, sometimes assign “length”

push/pop/shift/unshift functions come from prototypal inheritance of the Array.prototype Object

Uncommon usage is to assign non-indexed properties, change prototype

Do we even need the Object-like properties of Arrays normally?

Common usage of Objects: Assign non-indexed properties

Spec does special case indexed properties for Objects but only for property listing order: Do we really need Objects to prepare for indexed properties specially?

10 of 14

Array and Object implementation in Nova Engine

No! We do not need Object-like properties of Arrays normally!

Do not track non-indexed properties of Arrays
Do not track prototype of Arrays

No! We do not need indexed-properties of Objects normally!

Do not track “elements” of Objects

Objects track prototype and properties, be they indexed or non-indexed
Arrays track elements (indexed properties) and an Option<Object>

By default, an Array is only a collection of indexed properties; length is a calculated property, prototype is inferred to be Array.prototype
If a user changes the prototype or assigns non-indexed properties, the backing Object gets created and the changes get tracked there
Array is only 16 bytes in size + N * size_of(Value)

11 of 14

Where are properties saved?

V8 has some inline properties on Objects and Arrays, others are heap allocated with a Vec<Value> on the object

Inline properties often makes sense but costs memory.

Nova of course uses Vec<[Value; N]>! Wait what?

Heap contains vectors of arrays of Values
Array and Object contain a 12 byte ElementsVector that defines the index in Vec<[..]>, which elements vector is being used (cap; N) and how many elements are currently used (len)

What about property descriptors?

ECMAScript defines writable, enumerable, configurable, getter and setter fields for properties
Rarely used, so DOD says: Consider how these are used and decide based on that

“Sparse Vec” ie. HashMap next to Vec<[..]>

Normal objects and arrays never use this, so it never gets created
Class prototypes etc. use this, let them have it but keep it out of the way

12 of 14

What is the point? ECS!

Memory savings? Yes, but no

Cache savings!

All heap allocation of same type creates data in linear memory
Mapping through an array accesses Values in linear memory

This is nothing new, this is how all engines work
But the data of the items in the array are also likely to be in linear memory!

GC mark is the act of iterating through all(*) allocated objects on the heap and following their outward pointers

This is now mostly about traversing linear memory, doing the same access pattern on each item and queueing the found heap indexes for future work
Output indexes to the proper processing queues and do marking in parallel

GC sweep is now simply retain_mut on Heap vectors

Drop unmarked data, mutate marked data to account for drained data (shift indexes down)

What about nursery / new object space?

Post-GC high water mark of each vector is the nursery edge! Anything below is alive, anything above needs marking to stay alive

Entity-Component-System: Value-HeapData-GC

13 of 14

Nova Engine - Tomorrow

Still very, very rough: Only runs a few trivial ECMAScript programs
Lots of (grunt) work to do in implementing spec abstract operations
VM implementation and bytecode generation is barely started
Heap GC algorithm is v0.2
Heap needs to be made concurrent-marking proof

This will be hard, really hard! Single mutator, multiple reader makes it possible
Sweep is already stop-the-world embarrassingly-parallel
I want to do this mostly lock-free, meaning a lot of RCU algorithms

Feature flags

ECMAScript spec optional features
Safe JavaScript variants

Eventually: Be the most interesting ECMAScript engine on the Rust block!

Consider being fast as well

Together With You!