1 of 14

Nova Engine - Building a DOD JS Engine in Rust

Finland Rust-lang Group, January Meetup

January 13th, 2024

Aapo Alasuutari

2 of 14

Nova Engine - Yesterday

  • A bunch of Deno enthusiasts walk into a Discord channel
  • “okay, so when are we starting our own JS engine” -Andreu Botella, 24th of August, 2022
  • Few initial commits for a wasm engine, bike-shedding the name, very first commits on the JS engine code structure
  • Own parser started
  • A basic JS Value PR created
    • Enum with gc crate Gc pointers inside
    • dyn Object
  • Own parser removed in favour of oxc parser
  • Mostly inactive until summer 2023
  • I hear of DOD paradigm that piques my interest and makes me want to apply it to a JS engine

3 of 14

Nova Engine - Today

  • Stack based bytecode VM
    • No AST VM at all
  • Data-oriented heap design
    • We’ll talk more about this in a bit
  • Project structure oriented around ECMAScript specification
    • Generally, each specification section matches one folder in the project structure
    • Subsections are either subfolders or modules inside the parent folder
    • Individual abstract operations in a subsection are exported fns in the corresponding module
  • Intention is to be easy to understand, implement and update from a specification standpoint

4 of 14

JavaECMAScript

  • Dynamically typed language: In general, no value has a known type at parse time
  • Limited runtime type system:
    • Undefined
    • Null
    • String
    • Boolean
    • Number
    • BigInt
    • Symbol
    • Object
  • Functions are Objects, Arrays are Objects, ArrayBuffers are Objects, …
  • ECMAScript spec mostly talks of “Values”: An engine mostly deals in generic “Value”
  • How to implement Value?
    • Copyable Value on stack: All values are as big as the biggest. Requires an internal discriminant.
    • Reference to Value on heap: Requires an internal discriminant. Can be mitigated with tricks
    • Enum with reference to Data on heap: Does not require an internal discriminant?
      • Pointer, or something else?

5 of 14

Value as a heap reference

  • Naive implementation: Local<’a, Value> is a reference to a heap allocated value which contains a discriminant and all data for the type
  • V8: Local<’a, Value> is a tagged pointer: It either contains an i32 or a reference to a heap allocated value
    • Fairly simple, proven performant enough
    • Handles the most important special case, i32, on the stack
    • Only heap knows type of data
  • JSCore: Local<’a, Value> is a NaN-boxed value: It either contains a valid f64, or a NaN with upper bits containing a discriminant and lower bits containing a stack allocated value or a reference to a heap allocated value
    • Allows for more data than just i32 on the stack, if wanted
    • NaN tag can discriminate the data: Stack knows type of data, at least to a degree. (3 bits)
  • Downside: Messing with pointer bits is unsafe and iffy, depending on method may break with 5-level paging, ARM pointer cryptography, …

6 of 14

A Rusty solution: enum Value

  • Rust enum with payload
    • What is a reasonable amount of variants? ECMAScript spec defines around 50 distinct types
      • 255 seems to be enough: #[repr(u8)] it is!
  • What about payload? Reference (pointer) would be obvious but makes Value 16 bytes in size
    • We know the type based on the enum variant: Can we place Value data of type T into its own arena and “compress” our pointers to a small enough value to fit it in to our enum?
    • A Vec<T> is a “memory arena” of type T: We do not need a memory offset in the Value, an index into the vector greatly compresses our “pointers”
    • With a compressed number scale, 32-bit unsigned integer should be plenty enough for the index
  • Value becomes a (u8, u32): Stack knows the type of an item and knows an index into a heap vector for the that type

7 of 14

Is that it? No!

  • We have ~200 discriminants left: Make a custom discriminant for i32
  • What else can we do? (u8, u32) has 3 bytes left over: (u8, [u8; 7])
    • i56
    • 7-byte small string optimization (“length”, “data”, “value”, …)
  • Future consideration: We still have at least ~127 discriminants left: That’s 255 / 2
  • Future consideration: How low can we go? 32 bit? 16 bit?
  • Future consideration: Value lacks lifetime
    • Have each Agent (Nova heap) generate a unique lifetime and brand its Values with it, ensuring no cross-pollination happens
    • Hopefully: Turn Heap into Heap<’gen> and have the borrow checker help with GC through lifetime transforms

8 of 14

So what’s this got to do with DOD?

  • Data-oriented design: Slippery fish and ill-definable but
    • Know your data, design around your data first and objects (logical wholes) second
    • As everything in programming, take in moderation
  • Data in ECMAScript is the Value variants’ heap data
  • Prototypal inheritance: No multiple inheritance, inheritance chain can change during runtime
    • Prototype does not define “internal methods” interface, object type does
  • Array is an Object, ArrayBuffer is an Object but not an Array, Function is an Object, …
    • Array behaves like Object but some of its spec-defined “methods” (get, set) are different
    • Function behaves like Object but has an extra method or two
    • OOP: Array inherits Object, Function inherits Object, … Object defines a class with virtual methods, Array inherits with some overrides, …
    • DOD: How is Array actually used? Consider that first, decide how to implement it second

9 of 14

Example: Array usage in ECMAScript

  • Special property “length”: Returns the current highest index + 1 of all indexed properties in object
  • Assigning an indexed property may change length, creates holes in the array if length grows by more than 1
  • Assigning “length” drops items if length shrinks or creates holes if length grows
  • Normal usage is to push/pop/shift/unshift items; possibly create with Array(N) and assign at index, sometimes assign “length”
    • push/pop/shift/unshift functions come from prototypal inheritance of the Array.prototype Object
  • Uncommon usage is to assign non-indexed properties, change prototype
    • Do we even need the Object-like properties of Arrays normally?
  • Common usage of Objects: Assign non-indexed properties
    • Spec does special case indexed properties for Objects but only for property listing order: Do we really need Objects to prepare for indexed properties specially?

10 of 14

Array and Object implementation in Nova Engine

  • No! We do not need Object-like properties of Arrays normally!
    • Do not track non-indexed properties of Arrays
    • Do not track prototype of Arrays
  • No! We do not need indexed-properties of Objects normally!
    • Do not track “elements” of Objects
  • Objects track prototype and properties, be they indexed or non-indexed
  • Arrays track elements (indexed properties) and an Option<Object>
    • By default, an Array is only a collection of indexed properties; length is a calculated property, prototype is inferred to be Array.prototype
    • If a user changes the prototype or assigns non-indexed properties, the backing Object gets created and the changes get tracked there
    • Array is only 16 bytes in size + N * size_of(Value)

11 of 14

Where are properties saved?

  • V8 has some inline properties on Objects and Arrays, others are heap allocated with a Vec<Value> on the object
    • Inline properties often makes sense but costs memory.
  • Nova of course uses Vec<[Value; N]>! Wait what?
    • Heap contains vectors of arrays of Values
    • Array and Object contain a 12 byte ElementsVector that defines the index in Vec<[..]>, which elements vector is being used (cap; N) and how many elements are currently used (len)
  • What about property descriptors?
    • ECMAScript defines writable, enumerable, configurable, getter and setter fields for properties
    • Rarely used, so DOD says: Consider how these are used and decide based on that
  • “Sparse Vec” ie. HashMap next to Vec<[..]>
    • Normal objects and arrays never use this, so it never gets created
    • Class prototypes etc. use this, let them have it but keep it out of the way

12 of 14

What is the point? ECS!

  • Memory savings? Yes, but no
    • Cache savings!
  • All heap allocation of same type creates data in linear memory
  • Mapping through an array accesses Values in linear memory
    • This is nothing new, this is how all engines work
    • But the data of the items in the array are also likely to be in linear memory!
  • GC mark is the act of iterating through all(*) allocated objects on the heap and following their outward pointers
    • This is now mostly about traversing linear memory, doing the same access pattern on each item and queueing the found heap indexes for future work
    • Output indexes to the proper processing queues and do marking in parallel
  • GC sweep is now simply retain_mut on Heap vectors
    • Drop unmarked data, mutate marked data to account for drained data (shift indexes down)
  • What about nursery / new object space?
    • Post-GC high water mark of each vector is the nursery edge! Anything below is alive, anything above needs marking to stay alive
  • Entity-Component-System: Value-HeapData-GC

13 of 14

Nova Engine - Tomorrow

  • Still very, very rough: Only runs a few trivial ECMAScript programs
  • Lots of (grunt) work to do in implementing spec abstract operations
  • VM implementation and bytecode generation is barely started
  • Heap GC algorithm is v0.2
  • Heap needs to be made concurrent-marking proof
    • This will be hard, really hard! Single mutator, multiple reader makes it possible
    • Sweep is already stop-the-world embarrassingly-parallel
    • I want to do this mostly lock-free, meaning a lot of RCU algorithms
  • Feature flags
    • ECMAScript spec optional features
    • Safe JavaScript variants
  • Eventually: Be the most interesting ECMAScript engine on the Rust block!
    • Consider being fast as well
  • Together With You!

14 of 14

Join us!

https://github.com/trynova/nova