1 of 27

WebGL Next Investigations

Corentin Wallez & Kai Ninomiya, Google

2 of 27

Recap: Goals for WebGL Next

  • Security
  • Performance
  • Portability

3 of 27

Goal: Security

Fully validated for security reasons. In particular the API needs to validate or automatically handle:

  • Object lifetime
  • Preventing uninitialized reads
  • Undefined behaviors leading to privacy issues

4 of 27

Goal: Performance

Runs fast (80% of native).

Validation done ahead of time to reduce the cost compared to WebGL.

Supports concepts that some target APIs require, even if they are no-ops on others.

5 of 27

Goal: Portability

Runs on top of at least D3D12, Metal and Vulkan.

Like GL, the API needs to be network transparent to work with some browsers’ multi-process architecture.

Exposed to both JavaScript and WebAssembly.

Undefined behaviors are not portable; spend up to 20% of the performance budget on removing as many as possible.

6 of 27

Investigation: Obvious constraints

  • Needs pipeline states and command buffers to reach good performance on D3D12, Metal, Vulkan.
  • Textures and buffers can’t be resizable as that is only supported in GL.

7 of 27

Investigation: Binding model

How you give resources (buffers, textures, ...) to the shaders:

  • Metal (and D3D11): linear table of textures, buffers…
  • D3D12: ranges of a descriptor heap bound as “root arguments” (or single descriptors as arguments)
  • Vulkan: sets of resource descriptors bound together

In D3D12 and Vulkan, the root signature / pipeline layout can be shared between pipelines.

8 of 27

Investigation: Binding model

Vulkan is the limiting factor:

  • D3D12’s binding model is a strict superset of Vulkan’s.
  • Emulating table-based binding models on D3D12 and Vulkan will be slow.

WebGL Next needs to use a descriptor-set like approach.

  • Will also help with CPU overhead as the application provides more information in advance.
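
As a rough illustration of why a descriptor-set-like model lets validation move up front, the sketch below checks a bind group against its layout once, at creation time, so that binding it while recording commands is only an identity check. All names here (`makeBindGroupLayout`, `makeBindGroup`, `setBindGroup`, the `kind` strings) are hypothetical and are not the prototype's API; it also throws on errors for brevity.

```javascript
// Hypothetical sketch: a layout declares what kind of resource each slot holds.
function makeBindGroupLayout(entries) {
  // entries: [{ binding: 0, kind: "storage-buffer" }, ...]
  const slots = new Map();
  for (const { binding, kind } of entries) {
    if (slots.has(binding)) throw new Error(`duplicate binding ${binding}`);
    slots.set(binding, kind);
  }
  return Object.freeze({ slots });
}

// A bind group is validated against its layout once, when it is created.
function makeBindGroup(layout, resources) {
  // resources: [{ binding: 0, kind: "storage-buffer", resource: {...} }, ...]
  const bound = new Map();
  for (const { binding, kind, resource } of resources) {
    const expected = layout.slots.get(binding);
    if (expected === undefined) throw new Error(`binding ${binding} not in layout`);
    if (expected !== kind) {
      throw new Error(`binding ${binding}: expected ${expected}, got ${kind}`);
    }
    bound.set(binding, resource);
  }
  if (bound.size !== layout.slots.size) throw new Error("bind group is missing bindings");
  return Object.freeze({ layout, bound });
}

// While recording, only a cheap identity check against the pipeline layout remains.
function setBindGroup(state, pipelineLayouts, index, group) {
  if (pipelineLayouts[index] !== group.layout) throw new Error("layout mismatch");
  state[index] = group;
}
```

Because the per-resource checks happen when the group is built, a group that is reused across many draws is validated only once.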

9 of 27

Investigation: Data uploads / downloads

WebGL developers complain about upload speed. Here’s what happens in Chromium in WebGL:

  • Application creates a typed array
  • gl.texImage2D copies it into a shared-memory bucket
  • GPU process copies it into a driver buffer in glTexImage2D
  • The driver schedules a copy to device memory

10 of 27

Investigation: Data uploads / downloads

Mapping buffers is possible in browsers and safe because ArrayBuffers can be neutered.
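
The neutering (detaching) behavior can be observed with standard JavaScript alone. The sketch below is not how a browser would implement buffer mapping; it only assumes that `structuredClone` with a transfer list is available (current browsers and Node.js 17 or later) and uses it as a convenient way to detach an ArrayBuffer.

```javascript
// Sketch: transferring an ArrayBuffer detaches ("neuters") the original,
// so stale references can no longer read or write the memory.
const mapped = new ArrayBuffer(16);
new Uint8Array(mapped).fill(7);

// The transfer list moves the backing store into `moved` and detaches `mapped`.
const moved = structuredClone(mapped, { transfer: [mapped] });

const detachedLength = mapped.byteLength;   // a detached buffer reports 0 bytes
const movedLength = moved.byteLength;       // the 16 bytes now live only in `moved`
const firstByte = new Uint8Array(moved)[0]; // the contents were preserved

// Creating a view on the detached buffer throws a TypeError.
let staleAccessThrew = false;
try {
  new Uint8Array(mapped);
} catch (e) {
  staleAccessThrew = e instanceof TypeError;
}
```

An API that hands out mapped memory as an ArrayBuffer could rely on the same mechanism when the buffer is unmapped, leaving the application with no usable pointer into GPU-visible memory.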

Vulkan 1.1 will allow getting a file descriptor from a VkDeviceMemory, which will allow a 1-copy path.

Drawbacks: doesn’t take advantage of pipelined copies or D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE.

11 of 27

Investigation: Target API limitations

D3D11 has limitations that make it inefficient to support:

  • No buffer <-> texture copies.
  • Texture buffers would need shadow copies.

12 of 27

Investigation: Target API limitations

Likewise it seems we’ll need a very recent OpenGL as we need ARB_texture_view (core in 4.3).

OpenGL and its extensions only have combined texture-samplers: the texture and sampler are always given to the shader together (hence the GLSL type name sampler2D). In all other APIs, the sampler and texture are given separately and combined in the shader.
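
One way an OpenGL backend could bridge this difference is to enumerate every (texture, sampler) pair a shader actually uses and give each distinct pair its own combined unit. The sketch below illustrates that idea only; the function name and the shape of its input are hypothetical, and the slides do not say how the prototype handles this.

```javascript
// Hypothetical sketch: assign one combined GL unit per distinct
// (texture, sampler) pair used by a shader.
function combineTextureSamplers(uses) {
  // uses: [{ texture: "albedo", sampler: "linear" }, ...]
  const units = new Map();
  for (const { texture, sampler } of uses) {
    const key = JSON.stringify([texture, sampler]);
    if (!units.has(key)) {
      units.set(key, { unit: units.size, texture, sampler });
    }
  }
  return [...units.values()];
}

const combined = combineTextureSamplers([
  { texture: "albedo", sampler: "linear" },
  { texture: "albedo", sampler: "nearest" },
  { texture: "albedo", sampler: "linear" }, // repeated pair reuses its unit
]);
```

Note the cost this implies: a single texture sampled with two different samplers occupies two combined units on the GL side.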

13 of 27

Chromium’s WebGL Next prototype

Chromium has built a prototype to experiment with API designs.

The goal is to explore design directions and spark discussions, not to force decisions early on.

Being a prototype, it lacks many key features. We have focused on testing end-to-end usage instead of exploring the breadth of the API.

14 of 27

Chromium’s WebGL Next prototype

Key properties of our prototype:

  • It is based on an investigation into the high-level design of all 5 APIs but does not follow one in particular.
  • There is an ANGLE-like library implementing that design.
  • We did an initial integration of the library in Chromium, exposing the bindings to JavaScript.

15 of 27

Features of the prototype

Our prototype currently supports:

  • OpenGL (standalone and in Chromium)
  • Metal (standalone only)

  • Render and compute pipelines, command buffers
  • Textures and vertex / index / uniform / storage buffers
  • Descriptor sets (bindgroups) and push constants
  • Uses SPIR-V as it already has translators to GLSL and MSL

16 of 27

Examples (subject to change)

HelloTriangle in JS:

    vsModule = nxt.createShaderModuleBuilder()
        .setSource(vsCode.length, vsCode)
        .getResult();

    fsModule = nxt.createShaderModuleBuilder()
        .setSource(fsCode.length, fsCode)
        .getResult();

    pipeline = nxt.createPipelineBuilder()
        .setStage(nxt.SHADER_STAGE_VERTEX, vsModule, "main")
        .setStage(nxt.SHADER_STAGE_FRAGMENT, fsModule, "main")
        .getResult();

    commands = nxt.createCommandBufferBuilder()
        .setPipeline(pipeline)
        .drawArrays(3, 1, 0, 0)
        .getResult();

    // Each requestAnimationFrame
    queue.submit(1, [commands]);
    ctx.flush();

Creating resources in JS:

    uniformBuffer = nxt.createBufferBuilder()
        .setUsage(nxt.BUFFER_USAGE_BIT_UNIFORM | nxt.BUFFER_USAGE_BIT_MAPPED)
        .setSize(128)
        .getResult();

    var uniformView = uniformBuffer.createBufferViewBuilder()
        .setExtent(0, 128)
        .getResult();

    var sampler = nxt.createSamplerBuilder()
        .setFilterMode(nxt.FILTER_MODE_LINEAR, nxt.FILTER_MODE_LINEAR,
                       nxt.FILTER_MODE_LINEAR)
        .getResult();

    var texture = nxt.createTextureBuilder()
        .setDimension(nxt.TEXTURE_DIMENSION_2D)
        .setExtent(512, 512, 1)
        .setFormat(nxt.TEXTURE_FORMAT_R8_G8_B8_A8_UNORM)
        .setMipLevels(1)
        .setUsage(nxt.TEXTURE_USAGE_BIT_SAMPLED | nxt.TEXTURE_USAGE_BIT_TRANSFER_DST)
        .getResult();

    textureView = texture.createTextureViewBuilder()
        .getResult();

17 of 27

Examples (subject to change)

Bind groups in C++:

    nxt::BindGroupLayout bgl = device.CreateBindGroupLayoutBuilder()
        .SetBindingsType(nxt::ShaderStageBit::Compute,
                         nxt::BindingType::StorageBuffer, 0, 1)
        .GetResult();

    nxt::PipelineLayout pl = device.CreatePipelineLayoutBuilder()
        .SetBindGroupLayout(0, bgl)
        .GetResult();

    computePipeline = device.CreatePipelineBuilder()
        .SetLayout(pl)
        .SetStage(nxt::ShaderStage::Compute, module, "main")
        .GetResult();

    computeBindGroup = device.CreateBindGroupBuilder()
        .SetLayout(bgl)
        .SetUsage(nxt::BindGroupUsage::Frozen)
        .SetBufferViews(0, 1, &view)
        .GetResult();

Compute and graphics in C++:

    nxt::CommandBuffer commands = device.CreateCommandBufferBuilder()
        .SetPipeline(computePipeline)
        .SetBindGroup(0, computeBindGroup)
        .Dispatch(1, 1, 1)
        .SetPipeline(renderPipeline)
        .SetBindGroup(0, renderBindGroup)
        .DrawArrays(3, 1, 0, 0)
        .GetResult();

    queue.Submit(1, &commands);

18 of 27

Object creation in the prototype

We chose to have object initialization done through builder objects that gather the initialization parameters.

  • Removes the need for “is built” validation checks, places all validation in the builder.
  • Backend objects can choose to forget parameters
  • Default values are easily supported (compared to structs)
  • Has a nice fluent API style (subjective)
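
A minimal sketch of the builder idea follows. The class and constant names are hypothetical, and the sketch throws on invalid parameters for brevity, whereas the prototype reports errors asynchronously. It shows the points listed above: parameters are gathered with defaults, all validation happens in `getResult()`, and the resulting object never needs an "is built" check.

```javascript
// Hypothetical builder: gathers parameters, validates once in getResult().
class BufferBuilder {
  constructor() {
    this.size = undefined; // required, no default
    this.usage = 0;        // default: no usage bits set
    this.consumed = false;
  }
  setSize(size) {
    this.size = size;
    return this; // returning `this` gives the fluent call style
  }
  setUsage(usage) {
    this.usage = usage;
    return this;
  }
  getResult() {
    if (this.consumed) throw new Error("builder already used");
    if (!Number.isInteger(this.size) || this.size <= 0) {
      throw new Error("size must be a positive integer");
    }
    this.consumed = true;
    // The result is immutable and complete.
    return Object.freeze({ size: this.size, usage: this.usage });
  }
}

const USAGE_UNIFORM = 1; // hypothetical usage bit
const buffer = new BufferBuilder().setSize(128).setUsage(USAGE_UNIFORM).getResult();
const defaulted = new BufferBuilder().setSize(64).getResult(); // usage defaults to 0
```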

19 of 27

Command-buffer in the prototype

Like WebGL, it is designed to work through a command buffer:

  • A “wire server” can be created from an API instance
  • A “wire client” provides an API instance
  • Errors are handled asynchronously
  • The wire does lifetime validation using OpenGL-like IDs but it could also be done outside of it.
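
The client and server split can be sketched in a few lines. Everything below is hypothetical (class names, command shapes, and JSON standing in for a binary format); it only illustrates a client that refers to objects by ID and a server that validates lifetimes while decoding, recording errors instead of failing synchronously.

```javascript
// Hypothetical wire client: allocates IDs and records serializable commands.
class WireClient {
  constructor() {
    this.nextId = 1;
    this.commands = [];
  }
  createBuffer(size) {
    const id = this.nextId++;
    this.commands.push({ op: "createBuffer", id, size });
    return id; // the client only ever holds the ID
  }
  destroyBuffer(id) {
    this.commands.push({ op: "destroyBuffer", id });
  }
  flush() {
    const bytes = JSON.stringify(this.commands); // stand-in for a binary format
    this.commands = [];
    return bytes;
  }
}

// Hypothetical wire server: resolves IDs and validates object lifetime.
class WireServer {
  constructor() {
    this.objects = new Map();
    this.errors = []; // collected for asynchronous reporting
  }
  handle(bytes) {
    for (const cmd of JSON.parse(bytes)) {
      if (cmd.op === "createBuffer") {
        this.objects.set(cmd.id, { size: cmd.size });
      } else if (cmd.op === "destroyBuffer") {
        if (!this.objects.delete(cmd.id)) {
          this.errors.push(`unknown id ${cmd.id}`);
        }
      }
    }
  }
}

const client = new WireClient();
const id = client.createBuffer(64);
client.destroyBuffer(id);
client.destroyBuffer(id); // double destroy: recorded as an error, not fatal
const server = new WireServer();
server.handle(client.flush());
```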

20 of 27

Error handling in the prototype

On the client side, every object returned from the API acts like a Promise<object or error>. These can be used directly, but if an object resolving to an error is used as a parameter:

  • Functions result in a noop.
  • Functions returning an object return an error value.
  • Builder methods mark the builder as an error value.
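
The three propagation rules above can be sketched as follows. The names (`ERROR`, `PipelineBuilder`, `submit`) are hypothetical and a single shared error value is used for simplicity; the prototype's actual representation is not described in these slides.

```javascript
// Hypothetical error value that stands in for any object that failed to build.
const ERROR = Object.freeze({ isError: true });
const isError = (obj) => obj === ERROR;

class PipelineBuilder {
  constructor() {
    this.failed = false;
    this.stages = [];
  }
  setStage(stage, module) {
    // Rule: a builder method given an error value marks the builder as an error.
    if (isError(module)) this.failed = true;
    else this.stages.push({ stage, module });
    return this;
  }
  getResult() {
    // Rule: a function returning an object returns an error value instead.
    if (this.failed || this.stages.length === 0) return ERROR;
    return { stages: this.stages };
  }
}

function submit(log, commands) {
  // Rule: a function given an error value is a no-op.
  if (isError(commands)) return;
  log.push(commands);
}

const goodPipeline = new PipelineBuilder().setStage("vertex", { code: "..." }).getResult();
const badPipeline = new PipelineBuilder().setStage("vertex", ERROR).getResult();

const submitted = [];
submit(submitted, goodPipeline);
submit(submitted, badPipeline); // silently ignored
```

Since no call has to wait for a validation result, the client can keep issuing work while errors surface later, which is what makes the asynchronous model workable.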

21 of 27

Error handling in the prototype

Asynchronous error handling is similar to what is done in WebGL, but more structured.

Asynchronous error handling will allow putting the “client” and the “server” in different threads for overlapped execution in both single and multi process browsers.

22 of 27

Architecture of the prototype

Heavy reliance on code generation improves iteration speed but reduces flexibility in API shape (for now).

All state tracking and validation is in common code:

  • src/backend/common is 2700 LOC
  • src/backend/opengl is 1000 LOC
  • src/backend/metal is 1000 LOC

23 of 27

Architecture of the prototype, standalone

[Diagram: layers of the standalone prototype, top to bottom]

  • CppHelloTriangle
  • nxtcpp.cpp
  • nxt.c
  • Wire Serializer
  • (through some fake command buffer)
  • Wire Deserializer
  • Generated validation and backend binding
  • Common code and OpenGL backend

Path of a queue submit through those layers:

  • queue->Submit(1, &commands);
  • nxtQueueSubmit(queue, 1, &cCommands);
  • procs.queueSubmit(queue, 1, &cCommands);
  • Serialize in wire::QueueSubmitCmd
  • Deserialize from wire::QueueSubmitCmd
  • backendProcs.queueSubmit(q, 1, &commands);
  • Call opengl::Queue::Submit
  • Stuff

Legend: each layer in the diagram is marked as either generated or hand written.

24 of 27

Architecture of the prototype, in Chrome

[Diagram: layers of the prototype in Chrome, top to bottom]

  • JavaScript
  • modules/nxt/gen/NXT.cpp
  • Wire Serializer
  • (goes through the GPU command buffer)
  • Wire Deserializer
  • Generated validation and backend binding
  • Common code and OpenGL backend

Path of a queue submit through those layers:

  • queue.submit(1, [commands]);
  • getProcs()->queueSubmit(queue, 1, &cCommand);
  • Serialize in wire::QueueSubmitCmd
  • Deserialize from wire::QueueSubmitCmd
  • backendProcs.queueSubmit(q, 1, &commands);
  • Call opengl::Queue::Submit
  • Stuff

Crossing the GPU command buffer:

  • When flush is called, call contextGL()->NXTCommands(serializeSize, serializeBuffer);
  • GLES2CmdDecoder::HandleNXTCommands

Legend: each layer in the diagram is marked as either generated or hand written.

25 of 27

Demo time!

  • Compute + graphics demo
    • In JS in Chromium with OpenGL
    • Standalone with OpenGL
    • Standalone with Metal
  • Model viewer demo

26 of 27

Acknowledgments

  • Kai Ninomiya
  • Justin Novosad
  • Ken Russell
  • Antoine Labour
  • Jesse Hall
  • glslang, shaderc and SPIRV-Cross

27 of 27

Questions?