1 of 59

WebGPU: An Explicit Graphics API for the Web

Austin Eng, Google*

enga@google.com

*I do not officially represent Google

Many thanks to my teammates: Corentin Wallez, Kai Ninomiya, and many others at Google

3 of 59

Review: Why use explicit APIs like Vulkan?

  • Explicit memory management
  • Multithreading
  • Async compute
  • ...and more!

4 of 59

Texture resizing in OpenGL

User resizing texture:

  • Resize the texture
  • Use it
  • :D

Driver resizing texture:

  • Allocate new memory
  • Use new memory
  • :D

5 of 59

Texture resizing in OpenGL

User resizing texture:

  • Resize the texture
  • Use it
  • :D

Driver resizing texture:

  • Allocate new memory
    • Insert fence
    • Check the fence every frame?
    • Garbage collect memory
  • Use new memory
  • :/

6 of 59

Texture resizing in OpenGL

User resizing texture:

  • Resize the texture
  • Use it
  • :D

Driver resizing texture:

  • Allocate new memory
    • Insert fence
    • Check the fence every frame?
    • Garbage collect memory
      • Dirty uniforms passed to shaders
      • Dirty framebuffers
      • Dirty texture buffers
  • Use new memory
  • :(
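
The driver-side bookkeeping listed above can be sketched as a deferred-free queue. This is a minimal sketch, not how any particular OpenGL driver works: `ResizableTexture` and `Allocation` are hypothetical names, fences are modeled as plain increasing integers, and `Resize` is assumed to be called with non-decreasing fence values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical stand-in for a block of GPU memory.
struct Allocation {
    size_t size;
};

// Hypothetical texture whose old memory may still be in use by the GPU.
class ResizableTexture {
public:
    explicit ResizableTexture(size_t size) : current_{size} {}

    // "Allocate new memory" + "Insert fence": switch to new memory now and
    // retire the old memory behind the last submitted fence value.
    void Resize(size_t newSize, uint64_t lastSubmittedFence) {
        retired_.push_back({lastSubmittedFence, current_});
        current_ = Allocation{newSize};
    }

    // "Check the fence every frame" + "Garbage collect memory": drop every
    // retired allocation whose fence value the GPU has already passed.
    // Returns how many allocations were freed.
    size_t GarbageCollect(uint64_t completedFence) {
        size_t freed = 0;
        while (!retired_.empty() && retired_.front().first <= completedFence) {
            retired_.pop_front();
            ++freed;
        }
        return freed;
    }

    size_t CurrentSize() const { return current_.size; }
    size_t PendingFrees() const { return retired_.size(); }

private:
    Allocation current_;
    std::deque<std::pair<uint64_t, Allocation>> retired_;
};
```

With an explicit API, the application owns this queue and decides when `GarbageCollect` runs, instead of the driver doing it at an unpredictable time.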

7 of 59

Why: Predictable behavior and performance

Applications can:

  • Control when expensive operations happen
  • Have low variance frame timing (VR)
  • Be smarter than the OpenGL driver

8 of 59

Why: Consoles

Graphics development on console:

  • Direct access to the hardware
  • Manual memory management
  • Getting to that last 1% of performance
  • Multithreading

Developers want that on PC too.

9 of 59

Why: Multithreading

Destiny’s Multi-threaded Renderer Architecture by Natalya Tatarchuk

  • Simulation
  • Determine views (for rendering, shadow-mapping, etc.)
  • Compute visibility
  • Extract data for rendering
  • Generate draw calls

(decouple)

10 of 59

Command buffers enable multithreading

Thread 1 records CmdBuf1:

    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkCmdSetScissor
    vkCmdDraw
    vkEndCommandBuffer

Thread 2 records CmdBuf2:

    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkCmdBindPipeline
    vkCmdPushConstants
    vkCmdDraw
    vkEndCommandBuffer

Thread 3 records CmdBuf3:

    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkEndCommandBuffer

CmdBuf1, CmdBuf2, and CmdBuf3 are then submitted to the Queue with vkQueueSubmit.
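
The pattern on this slide can be sketched without a GPU. This is a minimal sketch, not Vulkan code: `CommandBuffer`, `RecordDraws`, and `RecordAndSubmit` are hypothetical names, and a command buffer is modeled as a list of strings. Each thread records into its own buffer, so recording needs no locks; only the final submission is ordered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical command buffer: just a list of recorded command names.
using CommandBuffer = std::vector<std::string>;

// Records a begin/end pair around `drawCount` pipeline-bind + draw commands.
CommandBuffer RecordDraws(int drawCount) {
    CommandBuffer cmds;
    cmds.push_back("Begin");
    for (int i = 0; i < drawCount; ++i) {
        cmds.push_back("BindPipeline");
        cmds.push_back("Draw");
    }
    cmds.push_back("End");
    return cmds;
}

// Records one command buffer per thread in parallel, then "submits" them to
// a single queue in a fixed order on the calling thread.
std::vector<std::string> RecordAndSubmit(const std::vector<int>& drawsPerThread) {
    std::vector<CommandBuffer> buffers(drawsPerThread.size());
    std::vector<std::thread> workers;
    for (size_t t = 0; t < drawsPerThread.size(); ++t) {
        // Each worker writes only to its own slot, so no synchronization is
        // needed beyond the join below.
        workers.emplace_back([&buffers, &drawsPerThread, t] {
            buffers[t] = RecordDraws(drawsPerThread[t]);
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }

    std::vector<std::string> queue;
    for (const CommandBuffer& cb : buffers) {
        queue.insert(queue.end(), cb.begin(), cb.end());
    }
    return queue;
}
```

Submission order stays deterministic even though recording order across threads is not.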

11 of 59

Why: Multithreading

Figure: Cores 1 to 4 running Graphics and Other work, shown for single-threaded APIs and for multi-threaded APIs.

12 of 59

Why: Async Compute

Figure: the frame's passes (Shadow maps, Physics, Deferred Shading, G-Buffer, Transparents, PostFX), each labeled with its bottleneck: rasterization bound, ALU bound, memory bound, or rasterization and memory bound.

13 of 59

Why: Async Compute

Figure: the frame's passes (Shadow maps, Physics, Deferred Shading, G-Buffer, Transparents, PostFX), each labeled with its bottleneck: rasterization bound, ALU bound, memory bound, or rasterization and memory bound. A second row shows the same passes rearranged: Shadow maps, Physics, G-Buffer, Transparents, Deferred Shading, PostFX.

14 of 59

Case Study: Vulkan Grass Rendering (project 6)

We almost have async compute! How can we do better?

  • Compute:
    • Apply forces
    • Update `Blade` buffer
    • Cull blades
  • Memory barrier (compute->graphics): waits for the compute pipeline to finish.
  • Graphics: Rasterize + Tessellate

15 of 59

Case Study: Vulkan Grass Rendering (project 6)

  • Decouple physics and culling
    • Compute expensive physics for several frames in the future simultaneously
    • This step is camera-independent
  • Compute culled blades for the next frame
  • Memory barrier (compute->graphics): does not wait. Blades were culled while rendering the previous frame.
  • Graphics: Rasterize + Tessellate
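
The "cull for the next frame while rendering this one" idea can be sketched with two alternating lists. This is a minimal CPU-only sketch under stated assumptions: `Blade`, `Cull`, and `PipelinedCuller` are hypothetical names, culling is reduced to a 1D range test, and the next frame's view range is assumed to be known one frame ahead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical blade: one coordinate is enough for this sketch.
struct Blade {
    float x;
};

// Keeps the blades with x inside [minX, maxX]; a stand-in for frustum culling.
std::vector<Blade> Cull(const std::vector<Blade>& blades, float minX, float maxX) {
    std::vector<Blade> visible;
    for (const Blade& b : blades) {
        if (b.x >= minX && b.x <= maxX) {
            visible.push_back(b);
        }
    }
    return visible;
}

class PipelinedCuller {
public:
    // Cull for the first frame up front so there is something to draw.
    PipelinedCuller(const std::vector<Blade>& blades, float minX, float maxX) {
        culled_[0] = Cull(blades, minX, maxX);
    }

    // Returns how many blades are drawn this frame. Drawing reads the list
    // culled during the previous frame; culling for the next frame writes the
    // other list, so on a GPU the two steps would not have to wait on each other.
    size_t Frame(const std::vector<Blade>& blades, float nextMinX, float nextMaxX) {
        const std::vector<Blade>& toDraw = culled_[frame_ % 2];
        culled_[(frame_ + 1) % 2] = Cull(blades, nextMinX, nextMaxX);
        ++frame_;
        return toDraw.size();
    }

private:
    std::array<std::vector<Blade>, 2> culled_;
    size_t frame_ = 0;
};
```

The cost is one frame of latency between culling and drawing.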

16 of 59

Explicit Graphics APIs on the Web

https://github.com/gpuweb/gpuweb

17 of 59

A Few Goals:

  • Security & Stability
    • A website can’t be allowed to read your data
    • Native APIs allow unsafe operations and undefined behavior
  • Portability
    • Create an API to map onto D3D12, Metal, and Vulkan
    • The Web should work the same everywhere, no matter what platform
  • Fast
    • Multithreading
    • WebAssembly
    • Web Workers

18 of 59

It’s happening, but it’s hard...

  • See Kai’s presentation to learn about the process of designing this API
  • Reaching agreement with the other browser vendors takes a lot of time and discussion

19 of 59

Dawn, a WebGPU implementation*

API overview, examples, assorted details, and cool things

https://dawn.googlesource.com/dawn

*API subject to change

20 of 59

API Overview: Resource Binding

21 of 59

Figure: diagram labeled with Push constants (constants, register) and a Bind Group (an array of Bindings).

A binding can be any of:

  • A texture descriptor
  • A uniform buffer descriptor
  • A sampler descriptor
  • ...

22 of 59

Resource Binding

Very similar to Vulkan:

  • Pipeline layouts, composed of bind group layouts, define the structure of resource bindings for a pipeline
  • Bind groups are created from bind group layouts and contain references to resources (buffer views, texture views, etc.)
  • Bind groups are set on a pipeline when recording a command buffer

23 of 59

Resource Binding in Dawn

// Create bind group layouts
dawn::BindGroupBinding bufferBindings[] = {
    { 0, dawn::ShaderStageBit::Compute, dawn::BindingType::Sampler },        // (binding = 0) G-buffer sampler
    { 1, dawn::ShaderStageBit::Compute, dawn::BindingType::SampledTexture }, // (binding = 1) G-buffer
    { 2, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 2) index buffer
    { 3, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 3) vertex buffer
    { 4, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 4) output color buffer
};
dawn::BindGroupLayoutDescriptor bufferBindGroupLayoutDesc { nullptr, 5, bufferBindings };
dawn::BindGroupLayout bufferBindGroupLayout = device.CreateBindGroupLayout(&bufferBindGroupLayoutDesc);

// Create other bind group layouts...

24 of 59

Resource Binding in Dawn

// Create pipeline
dawn::BindGroupLayout bindGroupLayouts[] = {
    cameraBindGroupLayout, // (set = 0)
    bufferBindGroupLayout, // (set = 1)
    modelBindGroupLayout,  // (set = 2)
};
dawn::PipelineLayoutDescriptor pipelineLayoutDesc { nullptr, 3, bindGroupLayouts };
dawn::PipelineLayout pipelineLayout = device.CreatePipelineLayout(&pipelineLayoutDesc);

dawn::ShaderModule csModule = utils::CreateShaderModule(device, dawn::ShaderStage::Compute, kComputeShaderString);
dawn::ComputePipelineDescriptor computePipelineDesc{nullptr, pipelineLayout, csModule, "main"};
dawn::ComputePipeline computePipeline = device.CreateComputePipeline(&computePipelineDesc);

25 of 59

Resource Binding in Dawn

// Create camera bind group
dawn::BindGroupBinding bindings[] = {
    { 0, dawn::BindingType::BufferView, cameraBufferView },
};
dawn::BindGroupDescriptor bindGroupDesc { cameraBindGroupLayout, 1, bindings };
dawn::BindGroup cameraBindGroup = device.CreateBindGroup(&bindGroupDesc);

// Create bind groups for all models
for (Model* model : models) {
    dawn::BindGroupBinding bindings[] = {
        { 0, dawn::BindingType::BufferView, model->bufferView },
        { 1, dawn::BindingType::TextureView, model->textureView },
        { 2, dawn::BindingType::Sampler, model->sampler },
    };
    dawn::BindGroupDescriptor bindGroupDesc { modelBindGroupLayout, 3, bindings };
    model->modelBindGroup = device.CreateBindGroup(&bindGroupDesc);
}

26 of 59

Resource Binding in Dawn

// Set bind groups
dawn::ComputePassEncoder pass = builder.BeginComputePass();
pass.SetComputePipeline(computePipeline);
pass.SetBindGroup(0, cameraBindGroup);
for (ModelGroup* modelGroup : modelGroups) {
    pass.SetBindGroup(1, modelGroup->bufferBindGroup);
    for (Model* model : modelGroup->GetModels()) {
        pass.SetBindGroup(2, model->modelBindGroup);
        pass.Dispatch(1280, 960, 1);
    }
}
pass.EndPass();

27 of 59

API Overview: Pipelines

28 of 59

Render / Compute Pipelines

A big object that defines fixed-function state and format of the inputs and outputs:

  • Pipeline layout (set of bind group layouts)
  • Compiled shaders

Render pipelines only:

  • Various state
    • Blending, depth, stencil, input format, etc.
  • Framebuffer attachment formats

29 of 59

Creating a Render Pipeline

// Create depth stencil state
dawn::DepthStencilStateDescriptor depthStencilStateDesc;
depthStencilStateDesc.depthWriteEnabled = true;
depthStencilStateDesc.depthCompare = dawn::CompareFunction::Less;
dawn::DepthStencilState depthStencilState =
    device.CreateDepthStencilState(&depthStencilStateDesc);

// Create vertex input and attribute state
dawn::VertexAttributeDescriptor vertexAttribs[] = {
    {0, 0, 0, dawn::VertexFormat::FloatR32G32B32A32},
    {1, 1, 0, dawn::VertexFormat::FloatR32}};
dawn::VertexInputDescriptor vertexInputs[] = {
    {0, 0, dawn::InputStepMode::Vertex},
    {1, 0, dawn::InputStepMode::Instance}};
dawn::InputStateDescriptor inputStateDesc;
inputStateDesc.indexFormat = dawn::IndexFormat::UInt32;
inputStateDesc.attributes = vertexAttribs;
inputStateDesc.numAttributes = 2;
inputStateDesc.inputs = vertexInputs;
inputStateDesc.numInputs = 2;
// Note: inputState, used below, is assumed to be created from inputStateDesc
// (the creation call is not shown on this slide)

// Create attachment states
dawn::Attachment colorAttachments[] = {{ dawn::TextureFormat::R8G8B8A8Uint }};
dawn::Attachment depthStencilAttachment { dawn::TextureFormat::D32FloatS8Uint };
dawn::AttachmentsState attachmentsState { colorAttachments, 1, depthStencilAttachment };

// Create pipeline layout
dawn::PipelineLayoutDescriptor pipelineLayoutDesc;
pipelineLayoutDesc.numBindGroupLayouts = 4;
pipelineLayoutDesc.bindGroupLayouts = bindGroupLayouts;
dawn::PipelineLayout pipelineLayout =
    device.CreatePipelineLayout(&pipelineLayoutDesc);

// Create render pipeline
dawn::RenderPipelineDescriptor renderPipelineDesc;
renderPipelineDesc.vertexStage =
    dawn::PipelineStageDescriptor { vsModule, "main" };
renderPipelineDesc.fragmentStage =
    dawn::PipelineStageDescriptor { fsModule, "main" };
renderPipelineDesc.primitiveTopology = dawn::PrimitiveTopology::TriangleList;
renderPipelineDesc.depthStencilState = depthStencilState;
renderPipelineDesc.inputState = inputState;
renderPipelineDesc.attachmentsState = attachmentsState;
dawn::RenderPipeline pipeline =
    device.CreateRenderPipeline(&renderPipelineDesc);

30 of 59

API Overview: Command Submission

31 of 59

Render/Compute Passes

  • Encode a group of commands into the command buffer
    • Render passes: setVertexBuffers(...), draw(...), etc.
    • Compute passes: dispatch(...)

Render passes:

  • Contain attachment descriptions
    • g-buffers, color buffers, etc.

32 of 59

Implicit Resource Transitions

  • Resources must not change usage within a pass (e.g., a transition from vertex buffer to uniform buffer)
  • Resources are synchronized:
    • At pass boundaries, to transition usage
    • For UAVs between dispatch() calls
  • Implicit resource transitions make application development significantly easier
  • Explicit transitions are faster, but forgetting them leads to undefined behavior
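
The two rules above (one usage per resource within a pass, transitions at pass boundaries) can be sketched as a small tracker. This is a minimal sketch, not Dawn's implementation: `UsageTracker`, `Usage`, and the string resource names are hypothetical, and a "transition" is only counted rather than turned into a real barrier.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class Usage { None, Vertex, Uniform, Storage };

// Hypothetical per-command-buffer usage tracker.
class UsageTracker {
public:
    void BeginPass() { usedInPass_.clear(); }

    // Declares that `resource` is used as `usage` in the current pass.
    // Returns false (a validation error) if the resource was already used
    // with a different usage in this pass.
    bool Use(const std::string& resource, Usage usage) {
        auto it = usedInPass_.find(resource);
        if (it != usedInPass_.end()) {
            return it->second == usage;
        }
        usedInPass_[resource] = usage;
        return true;
    }

    // Called at the pass boundary. Returns how many resources changed usage
    // since the previous pass that used them, i.e. how many transitions an
    // implementation would have to insert on the application's behalf.
    int EndPass() {
        int transitions = 0;
        for (const auto& entry : usedInPass_) {
            // operator[] value-initializes new entries to Usage::None.
            Usage& current = currentUsage_[entry.first];
            if (current != entry.second) {
                if (current != Usage::None) {
                    ++transitions;
                }
                current = entry.second;
            }
        }
        return transitions;
    }

private:
    std::map<std::string, Usage> usedInPass_;
    std::map<std::string, Usage> currentUsage_;
};
```

Because the tracker sees every usage, the application never writes a barrier and cannot forget one.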

33 of 59

Example Render / Compute Passes

// Example command buffer for a particle simulation
dawn::CommandBuffer createCommandBuffer(
        const dawn::RenderPassDescriptor& renderPass,
        uint32_t i) {
    static const uint32_t zero = 0u;
    auto& bufferDst = particleBuffers[(i + 1) % 2]; // ping pong between these
    dawn::CommandBufferBuilder builder = device.CreateCommandBufferBuilder();
    {
        dawn::ComputePassEncoder pass = builder.BeginComputePass();
        pass.SetComputePipeline(computePipeline);
        pass.SetBindGroup(0, bindGroups[i]); // This is where bufferDst is bound for writing the particle attributes
        pass.Dispatch(kNumParticles, 1, 1);
        pass.EndPass();
    }
    {
        dawn::RenderPassEncoder pass = builder.BeginRenderPass(renderPass);
        pass.SetRenderPipeline(renderPipeline);
        pass.SetVertexBuffers(0, 1, &bufferDst, &zero); // Bind bufferDst as a vertex buffer for particles
        pass.SetVertexBuffers(1, 1, &modelBuffer, &zero);
        pass.DrawArrays(3, kNumParticles, 0, 0);
        pass.EndPass();
    }
    return builder.GetResult();
}

static uint32_t pingpong = 0;

void frame() {
    dawn::CommandBuffer commandBuffer =
        createCommandBuffer(renderPass, pingpong);
    queue.Submit(1, &commandBuffer);
    pingpong = (pingpong + 1) % 2;
}

// Create one bind group per ping-pong direction
for (uint32_t i = 0; i < 2; ++i) {
    dawn::BindGroupBinding bindings[] = {
        { 0, dawn::BindingType::BufferView, simulationUniforms },
        { 1, dawn::BindingType::BufferView, bufferViews[i] },
        { 2, dawn::BindingType::BufferView, bufferViews[(i + 1) % 2] },
    };
    dawn::BindGroupDescriptor bindGroupDesc { bindGroupLayout, 3, bindings };
    bindGroups[i] = device.CreateBindGroup(&bindGroupDesc);
}

34 of 59

Implementing Timeline Fences (simplified)

And cool things I’ve learned in my first few months about interprocess communication and GPU servicification.

35 of 59

What is a Fence?

  • A synchronization primitive used to wait for execution on the GPU to complete
  • For WebGPU, we’ve settled on “numerical fences”
    • Monotonically increasing values indicate a timestamp in GPU execution history. Hence the name “timeline fences”

36 of 59

What is a Fence?

queue.Submit(1, &commands1); // submit commands1
queue.Signal(fence, 1u);
queue.Submit(1, &commands2); // submit commands2
queue.Signal(fence, 2u);
queue.Submit(1, &commands3); // submit commands3
queue.Signal(fence, 3u);

// Some time later...
uint64_t completedValue = fence.GetCompletedValue();

// Suppose completedValue == 2.
// That means that commands1 and commands2 have finished executing.
// commands3 may not have finished executing.

37 of 59

Implementing Timeline Fences in Dawn

// Simplified: Fence is treated as a handle that refers to shared state,
// and device, queue, and createInfo are assumed to exist.
struct Fence {
    uint64_t signalValue = 0;
    uint64_t completedValue = 0;
};

struct Queue {
    struct SignaledFence {
        Fence fence;
        VkFence nativeFence;
        uint64_t signalValue;
    };
    std::vector<SignaledFence> signaledFences;
};

void Queue::Signal(Fence fence, uint64_t signalValue) {
    if (signalValue <= fence.signalValue) {
        // Validation error: Fence values must
        // increase monotonically
        return;
    }
    fence.signalValue = signalValue;

    VkFence nativeFence;
    vkCreateFence(device, &createInfo, nullptr, &nativeFence);
    vkQueueSubmit(queue, 0, nullptr, nativeFence);

    signaledFences.push_back(
        SignaledFence{fence, nativeFence, signalValue});
}

38 of 59

Implementing Timeline Fences in Dawn

A Fence stores the last signaled value and the value that has completed execution on the GPU

39 of 59

Implementing Timeline Fences in Dawn

When we signal a Fence, create a native VkFence and signal it on a queue.

Add the fence to a list of signaled fences that we will check later.

40 of 59

Implementing Timeline Fences in Dawn

void Queue::DoThisOccasionally() {
    for (auto it = signaledFences.begin(); it != signaledFences.end();) {
        if (vkGetFenceStatus(device, it->nativeFence) == VK_SUCCESS) {
            // The native fence is complete. Update the completedValue
            it->fence.completedValue = it->signalValue;
            it = signaledFences.erase(it);
        } else {
            it++;
        }
    }
}

uint64_t Fence::GetCompletedValue() {
    return completedValue;
}

Every once in a while, go through the list of all fences and update the fences that have completed.

Returns a Fence’s completedValue
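
The same flow can be exercised without Vulkan. This is a minimal sketch, not Dawn's code: `FakeNativeFence` is a hypothetical stand-in for `VkFence` whose `signaled` flag is flipped by hand to simulate GPU progress, `Tick` plays the role of `DoThisOccasionally`, and fences are passed by pointer so updates reach the caller's object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a native fence such as VkFence.
struct FakeNativeFence {
    bool signaled = false;
};

struct Fence {
    uint64_t signalValue = 0;
    uint64_t completedValue = 0;
    uint64_t GetCompletedValue() const { return completedValue; }
};

class Queue {
public:
    // Returns the native fence so the caller can simulate GPU completion,
    // or nullptr on a validation error (values must strictly increase).
    FakeNativeFence* Signal(Fence* fence, uint64_t signalValue) {
        if (signalValue <= fence->signalValue) {
            return nullptr;
        }
        fence->signalValue = signalValue;
        pending_.push_back({fence, std::make_unique<FakeNativeFence>(), signalValue});
        return pending_.back().nativeFence.get();
    }

    // Polls every pending native fence and publishes completed values.
    void Tick() {
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->nativeFence->signaled) {
                it->fence->completedValue =
                    std::max(it->fence->completedValue, it->signalValue);
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    struct SignaledFence {
        Fence* fence;
        std::unique_ptr<FakeNativeFence> nativeFence;
        uint64_t signalValue;
    };
    std::vector<SignaledFence> pending_;
};
```

`GetCompletedValue` never waits; it only reports what the last `Tick` observed.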

43 of 59

This doesn’t “just work” on the Web :(

The client browser talks to our server Dawn implementation via interprocess communication using a command buffer.

The client does not run Dawn; it asks a service to execute commands.

Client: int x = fence.GetCompletedValue();

Client -> Server: fence.GetCompletedValue()

Server: I’ll compute that and let you know in just a bit...

Client: ?!? This is supposed to be synchronous. What do I assign to x!?

44 of 59

Timeline Fences: Client-Side State Tracking

Fence state tracked on the Client: signaledValue = 0, completedValue = 0

Client:

    queue.Signal(fence, 2u);

Server:

    (no commands received yet)

45 of 59

Timeline Fences: Client-Side State Tracking

Fence state tracked on the Client: signaledValue = 2, completedValue = 0

Client:

    queue.Signal(fence, 2u);
    clientQueueSignalStub(...);

Server:

    serverQueueSignalStub(...);
    queue.Signal(fence, 2u);
    fence.onCompletion(2u, ForwardFenceValue);

46 of 59

Timeline Fences: Client-Side State Tracking

Fence state tracked on the Client: signaledValue = 2, completedValue = 0

Client:

    queue.Signal(fence, 2u);
    clientQueueSignalStub(...);
    int x = fence.GetCompletedValue(); // x <-- 0

Server:

    serverQueueSignalStub(...);
    queue.Signal(fence, 2u);
    fence.onCompletion(2u, ForwardFenceValue);

47 of 59

Timeline Fences: Client-Side State Tracking

Fence state tracked on the Client: signaledValue = 2, completedValue = 2

Client:

    queue.Signal(fence, 2u);
    clientQueueSignalStub(...);
    int x = fence.GetCompletedValue(); // x <-- 0
    handleFenceValueUpdate(...);

Server:

    serverQueueSignalStub(...);
    queue.Signal(fence, 2u);
    fence.onCompletion(2u, ForwardFenceValue);
    // Some time later...
    ForwardFenceValue(fence, 2u);

48 of 59

Timeline Fences: Client-Side State Tracking

Fence state tracked on the Client: signaledValue = 2, completedValue = 2

Client:

    queue.Signal(fence, 2u);
    clientQueueSignalStub(...);
    int x = fence.GetCompletedValue(); // x <-- 0
    handleFenceValueUpdate(...);
    // Some time later...
    int y = fence.GetCompletedValue(); // y <-- 2

Server:

    serverQueueSignalStub(...);
    queue.Signal(fence, 2u);
    fence.onCompletion(2u, ForwardFenceValue);
    // Some time later...
    ForwardFenceValue(fence, 2u);
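
The sequence above can be simulated in a single process. This is a minimal sketch, not Dawn's wire protocol: `Channel`, `SignalCmd`, `FenceUpdate`, and the three functions are hypothetical, the two queues stand in for interprocess communication, and the server pretends each signal completes on the GPU as soon as it is processed.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical messages crossing the simulated process boundary.
struct SignalCmd { uint64_t value; };             // client -> server
struct FenceUpdate { uint64_t completedValue; };  // server -> client

// Client-side copy of the fence state; reading it never blocks.
struct ClientFence {
    uint64_t signaledValue = 0;
    uint64_t completedValue = 0;
    uint64_t GetCompletedValue() const { return completedValue; }
};

struct Channel {
    std::queue<SignalCmd> toServer;
    std::queue<FenceUpdate> toClient;
};

// Client: record the signaled value locally and enqueue the command.
void ClientSignal(ClientFence& fence, Channel& ch, uint64_t value) {
    fence.signaledValue = value;
    ch.toServer.push(SignalCmd{value});
}

// Server: execute queued signals and forward each completed value.
void ServerProcess(Channel& ch) {
    while (!ch.toServer.empty()) {
        ch.toClient.push(FenceUpdate{ch.toServer.front().value});
        ch.toServer.pop();
    }
}

// Client: apply forwarded values to the local copy.
void ClientHandleUpdates(ClientFence& fence, Channel& ch) {
    while (!ch.toClient.empty()) {
        fence.completedValue = ch.toClient.front().completedValue;
        ch.toClient.pop();
    }
}
```

The client may read a stale value, but it never reads a value the GPU has not actually reached.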

49 of 59

This Client / Server separation exists for every object in Dawn.

It’s actually pretty simple, but this concept was foreign to me when I was first introduced to it.

50 of 59

What is actually happening here?

dawn::Buffer buffer = device.CreateBuffer(&descriptor);

buffer.SetSubData(0, 10, data);

  • The Client doesn’t have any real buffers
  • The Client asks the Server to execute commands
  • How does this code actually call buffer.SetSubData(0, 10, data);?

52 of 59

Objects in Dawn (simplified)

dawn::Buffer buffer = device.CreateBuffer(&descriptor);

Client:

  • Get a free ObjectID* for the buffer
  • Allocate a “Buffer” object
    • This is pretty much just struct ClientBuffer { uint32_t id; };
  • Tell the server to create a real buffer and map it to the ObjectID
  • Return the ClientBuffer

Server:

  • Actually create a real Buffer
  • Map the ObjectID to the created buffer

*This is actually two ids for reasons I won’t explain

54 of 59

Objects in Dawn (simplified)

buffer.SetSubData(0, 10, data);

Client:

  • BufferSetSubDataCmd cmd { buffer.id, 0, 10, data };

Server:

  • Look up the ObjectID and get a pointer to a Buffer
  • Execute buffer.SetSubData(0, 10, data);
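
Both steps (creating the buffer and calling SetSubData through an id) can be sketched in one process. This is a minimal sketch, not Dawn's wire format: `Command`, `Client`, and `Server` are hypothetical, a single id is used instead of the two mentioned earlier, and a "real" buffer is just a byte vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// The client-side object is only a handle.
struct ClientBuffer {
    uint32_t id;
};

// Hypothetical wire command covering both operations.
struct Command {
    enum class Type { CreateBuffer, SetSubData };
    Type type;
    uint32_t id;
    uint32_t offset;            // SetSubData: first byte to write
    uint32_t size;              // CreateBuffer: buffer size in bytes
    std::vector<uint8_t> data;  // SetSubData: bytes to write
};

class Client {
public:
    // Picks a free id and returns immediately, without a server round trip.
    ClientBuffer CreateBuffer(uint32_t size) {
        ClientBuffer buffer{nextId_++};
        wire_.push_back({Command::Type::CreateBuffer, buffer.id, 0, size, {}});
        return buffer;
    }

    void SetSubData(ClientBuffer buffer, uint32_t offset, std::vector<uint8_t> data) {
        wire_.push_back({Command::Type::SetSubData, buffer.id, offset, 0, std::move(data)});
    }

    // Hands the recorded commands to the transport and clears the local list.
    std::vector<Command> Flush() {
        std::vector<Command> out;
        out.swap(wire_);
        return out;
    }

private:
    uint32_t nextId_ = 1;
    std::vector<Command> wire_;
};

class Server {
public:
    void Execute(const std::vector<Command>& commands) {
        for (const Command& cmd : commands) {
            if (cmd.type == Command::Type::CreateBuffer) {
                // Create the real buffer and map the ObjectID to it.
                buffers_[cmd.id] = std::vector<uint8_t>(cmd.size, 0);
            } else {
                // Look up the ObjectID, then run the real operation.
                std::vector<uint8_t>& buffer = buffers_.at(cmd.id);
                for (size_t i = 0; i < cmd.data.size(); ++i) {
                    buffer.at(cmd.offset + i) = cmd.data[i];
                }
            }
        }
    }

    const std::vector<uint8_t>& Contents(uint32_t id) const { return buffers_.at(id); }

private:
    std::map<uint32_t, std::vector<uint8_t>> buffers_;
};
```

Because the client chooses the id, it can use the buffer in later commands before the server has created it.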

55 of 59

Summary

Communicating between the Client and Server can be slow

  • Transfer as little information as possible
    • Don’t send large objects between the Client and Server
    • Use ObjectIDs, which give the Client a “handle” to Server objects
  • Reduce Client-Server dependencies so the Client is not blocked
    • Objects can be created and their ObjectIDs used in other commands without needing to wait for the Server

56 of 59

Demo :)

57 of 59

Career Advice?

58 of 59

To prepare for the future,

Don’t optimize for the future.

Tomorrow is inherently uncertain.

Don’t pour too much energy into perfecting a future that may never occur.

59 of 59

More specifically:

  • Don’t make decisions out of fear of future regret.
  • Appreciate and enjoy the opportunities before you now.