WebGPU: An Explicit Graphics API for the Web
Austin Eng, Google* (enga@google.com)
*I do not officially represent Google
Many thanks to my teammates: Corentin Wallez, Kai Ninomiya, and many others at Google
Review: Why use explicit APIs like Vulkan?
Many slides taken from Corentin’s 2016 CIS 565 guest lecture and Kai’s 2017 CIS 565 guest lecture
Texture resizing in OpenGL
User resizing texture:
Driver resizing texture:
Why: Predictable behavior and performance
Applications can:
Why: Consoles
Graphics development on console:
Developers want that on PC too.
Why: Multithreading
Destiny’s Multi-threaded Renderer Architecture by Natalya Tatarchuk
(decouple)
Command buffers enable multithreading
Thread 1 records CmdBuf1:
    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkCmdSetScissor
    vkCmdDraw
    vkEndCommandBuffer
Thread 2 records CmdBuf2:
    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkCmdBindPipeline
    vkCmdPushConstants
    vkCmdDraw
    vkEndCommandBuffer
Thread 3 records CmdBuf3:
    vkBeginCommandBuffer
    vkCmdBindPipeline
    vkCmdDraw
    vkEndCommandBuffer
Queue: CmdBuf1, CmdBuf2, and CmdBuf3 are submitted with vkQueueSubmit
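The recording pattern above can be sketched without any graphics API. This is a minimal illustration, not Vulkan code: `CommandBuffer`, `RecordDraws`, and `RecordFrame` are hypothetical names, and a command buffer is modeled as a plain list of strings. Each thread records into its own buffer, so recording needs no locks, and a single thread submits the results in a fixed order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU command buffer: just a list of command names.
using CommandBuffer = std::vector<std::string>;

// Each worker records into its own command buffer, so no locking is needed.
CommandBuffer RecordDraws(int threadIndex, int drawCount) {
    CommandBuffer cmds;
    cmds.push_back("begin");
    for (int i = 0; i < drawCount; ++i) {
        cmds.push_back("draw t" + std::to_string(threadIndex) + " #" + std::to_string(i));
    }
    cmds.push_back("end");
    return cmds;
}

// Record on `threadCount` threads in parallel, then hand everything to one
// submitter. The returned vector plays the role of the queue: submission
// order is fixed by index, not by which thread finished recording first.
std::vector<CommandBuffer> RecordFrame(int threadCount, int drawsPerThread) {
    assert(threadCount > 0);
    std::vector<CommandBuffer> recorded(static_cast<std::size_t>(threadCount));
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&recorded, t, drawsPerThread] {
            // Each thread writes only its own slot.
            recorded[static_cast<std::size_t>(t)] = RecordDraws(t, drawsPerThread);
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return recorded;
}
```

Calling `RecordFrame(3, 2)` yields three independent buffers, each holding a begin marker, two draws, and an end marker, regardless of how the threads were scheduled.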
Why: Multithreading
[Figure: utilization of Core 1 through Core 4, split into Graphics and Other work, for single-threaded APIs versus multi-threaded APIs]
Why: Async Compute
[Figure: frame passes labeled with their bottlenecks (rasterization bound, ALU bound, memory bound, rasterization and memory bound): Shadow maps, Physics, Deferred Shading, G-Buffer, Transparents, PostFX]
[Figure: the same passes in a different order: Shadow maps, Physics, G-Buffer, Transparents, Deferred Shading, PostFX]
Case Study: Vulkan Grass Rendering (project 6)
We almost have async compute! How can we do better?
Explicit Graphics APIs on the Web
https://github.com/gpuweb/gpuweb
A Few Goals:
It’s happening, but it’s hard...
Dawn, a WebGPU implementation*
API overview, examples, assorted details, and cool things
*API subject to change
API Overview: Resource Binding
[Figure: resource binding model. Labels: Binding (x5), Bind Group (array), Push constants (Constants, register)]
A binding can be any of:
Resource Binding
Very similar to Vulkan:
Resource Binding in Dawn
// Create bind group layouts
dawn::BindGroupBinding bufferBindings[] = {
    { 0, dawn::ShaderStageBit::Compute, dawn::BindingType::Sampler },        // (binding = 0) G-buffer sampler
    { 1, dawn::ShaderStageBit::Compute, dawn::BindingType::SampledTexture }, // (binding = 1) G-buffer
    { 2, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 2) index buffer
    { 3, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 3) vertex buffer
    { 4, dawn::ShaderStageBit::Compute, dawn::BindingType::StorageBuffer },  // (binding = 4) output color buffer
};
dawn::BindGroupLayoutDescriptor bufferBindGroupLayoutDesc { nullptr, 5, bufferBindings };
dawn::BindGroupLayout bufferBindGroupLayout = device.CreateBindGroupLayout(&bufferBindGroupLayoutDesc);
// Create other bind group layouts...
Resource Binding in Dawn
// Create pipeline
dawn::BindGroupLayout bindGroupLayouts[] = {
    cameraBindGroupLayout, // (set = 0)
    bufferBindGroupLayout, // (set = 1)
    modelBindGroupLayout,  // (set = 2)
};
dawn::PipelineLayoutDescriptor pipelineLayoutDesc { nullptr, 3, bindGroupLayouts };
dawn::PipelineLayout pipelineLayout = device.CreatePipelineLayout(&pipelineLayoutDesc);
dawn::ShaderModule csModule = utils::CreateShaderModule(device, dawn::ShaderStage::Compute, kComputeShaderString);
dawn::ComputePipelineDescriptor computePipelineDesc{nullptr, pipelineLayout, csModule, "main"};
dawn::ComputePipeline computePipeline = device.CreateComputePipeline(&computePipelineDesc);
Resource Binding in Dawn
// Create camera bind group
dawn::BindGroupBinding bindings[] = {
    { 0, dawn::BindingType::BufferView, cameraBufferView },
};
dawn::BindGroupDescriptor bindGroupDesc { cameraBindGroupLayout, 1, bindings };
dawn::BindGroup cameraBindGroup = device.CreateBindGroup(&bindGroupDesc);
// Create bind groups for all models
for (Model* model : models) {
    dawn::BindGroupBinding bindings[] = {
        { 0, dawn::BindingType::BufferView, model->bufferView },
        { 1, dawn::BindingType::TextureView, model->textureView },
        { 2, dawn::BindingType::Sampler, model->sampler },
    };
    dawn::BindGroupDescriptor bindGroupDesc { modelBindGroupLayout, 3, bindings };
    model->modelBindGroup = device.CreateBindGroup(&bindGroupDesc);
}
Resource Binding in Dawn
// Set bind groups
dawn::ComputePassEncoder pass = builder.BeginComputePass();
pass.SetComputePipeline(computePipeline);
pass.SetBindGroup(0, cameraBindGroup);
for (ModelGroup* modelGroup : modelGroups) {
    pass.SetBindGroup(1, modelGroup->bufferBindGroup);
    for (Model* model : modelGroup->GetModels()) {
        pass.SetBindGroup(2, model->modelBindGroup);
        pass.Dispatch(1280, 960, 1);
    }
}
pass.EndPass();
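The nested loops above order bind groups by how often they change: set 0 once per pass, set 1 once per model group, set 2 once per model. A minimal sketch of that idea follows, using a hypothetical `MockPass` that only counts calls. None of these names are Dawn API; the point is only to show how the number of rebinds scales with the scene structure.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mock of a pass encoder that only counts state changes.
struct MockPass {
    std::array<uint32_t, 3> setCounts{};  // SetBindGroup calls per slot
    uint32_t dispatches = 0;

    void SetBindGroup(uint32_t slot) {
        assert(slot < setCounts.size());
        ++setCounts[slot];
    }
    void Dispatch() { ++dispatches; }
};

// Mirrors the nested loops above. `modelsPerGroup[i]` is the number of
// models in model group i.
MockPass EncodeScene(const std::vector<uint32_t>& modelsPerGroup) {
    MockPass pass;
    pass.SetBindGroup(0);  // camera: bound once for the whole pass
    for (uint32_t modelCount : modelsPerGroup) {
        pass.SetBindGroup(1);  // shared buffers: once per model group
        for (uint32_t m = 0; m < modelCount; ++m) {
            pass.SetBindGroup(2);  // per-model data: once per model
            pass.Dispatch();
        }
    }
    return pass;
}
```

For two groups with 2 and 3 models, the camera group is set once, the buffer group twice, and the model group five times. Rebinding all three sets for every dispatch would take fifteen calls instead of eight.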
API Overview: Pipelines
Render / Compute Pipelines
A big object that defines fixed-function state and format of the inputs and outputs:
Creating a Render Pipeline
// Create depth stencil state
dawn::DepthStencilStateDescriptor depthStencilStateDesc;
depthStencilStateDesc.depthWriteEnabled = true;
depthStencilStateDesc.depthCompare = dawn::CompareFunction::Less;
dawn::DepthStencilState depthStencilState =
    device.CreateDepthStencilState(&depthStencilStateDesc);
// Create vertex input and attribute state
dawn::VertexAttributeDescriptor vertexAttribs[] = {
    {0, 0, 0, dawn::VertexFormat::FloatR32G32B32A32},
    {1, 1, 0, dawn::VertexFormat::FloatR32}};
dawn::VertexInputDescriptor vertexInputs[] = {
    {0, 0, dawn::InputStepMode::Vertex},
    {1, 0, dawn::InputStepMode::Instance}};
dawn::InputStateDescriptor inputStateDesc;
inputStateDesc.indexFormat = dawn::IndexFormat::UInt32;
inputStateDesc.attributes = vertexAttribs;
inputStateDesc.numAttributes = 2;
inputStateDesc.inputs = vertexInputs;
inputStateDesc.numInputs = 2;
// Create attachment states
dawn::Attachment colorAttachments[] = {{ dawn::TextureFormat::R8G8B8A8Uint }};
dawn::Attachment depthStencilAttachment { dawn::TextureFormat::D32FloatS8Uint };
dawn::AttachmentsState attachmentsState { colorAttachments, 1, depthStencilAttachment };
// Create pipeline layout
dawn::PipelineLayoutDescriptor pipelineLayoutDesc;
pipelineLayoutDesc.numBindGroupLayouts = 4;
pipelineLayoutDesc.bindGroupLayouts = bindGroupLayouts;
dawn::PipelineLayout pipelineLayout =
    device.CreatePipelineLayout(&pipelineLayoutDesc);
// Create render pipeline
dawn::RenderPipelineDescriptor renderPipelineDesc;
renderPipelineDesc.vertexStage =
    dawn::PipelineStageDescriptor { vsModule, "main" };
renderPipelineDesc.fragmentStage =
    dawn::PipelineStageDescriptor { fsModule, "main" };
renderPipelineDesc.primitiveTopology = dawn::PrimitiveTopology::TriangleList;
renderPipelineDesc.depthStencilState = depthStencilState;
// inputState is built from inputStateDesc (the creation call is not shown on the slide)
renderPipelineDesc.inputState = inputState;
renderPipelineDesc.attachmentsState = attachmentsState;
dawn::RenderPipeline pipeline =
    device.CreateRenderPipeline(&renderPipelineDesc);
API Overview: Command Submission
Render/Compute Passes
Implicit Resource Transitions
Example Render / Compute Passes
// Example command buffer for a particle simulation
dawn::CommandBuffer createCommandBuffer(
        const dawn::RenderPassDescriptor& renderPass, uint32_t i) {
    static const uint32_t zero = 0u;
    auto& bufferDst = particleBuffers[(i + 1) % 2]; // ping pong between these
    dawn::CommandBufferBuilder builder = device.CreateCommandBufferBuilder();
    {
        dawn::ComputePassEncoder pass = builder.BeginComputePass();
        pass.SetComputePipeline(computePipeline);
        pass.SetBindGroup(0, bindGroups[i]); // This is where bufferDst is bound for writing the particle attributes
        pass.Dispatch(kNumParticles, 1, 1);
        pass.EndPass();
    }
    {
        dawn::RenderPassEncoder pass = builder.BeginRenderPass(renderPass);
        pass.SetRenderPipeline(renderPipeline);
        pass.SetVertexBuffers(0, 1, &bufferDst, &zero); // Bind bufferDst as a vertex buffer for particles
        pass.SetVertexBuffers(1, 1, &modelBuffer, &zero);
        pass.DrawArrays(3, kNumParticles, 0, 0);
        pass.EndPass();
    }
    return builder.GetResult();
}
static uint32_t pingpong = 0;
void frame() {
    dawn::CommandBuffer commandBuffer =
        createCommandBuffer(renderPass, pingpong);
    queue.Submit(1, &commandBuffer);
    pingpong = (pingpong + 1) % 2;
}
// Create one bind group per ping-pong direction
for (uint32_t i = 0; i < 2; ++i) {
    dawn::BindGroupBinding bindings[] = {
        { 0, dawn::BindingType::BufferView, simulationUniforms },
        { 1, dawn::BindingType::BufferView, bufferViews[i] },           // read this frame
        { 2, dawn::BindingType::BufferView, bufferViews[(i + 1) % 2] }, // written this frame
    };
    dawn::BindGroupDescriptor bindGroupDesc { bindGroupLayout, 3, bindings };
    bindGroups[i] = device.CreateBindGroup(&bindGroupDesc);
}
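The ping-pong indexing above is easy to check on the CPU. The sketch below is only an illustration with hypothetical names (`PingPong`, `Step`): two `std::vector<float>` buffers stand in for the particle buffers, and a single addition stands in for the compute shader. Each step reads one buffer, writes the other, and then swaps roles, matching `pingpong = (pingpong + 1) % 2` in `frame()`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU stand-in for the two particle buffers.
struct PingPong {
    std::array<std::vector<float>, 2> buffers;
    std::size_t readIndex = 0;  // plays the role of `pingpong` above

    explicit PingPong(std::vector<float> initial) {
        buffers[1].resize(initial.size());
        buffers[0] = std::move(initial);
    }

    // One simulation step: read buffers[readIndex], write the other buffer,
    // then swap roles so the next step reads what was just written.
    const std::vector<float>& Step(float velocity) {
        const std::size_t writeIndex = (readIndex + 1) % 2;
        const std::vector<float>& src = buffers[readIndex];
        std::vector<float>& dst = buffers[writeIndex];
        assert(src.size() == dst.size());
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = src[i] + velocity;  // stand-in for the compute shader
        }
        readIndex = writeIndex;
        return dst;  // the buffer a render pass would consume this frame
    }
};
```

The buffer returned by `Step` corresponds to `bufferDst`: it is written by the compute pass and then read as a vertex buffer in the same frame.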
Implementing Timeline Fences (simplified)
And cool things I’ve learned in my first few months about interprocess communication and GPU servicification.
What is a Fence?
queue.Submit(1, &commands1); // submit commands1
queue.Signal(fence, 1u);
queue.Submit(1, &commands2); // submit commands2
queue.Signal(fence, 2u);
queue.Submit(1, &commands3); // submit commands3
queue.Signal(fence, 3u);
// Some time later...
uint64_t completedValue = fence.GetCompletedValue();
// Suppose completedValue == 2.
// That means that commands1 and commands2 have finished executing.
// commands3 may not have finished executing.
Implementing Timeline Fences in Dawn
struct Fence {
    uint64_t signalValue = 0;
    uint64_t completedValue = 0;
};
struct Queue {
    struct SignaledFence {
        Fence fence;  // simplified: stands in for a reference to the Fence object
        VkFence nativeFence;
        uint64_t signalValue;
    };
    std::vector<SignaledFence> signaledFences;
};
void Queue::Signal(Fence fence, uint64_t signalValue) {
    if (signalValue <= fence.signalValue) {
        // Validation error: Fence values must
        // increase monotonically
        return;
    }
    fence.signalValue = signalValue;
    VkFence nativeFence;
    vkCreateFence(device, createInfo, nullptr, &nativeFence);
    vkQueueSubmit(queue, 0, nullptr, nativeFence);
    signaledFences.push_back(
        SignaledFence{fence, nativeFence, signalValue});
}
A Fence stores the last signaled value and the value that has completed execution on the GPU
When we signal a Fence, create a native VkFence and signal it on a queue.
Add the fence to a list of signaled fences that we will check later.
Implementing Timeline Fences in Dawn
void Queue::DoThisOccasionally() {
    for (auto it = signaledFences.begin(); it != signaledFences.end();) {
        if (vkGetFenceStatus(device, it->nativeFence) == VK_SUCCESS) {
            // The native fence is complete. Update the completedValue
            it->fence.completedValue = it->signalValue;
            it = signaledFences.erase(it);
        } else {
            it++;
        }
    }
}
uint64_t Fence::GetCompletedValue() {
    return completedValue;
}
Every once in a while, go through the list of all fences and update the fences that have completed.
Returns a Fence’s completedValue
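To try the polling scheme without a GPU, the native fence can be replaced by a flag. The following is a minimal sketch under that assumption, not Dawn's implementation: `FakeNativeFence` and `Tick` are hypothetical names, and the test code plays the role of the GPU by setting `done`. Unlike the simplified slide code, the queue stores a pointer to the `Fence` so that updates reach the caller's object.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a native fence (VkFence): a flag the "GPU" sets.
struct FakeNativeFence {
    bool done = false;
};

struct Fence {
    uint64_t signalValue = 0;     // last value passed to Queue::Signal
    uint64_t completedValue = 0;  // last value known to have completed
    uint64_t GetCompletedValue() const { return completedValue; }
};

class Queue {
  public:
    // Returns the native fence so the caller can play the role of the GPU
    // and mark it done. Returns nullptr on a validation error.
    std::shared_ptr<FakeNativeFence> Signal(Fence& fence, uint64_t value) {
        if (value <= fence.signalValue) {
            return nullptr;  // fence values must increase monotonically
        }
        fence.signalValue = value;
        auto native = std::make_shared<FakeNativeFence>();
        pending_.push_back({&fence, native, value});
        return native;
    }

    // Poll every pending native fence and retire the ones that are done.
    void Tick() {
        for (auto it = pending_.begin(); it != pending_.end();) {
            assert(it->fence != nullptr);
            if (it->native->done) {
                // std::max keeps the value from moving backwards if the
                // fake fences are completed out of order.
                it->fence->completedValue =
                    std::max(it->fence->completedValue, it->value);
                it = pending_.erase(it);  // erase returns the next valid iterator
            } else {
                ++it;
            }
        }
    }

  private:
    struct Pending {
        Fence* fence;  // non-owning: the Fence must outlive this entry
        std::shared_ptr<FakeNativeFence> native;
        uint64_t value;
    };
    std::vector<Pending> pending_;
};
```

After signaling 1, 2, and 3 and marking only the first two native fences done, one call to `Tick()` makes `GetCompletedValue()` return 2, matching the earlier example.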
This doesn’t “just work” on the Web :(
The client browser talks to our server Dawn implementation via interprocess communication using a command buffer.
The client does not run Dawn; it asks a service to execute commands.
Client: int x = fence.GetCompletedValue();
Client -> Server: fence.GetCompletedValue()
Server: “I’ll compute that and let you know in just a bit...”
?!? This is supposed to be synchronous. What do I assign to x!?
Timeline Fences: Client-Side State Tracking
The client keeps its own copy of the fence state:
      | signaledValue | completedValue |
fence | 0             | 0              |
1. Client: queue.Signal(fence, 2u); runs clientQueueSignalStub(...). The client state becomes:
      | signaledValue | completedValue |
fence | 2             | 0              |
2. Server: serverQueueSignalStub(...) runs queue.Signal(fence, 2u); and then fence.onCompletion(2u, ForwardFenceValue);
3. Client: int x = fence.GetCompletedValue(); // x <-- 0
4. Server, some time later: ForwardFenceValue(fence, 2u);
5. Client: handleFenceValueUpdate(...); The client state becomes:
      | signaledValue | completedValue |
fence | 2             | 2              |
6. Client, some time later: int y = fence.GetCompletedValue(); // y <-- 2
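The sequence above can be sketched with two in-process queues standing in for the IPC channel. This is a minimal illustration, not Dawn's wire protocol: `Channel`, `ClientFence`, and `ServerProcess` are hypothetical names, and the server here pretends the GPU completes every signal immediately. The key property is that `GetCompletedValue()` stays synchronous on the client because it only reads cached state.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical wire messages. Real IPC would serialize these.
struct SignalCmd { uint64_t value; };        // client -> server
struct FenceUpdate { uint64_t completed; };  // server -> client

struct Channel {
    std::deque<SignalCmd> toServer;
    std::deque<FenceUpdate> toClient;
};

// Client-side proxy: answers queries from cached state, never blocks on IPC.
class ClientFence {
  public:
    explicit ClientFence(Channel& channel) : channel_(channel) {}

    void Signal(uint64_t value) {
        assert(value > signaledValue_);  // same monotonic rule as the server
        signaledValue_ = value;
        channel_.toServer.push_back({value});
    }

    // Synchronous: returns the last value the server has reported.
    uint64_t GetCompletedValue() const { return completedValue_; }

    // Drain server replies (the role of handleFenceValueUpdate above).
    void HandleUpdates() {
        while (!channel_.toClient.empty()) {
            completedValue_ = channel_.toClient.front().completed;
            channel_.toClient.pop_front();
        }
    }

  private:
    Channel& channel_;
    uint64_t signaledValue_ = 0;
    uint64_t completedValue_ = 0;
};

// Server side: pretend the GPU finishes every signal it has received, then
// forward the completed value back (the role of ForwardFenceValue above).
void ServerProcess(Channel& channel) {
    while (!channel.toServer.empty()) {
        const uint64_t value = channel.toServer.front().value;
        channel.toServer.pop_front();
        channel.toClient.push_back({value});
    }
}
```

The cost of this design is staleness: between the server forwarding the value and the client handling the update, the client still reports the old completed value, just as `x` receives 0 in step 3.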
This Client / Server separation exists for every object in Dawn.
It’s actually pretty simple, but this concept was foreign to me when I was first introduced to it.
What is actually happening here?
dawn::Buffer buffer = device.CreateBuffer(&descriptor);
buffer.SetSubData(0, 10, data);
Objects in Dawn (simplified)
dawn::Buffer buffer = device.CreateBuffer(&descriptor);
*This is actually two ids for reasons I won’t explain
Objects in Dawn (simplified)
buffer.SetSubData(0, 10, data);
... = {
    buffer.id,
    0, 10, data
};
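One way to picture the `SetSubData` call above is that the client object is only an id, and each call becomes a small record in a byte stream that the server replays against the real object. The sketch below assumes a made-up wire format (`SetSubDataHeader` followed by the payload bytes). The struct, function names, and layout are all hypothetical, since the slides do not show Dawn's actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// Hypothetical wire format for one SetSubData call: a fixed header followed
// by `size` payload bytes.
struct SetSubDataHeader {
    uint32_t bufferId;
    uint32_t offset;
    uint32_t size;
};

// Client side: the "buffer" is only an id. Calls are appended to a byte stream.
void ClientSetSubData(std::vector<uint8_t>& wire, uint32_t bufferId,
                      uint32_t offset, const std::vector<uint8_t>& data) {
    assert(data.size() <= UINT32_MAX);  // payload size must fit in the header
    const SetSubDataHeader header{bufferId, offset,
                                  static_cast<uint32_t>(data.size())};
    const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&header);
    wire.insert(wire.end(), bytes, bytes + sizeof(header));
    wire.insert(wire.end(), data.begin(), data.end());
}

// Server side: map ids back to real storage and replay each command.
// Returns false if the stream is malformed or names an unknown object.
bool ServerExecute(const std::vector<uint8_t>& wire,
                   std::unordered_map<uint32_t, std::vector<uint8_t>>& buffers) {
    std::size_t cursor = 0;
    while (cursor < wire.size()) {
        SetSubDataHeader header;
        if (wire.size() - cursor < sizeof(header)) return false;
        std::memcpy(&header, wire.data() + cursor, sizeof(header));
        cursor += sizeof(header);

        auto it = buffers.find(header.bufferId);
        if (it == buffers.end()) return false;  // unknown id from the client
        std::vector<uint8_t>& target = it->second;
        if (wire.size() - cursor < header.size) return false;  // truncated payload
        if (header.offset > target.size() ||
            header.size > target.size() - header.offset) return false;  // out of bounds

        if (header.size > 0) {
            std::memcpy(target.data() + header.offset, wire.data() + cursor,
                        header.size);
        }
        cursor += header.size;
    }
    return true;
}
```

The server validates every field before touching memory, because in a browser the client process is untrusted and may send arbitrary bytes.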
Summary
Communicating between the Client and Server can be slow
Demo :)
Career Advice?
To prepare for the future,
Don’t optimize for the future.
Tomorrow is inherently uncertain.
Don’t pour too much energy into perfecting a future that may never occur.
More specifically: