New Game 2011 SF
Debugging and Optimizing WebGL Applications
Ben Vanik (benvanik@google.com)
Ken Russell (kbr@google.com)
First version of WebGL spec released at GDC 2011 in March
WebGL enabled by default in Firefox, Chrome, Opera 12 alpha
Safari contains Developer menu option to enable WebGL
Next version of spec coming soon
Pros and Cons of Using WebGL
Pros:
Cons:
Understanding GPUs
GPUs are stream processors
Programming model differs from CPUs
A few resources:
Fabian Giesen's A trip through the Graphics Pipeline 2011
Apple's OpenGL ES Programming Guide for iOS
Ken's WebGL: Hands On slide deck
WebGL is Awesome, but...
Sometimes performance doesn't match native OpenGL...
(it should be close!)
Sometimes performance doesn't match <canvas>...
(it should be much better!)
Sometimes things just don't work...
(graphics is hard!)
Debugging WebGL
Why Doesn't My Program Render?
Common situation: no output and no errors reported to console
Many common reasons
General WebGL Debugging Tips
When you are faced with a blank screen:
When you are debugging a shader:
Use libraries and tools to triage problems
Getting Error Info from WebGL
webgl-debug.js
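// Wrap the raw context so every GL call is checked for errors
var rawgl = canvas.getContext('webgl');
var gl = WebGLDebugUtils.makeDebugContext(rawgl);
// Use gl instead of rawgl for all calls
drawStuff();
// Show the error code as a readable name, e.g. INVALID_ENUM
alert(WebGLDebugUtils.glEnumToString(gl.getError()));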
Getting Error Info from WebGL
Firefox Web Console
Getting Error Info from WebGL
If you must call gl.getError() directly, be aware that errors are batched up
Ensure clear error state when trying to isolate issues:
// Draw stuff that may set the error flag
drawStuff();
// Clear any set error
while (gl.getError() != gl.NO_ERROR);
// Draw other stuff
drawOtherStuff();
// Now any error is from drawOtherStuff
var error = gl.getError();
Handling Context Loss
WebGL applications need to be prepared to handle loss of the rendering context
May be lost at any time for many reasons:
webgl-debug.js helps simulate lost context events to make your app more robust
See Gregg Tavares's article on Handling Lost Context
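A minimal sketch of reacting to context loss, using the standard webglcontextlost and webglcontextrestored canvas events. The helper name installContextLossHandlers and the two callbacks are hypothetical; the app is assumed to stop rendering in onLost and to recreate its textures, buffers and programs in onRestored.

```javascript
// Hypothetical helper: wires the standard context-loss events on a canvas.
// onLost should stop rendering; onRestored should recreate all GL resources
// (textures, buffers, programs), since the old ones are no longer valid.
function installContextLossHandlers(canvas, onLost, onRestored) {
  canvas.addEventListener('webglcontextlost', function (event) {
    // preventDefault() signals that the app wants the context restored.
    event.preventDefault();
    onLost();
  }, false);
  canvas.addEventListener('webglcontextrestored', function () {
    onRestored();
  }, false);
}
```

Typical use would be to call this once, right after creating the context, and keep all resource creation in a single function that onRestored can call again.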
WebGL Inspector
Optimizing WebGL
GPU Optimization Whack-a-Mole
Never dismiss an optimization that has no effect when first applied - it may just mean your bottleneck (on a specific machine/browser/etc) is someplace else... for now
General Performance Rules
Reduce the number of draw calls per frame
General Performance Rules
Always use requestAnimationFrame if available (for 2D canvas too); fall back to setTimeout
Because its callbacks can be delayed (if the page is hidden), put networking, etc. on a separate timing mechanism
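A sketch of the fallback described above, assuming the timing mechanism meant is requestAnimationFrame. The function name getFrameScheduler is hypothetical, the vendor-prefixed names are the ones browsers shipped around 2011, and the window object is passed in as a parameter only to keep the sketch easy to test.

```javascript
// Returns a function that schedules a callback for the next frame.
// `win` is the global window object (a parameter so the sketch is testable).
function getFrameScheduler(win) {
  const raf = win.requestAnimationFrame ||
      win.webkitRequestAnimationFrame ||
      win.mozRequestAnimationFrame;
  if (raf) {
    return function (callback) { return raf.call(win, callback); };
  }
  // Fallback: approximate 60 Hz with setTimeout.
  return function (callback) {
    return win.setTimeout(function () { callback(Date.now()); }, 1000 / 60);
  };
}
```

Work that must keep running while the page is hidden (network polling, for example) would then live on its own setInterval or setTimeout loop rather than inside the frame callback.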
General Performance Rules
Avoid get*/read* calls
getError
General Performance Rules
Avoid redundant calls
Use WebGL Inspector to find redundant calls and identify where batching can be employed
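One way to avoid redundant calls is a thin cache in front of the context. This is a sketch only: createStateCache is a hypothetical name, it covers just useProgram and bindTexture, and it assumes that every state change goes through the cache and that only one texture unit is in use.

```javascript
// Hypothetical wrapper that skips useProgram/bindTexture calls when the
// requested state is already current. Assumes all state changes go through
// it and that a single texture unit is active.
function createStateCache(gl) {
  let currentProgram = null;
  const boundTextures = new Map(); // target -> texture
  return {
    useProgram(program) {
      if (program === currentProgram) return false; // redundant, skipped
      gl.useProgram(program);
      currentProgram = program;
      return true;
    },
    bindTexture(target, texture) {
      if (boundTextures.get(target) === texture) return false; // redundant
      gl.bindTexture(target, texture);
      boundTextures.set(target, texture);
      return true;
    }
  };
}
```

The cache must be reset after context loss, since the real GL state is then back at its defaults.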
General Performance Rules
Disable unused GL features
Link programs infrequently
Don't change Renderbuffers, change Framebuffers
(note this is counter to iOS perf guidelines)
'Graphics Pipeline Performance'
alpha: false
Optimizing Drawing
Drawing scenes in Canvas
sort objects by z-index
for each object:
    draw object
(aka Painter's Algorithm)
[Figure: objects layered at z: 0, z: 1, z: 2, z: 3]
Drawing scenes in WebGL
sort objects by state, then depth
for each state:
    for each object:
        draw object
Depth buffer used to preserve order on the screen, so same results as Painter's, with batched states
[Figure: objects layered at z: 0, z: 1, z: 2, z: 3]
Depth Buffers and Draw Order
Sorting by State
Draw objects ordered by:
Sort scene ahead of time, maintain as a sorted list if possible
[Diagram: state tree, from Framebuffer/Context State down to Program/Arrays and then Uniforms]
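A sketch of sorting a draw list by state and then by depth. The field names programId, textureId and depth are hypothetical; the ids are assumed to be small integers assigned by the app, and a smaller depth is assumed to mean closer to the camera.

```javascript
// Each draw item is assumed to look like { programId, textureId, depth }.
// Returns a new array; the input is left untouched.
function sortByStateThenDepth(items) {
  return items.slice().sort(function (a, b) {
    return (a.programId - b.programId) ||   // most expensive state change first
           (a.textureId - b.textureId) ||
           (a.depth - b.depth);             // front to back within a state
  });
}
```

If the scene changes little between frames, the sorted list can be kept and patched rather than rebuilt every frame.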
Batching Textures
Standard texture atlases/UV maps
Considerations:
Frame Structure using Depth Buffers
gl.enable(gl.DEPTH_TEST);
gl.depthMask(true);
gl.disable(gl.BLEND);
// Draw opaque content
gl.depthMask(false);
gl.enable(gl.BLEND);
// Draw translucent content
gl.disable(gl.DEPTH_TEST);
// Draw UI
Frame:
DEPTH_TEST = T, DEPTH_WRITEMASK = T: Draw opaque, Front to Back
DEPTH_TEST = T, DEPTH_WRITEMASK = F: Draw translucent, Back to Front
DEPTH_TEST = F: Draw UI, Back to Front
Example Sort
Ideally # opaque >> # translucent items
If there are only a few translucent items, there is no need to sort them by anything other than z
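A sketch of building the two draw lists for the frame structure above. The item shape { translucent, z } and the name buildFrameLists are hypothetical, and larger z is assumed to mean farther from the camera.

```javascript
// Items are assumed to look like { translucent: boolean, z: number },
// with larger z meaning farther from the camera.
function buildFrameLists(items) {
  const opaque = items.filter(function (i) { return !i.translucent; });
  const translucent = items.filter(function (i) { return i.translucent; });
  opaque.sort(function (a, b) { return a.z - b.z; });       // front to back
  translucent.sort(function (a, b) { return b.z - a.z; });  // back to front
  return { opaque: opaque, translucent: translucent };
}
```

The opaque list would be drawn with depth writes on and blending off, the translucent list with depth writes off and blending on.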
Depth Pass
Still shading bound? Can't get front-to-back order? Depth pass!
Requires use of the invariant keyword in GLSL
Enables optimal usage of depth buffer (not a single fragment executed that fails DEPTH_TEST)
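A sketch of one common way to structure a depth pass, not necessarily the exact setup the speakers used. renderWithDepthPass and the drawScene callback are hypothetical; the caller is assumed to have cleared the depth buffer already. The second pass uses gl.EQUAL, which only works if both passes compute identical positions; that is presumably why the invariant keyword is needed.

```javascript
// drawScene(depthOnly) is a hypothetical app callback that issues the draw
// calls, using a cheap shader when depthOnly is true.
function renderWithDepthPass(gl, drawScene) {
  gl.enable(gl.DEPTH_TEST);

  // Pass 1: fill the depth buffer only; no color writes.
  gl.colorMask(false, false, false, false);
  gl.depthMask(true);
  gl.depthFunc(gl.LESS);
  drawScene(true);

  // Pass 2: shade only fragments that match the stored depth.
  gl.colorMask(true, true, true, true);
  gl.depthMask(false);
  gl.depthFunc(gl.EQUAL);
  drawScene(false);

  // Restore the default depth state for later drawing.
  gl.depthMask(true);
  gl.depthFunc(gl.LESS);
}
```

The extra geometry pass only pays off when the fragment shader is the bottleneck, so it is worth measuring with and without it.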
Draw Order Guarantees
It's a little-known fact that glDrawArrays and glDrawElements guarantee that the triangles in the batch are drawn in order, first to last
Can use this fact to batch up independent translucent pieces of geometry, like sprites with an alpha channel, where order in batch determines z-order
Can also select one of multiple texture atlases in fragment shader to get better batching (beware dependent reads!)
See Sprite Engine prototype and readme
Optimizing Geometry
Vertex Buffer Structure
Pos: X Y Z W
Color: R G B A
Tex: S T R Q
Interleaved: X Y Z W R G B A S T R Q | X Y Z W R G B A S T R Q | ...
Split: X Y Z W | X Y Z W | ...   R G B A | R G B A | ...   S T R Q | S T R Q | ...
Shrunken: X Y R G B A S T | X Y R G B A S T | ...
Aligned: X Y S T R G B A | X Y S T R G B A | ...
iOS Programming Guide on Alignment
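A sketch of building an interleaved vertex buffer from separate attribute arrays. The function name interleave and the { data, size } attribute shape are hypothetical, and every attribute is assumed to be stored as 32-bit floats (the shrunken layouts would use smaller types instead).

```javascript
// Interleaves per-vertex attribute arrays into one Float32Array.
// `attributes` is a list of { data: number[], size: componentsPerVertex }.
function interleave(attributes, vertexCount) {
  let floatsPerVertex = 0;
  const offsets = attributes.map(function (attr) {
    const offset = floatsPerVertex * 4;   // byte offset for vertexAttribPointer
    floatsPerVertex += attr.size;
    return offset;
  });
  const out = new Float32Array(floatsPerVertex * vertexCount);
  let write = 0;
  for (let v = 0; v < vertexCount; v++) {
    attributes.forEach(function (attr) {
      for (let c = 0; c < attr.size; c++) {
        out[write++] = attr.data[v * attr.size + c];
      }
    });
  }
  return { data: out, stride: floatsPerVertex * 4, offsets: offsets };
}
```

The returned stride and offsets are in bytes, so they can be passed straight to gl.vertexAttribPointer(location, size, gl.FLOAT, false, stride, offset) for each attribute.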
Reusing Vertices with Index Buffers
gl.TRIANGLES with Vertex Buffer
Vertices: 0 1 2 3 4 5
gl.TRIANGLES with Vertex Buffer + Index Buffer
Vertices: 0 1 2 3
Indices: 0 1 2 1 3 2
Index buffers enable additional GPU performance features - better caching behavior
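A sketch of turning a plain triangle list into a vertex list plus an index list by removing duplicate vertices. buildIndexed is a hypothetical name; vertices are compared by exact value, and Uint16Array is used on the assumption that there are fewer than 65536 unique vertices.

```javascript
// Builds a vertex list without duplicates plus an index list.
// `vertices` is an array of vertices, each an array of numbers.
function buildIndexed(vertices) {
  const unique = [];
  const indices = [];
  const seen = new Map(); // vertex key -> index into `unique`
  vertices.forEach(function (vertex) {
    const key = vertex.join(',');
    if (!seen.has(key)) {
      seen.set(key, unique.length);
      unique.push(vertex);
    }
    indices.push(seen.get(key));
  });
  return { vertices: unique, indices: new Uint16Array(indices) };
}
```

For a quad drawn as two triangles, six input vertices collapse to four, and the index list refers back to the shared corners.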
Dynamic Buffers
If you need to update vertex attributes from the CPU, try to split array buffers based on update frequency
e.g., updating only position on sprites:
Pos: X Y Z W
Color: R G B A
Tex: S T R Q
Updated every frame (bufferData usage = STREAM_DRAW): X Y Z W | X Y Z W | ...
Updated infrequently (bufferData usage = STATIC_DRAW): color (R G B A) and texture coordinate (S T R Q) data
Ensure appropriate usage in bufferData!
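A sketch of splitting sprite data across two buffers by update frequency. createSpriteBuffers and updatePositions are hypothetical names, and both data arguments are assumed to be Float32Arrays prepared by the app.

```javascript
// Creates one buffer for data rewritten every frame and one for data that
// rarely changes. Both data arguments are Float32Arrays supplied by the app.
function createSpriteBuffers(gl, positions, colorsAndTexCoords) {
  const dynamicBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, dynamicBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STREAM_DRAW);

  const staticBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, staticBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, colorsAndTexCoords, gl.STATIC_DRAW);

  return { dynamicBuffer: dynamicBuffer, staticBuffer: staticBuffer };
}

// Per-frame update touches only the position buffer.
function updatePositions(gl, buffers, positions) {
  gl.bindBuffer(gl.ARRAY_BUFFER, buffers.dynamicBuffer);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, positions);
}
```

With this split, the color and texture coordinate data is uploaded once and never touched again during normal rendering.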
Dynamic Buffers
WebGL (currently) mandates that implementations validate indices during drawElements calls
Caches of index validation results are cleared if indices are modified
Avoid updating index buffers if at all possible
Packing
Can also be used to output complex values from fragment shaders into RGBA 32-bit textures for readback
Optimizing Shaders
Compute Infrequently
Always ask yourself...
Can it be constant(ish)?
(...the answer may surprise you)
Compute Early
A world x viewProj matrix multiply in a vertex shader will limit your geometry stage, vs. a uniform worldViewProj matrix
A viewProj x normal vector multiply in a fragment shader will limit your shading stage, vs. a varying passed from the vertex shader
-> One JavaScript matrix multiply is better than 40k vertex shader multiplies (or 40k vertex vs. 2m fragment)!
per-draw
per-vertex
per-fragment
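A sketch of moving the matrix multiply from per-vertex to per-draw: the combined matrix is computed once in JavaScript and uploaded as a single uniform. multiply4x4 is a hypothetical helper, matrices are assumed to be 16-element column-major arrays, and the names in the trailing comment (worldViewProjLocation, viewProj, world) are placeholders. The multiplication order shown assumes column vectors; it is reversed under the row-vector convention.

```javascript
// Multiplies two 4x4 column-major matrices (the layout uniformMatrix4fv expects).
// Element (row, col) lives at index col * 4 + row.
function multiply4x4(a, b) {
  const out = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Once per draw call: combine on the CPU, then upload a single uniform, e.g.
//   gl.uniformMatrix4fv(worldViewProjLocation, false, multiply4x4(viewProj, world));
```

The vertex shader then does one matrix-vector multiply per vertex instead of a matrix-matrix multiply followed by a matrix-vector multiply.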
Compute Inexactly
Optimize Texture Sampling
Dependent Reads/Instructions
GPUs are good at parallelizing fragment shaders... unless you prevent them from doing so!
Dependent read:
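// Declarations assumed for completeness; only main() is from the slide
precision mediump float;
uniform sampler2D s_lookupSampler;
uniform sampler2D s_textureSampler;
varying vec2 uv;
void main() {
  // First fetch produces the coordinates for the second fetch
  vec2 value = texture2D(s_lookupSampler, uv).st;
  // GPU stalled waiting for value...
  gl_FragColor = texture2D(s_textureSampler, value);
}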
Cheating Fillrate Limitations
Utilize (free) browser compositor scaling
GPU fills 1/N^2 as many expensive pixels, and compositor does cheap bilinear scaling up to desired resolution
See WebGL Aquarium
<canvas> width/height
CSS width/height
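A sketch of the scaling trick: the canvas width/height attributes set how many pixels the GPU shades, while the CSS width/height set the size the compositor stretches the result to. setScaledCanvasSize is a hypothetical name, and scale plays the role of N above.

```javascript
// Renders at 1/scale resolution in each dimension and lets the browser
// compositor stretch the result to the CSS size.
function setScaledCanvasSize(canvas, cssWidth, cssHeight, scale) {
  // Drawing buffer size: the number of pixels the GPU actually shades.
  canvas.width = Math.max(1, Math.floor(cssWidth / scale));
  canvas.height = Math.max(1, Math.floor(cssHeight / scale));
  // Display size: what the page lays out and the compositor scales to.
  canvas.style.width = cssWidth + 'px';
  canvas.style.height = cssHeight + 'px';
}
```

After resizing, the app would also call gl.viewport(0, 0, canvas.width, canvas.height) so that rendering covers the new drawing buffer.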
Data Flow
General Data Flow Rules
Throttling
Drivers (and certain browsers) have limited command buffer size
If loading at runtime/dynamically, limit buffer/texture uploads per frame
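A sketch of limiting uploads per frame with a simple queue. createUploadQueue and its methods are hypothetical; each job is assumed to be a function that performs one upload call such as texImage2D or bufferData, and the right budget per frame has to be found by measurement.

```javascript
// Hypothetical upload queue: each job is a function that performs one
// upload call. At most maxPerFrame jobs run per frame, in FIFO order.
function createUploadQueue(maxPerFrame) {
  const pending = [];
  return {
    enqueue(job) { pending.push(job); },
    // Call once per frame, before drawing. Returns the number of jobs run.
    flush() {
      const count = Math.min(maxPerFrame, pending.length);
      for (let i = 0; i < count; i++) {
        pending.shift()();
      }
      return count;
    },
    get length() { return pending.length; }
  };
}
```

A budget based on bytes uploaded rather than job count would be a natural refinement when texture sizes vary widely.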
Data Dependencies
Separate writes from reads by as many calls as possible
[Diagram: JavaScript and GPU/Browser timelines of Upload and Draw calls, comparing sequential upload/draw (with waiting) against overlapped upload/draw]
Data Dependencies
Double/triple-buffer
[Diagram: JavaScript and GPU timelines of interleaved Upload and Draw calls]
Readback
Absolutely have to do a lot of reads? Request an async readPixels extension on the mailing list!
[Diagram: JavaScript and GPU timelines of Draw and Read calls, with waiting at each Read]
Conclusion
A lot of information for just one talk... but you will use it!
Can be tricky to extract maximum performance
Not all tips are applicable to every scenario; the trick is figuring out which ones are
Start simple; test and debug continuously
Don't be afraid to experiment!
Conclusion
Please provide feedback on how tools and APIs can improve
Slides and demos available:
http://webglsamples.googlecode.com/hg/newgame/2011/index.html
Q&A
Fin