1 of 56

New Game 2011 SF

Debugging and Optimizing WebGL Applications

Ben Vanik (benvanik@google.com)

Ken Russell (kbr@google.com)

2 of 56

First version of WebGL spec released at GDC 2011 in March

WebGL enabled by default in Firefox, Chrome, Opera 12 alpha

Safari contains Developer menu option to enable WebGL

Next version of spec coming soon

  • Tightened security, corner case behavior, some limits
  • Improved conformance suite

3 of 56

Pros and Cons of Using WebGL

Pros:

  • Directly exposes GPU capabilities to JavaScript
    • Generally consistent performance across browsers
  • Complete control over each pixel's appearance
  • Batching provides opportunity for very high performance
  • Not a plugin; integrates cleanly with other web content

Cons:

  • Much lower level than DOM, Canvas 2D APIs
  • Harder to learn
  • Harder to debug
  • Harder to optimize

4 of 56

Understanding GPUs

GPUs are stream processors

Programming model differs from CPUs

A few resources:

Fabian Giesen's A trip through the Graphics Pipeline 2011

Apple's OpenGL ES Programming Guide for iOS

  • Specifically 'Best Practices for ...'

Ken's WebGL: Hands On slide deck

5 of 56

WebGL is Awesome, but...

Sometimes performance doesn't match native OpenGL... (it should be close!)

Sometimes performance doesn't match <canvas>... (it should be much better!)

Sometimes things just don't work... (graphics is hard!)

6 of 56

Debugging WebGL

7 of 56

Why Doesn't My Program Render?

Common situation: no output and no errors reported to console

Many common reasons

  • OpenGL errors preventing drawing
  • "Camera" pointing in the wrong direction
  • Forgot to bind texture or buffer when uploading data
  • Forgot to use the right shader program
  • Forgot to enable vertex attributes as arrays
  • Forgot to use the right texture
  • Forgot OpenGL ES rules about non power of two textures
  • A typo in your JavaScript resulted in "undefined" being passed in to WebGL at unexpected points

8 of 56

General WebGL Debugging Tips

When you are faced with a blank screen:

  • Check for OpenGL errors
  • Restart with a known good base
  • Add back in code iteratively toward your current goal

When you are debugging a shader:

  • Remove functionality
    • Watch out for vertex attributes becoming unused 
  • Output constant color in regions you're trying to identify

Use libraries and tools to triage problems

9 of 56

Getting Error Info from WebGL

webgl-debug.js

  • http://www.khronos.org/webgl/wiki/Debugging
  • Wraps a GL context and checks for errors after each call
    • Logs to the console; can change to throw exceptions
  • Enum -> string conversion utilities
  • Can simulate context loss/restore events

  var rawgl = canvas.getContext('webgl');
  var gl = WebGLDebugUtils.makeDebugContext(rawgl);
  // Use gl instead of rawgl for all calls
  drawStuff();
  alert(WebGLDebugUtils.glEnumToString(gl.getError()));

10 of 56

Getting Error Info from WebGL

Firefox Web Console

  • about:config, set webgl.verbose = true

  • Open Web Console

11 of 56

Getting Error Info from WebGL

If you must call gl.getError() directly, be aware that errors are queued up: each call returns and clears only one error flag, so several calls may be needed to clear them all

Ensure clear error state when trying to isolate issues:

  // Draw stuff that may set the error flag
  drawStuff();
  // Clear any set error
  while (gl.getError() !== gl.NO_ERROR) {}
  // Draw other stuff
  drawOtherStuff();
  // Now any error is from drawOtherStuff
  var error = gl.getError();
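
The clearing loop above can be wrapped in a small helper that also reports what it cleared. This is a minimal sketch; `drainErrors` is a hypothetical name, not part of WebGL or webgl-debug.js, and the iteration cap is an assumption added for safety.

```javascript
// Hypothetical helper: returns every pending error code and leaves the
// error state clear. `gl` is any WebGL rendering context.
function drainErrors(gl) {
  var errors = [];
  var maxErrors = 32;  // assumed cap so a misbehaving context cannot loop forever
  var error;
  while (errors.length < maxErrors &&
         (error = gl.getError()) !== gl.NO_ERROR) {
    errors.push(error);
  }
  return errors;
}
```

Calling it once before and once after the suspect code attributes the second result to that code alone. Like getError itself, it belongs in development builds only.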

12 of 56

Handling Context Loss

WebGL applications need to be prepared to handle loss of the rendering context

May be lost at any time for many reasons:

  • Power event on mobile device
  • Other content forces a GPU reset
  • Browser drops context on background tab
  • Browser drops context because of low resources

webgl-debug.js helps simulate lost context events to make your app more robust

See Gregg Tavares's article on Handling Lost Context
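
A minimal sketch of the event wiring for context loss follows. The event names `webglcontextlost` and `webglcontextrestored` are the ones defined by WebGL; `initResources` and `setRunning` are hypothetical application callbacks introduced only for this illustration.

```javascript
// `initResources` (re)creates all textures, buffers, and programs.
// `setRunning` pauses or resumes the app's render loop.
// Both are assumed to be supplied by the application.
function installContextLossHandlers(canvas, initResources, setRunning) {
  canvas.addEventListener('webglcontextlost', function (event) {
    // preventDefault tells the browser the app wants the context restored.
    event.preventDefault();
    setRunning(false);
  }, false);
  canvas.addEventListener('webglcontextrestored', function () {
    // Every GL object created before the loss is invalid; rebuild first.
    initResources();
    setRunning(true);
  }, false);
}
```

Keeping all resource creation inside one function such as `initResources` is what makes the restore path cheap to support.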

13 of 56

WebGL Inspector

  • Chrome extension for graphical WebGL debugging
  • Capture entire WebGL frames for inspection
  • Texture, buffer, and program browsers/viewing
  • Draw call state
  • Redundant call/error display

Simple embedded demo

WebGL Aquarium

14 of 56

Optimizing WebGL

15 of 56

GPU Optimization Whack-a-Mole

  • The best optimizations are often domain/target specific
    • All tips presented here prefixed with 'Usually...'
    • Your Desktop != User Desktop != Mobile != ...
  • No one-size fits all solution - not even for different parts of the same app
  • Retest often (try to automate)
  • Microbenchmarks are often not as helpful as real tests
    • GPUs are complex, rarely a single-point bottleneck
  • 15 minutes implementing a well-principled optimization can save days hunting extra perf % later

Never dismiss an optimization that has no effect when first applied - it may just mean your bottleneck (on a specific machine/browser/etc) is someplace else... for now

16 of 56

General Performance Rules

Reduce the number of draw calls

per frame

17 of 56

General Performance Rules

Use requestAnimationFrame

  • More robust framerate (vs. setInterval/setTimeout)
  • Browser can stop rendering when hidden
  • Browser can throttle if many tabs rendering

Because callbacks can be delayed (e.g., while the page is hidden), put networking and other time-sensitive work on a separate timing mechanism

Always use it if available (for 2D canvas too) - fall back to setTimeout otherwise
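
One way to select requestAnimationFrame with a setTimeout fallback is sketched here. `getRequestFrame` is a hypothetical helper; the vendor-prefixed names reflect browsers of this era, and the 60 Hz fallback interval is an assumption.

```javascript
// Returns a function with the requestAnimationFrame calling convention.
// `root` is the global object (window in a browser); passing it in keeps
// the sketch usable outside a browser.
function getRequestFrame(root) {
  var nativeRaf = root.requestAnimationFrame ||
                  root.webkitRequestAnimationFrame ||
                  root.mozRequestAnimationFrame;
  if (nativeRaf) {
    return function (callback) { return nativeRaf.call(root, callback); };
  }
  // Fallback: approximate 60 Hz with setTimeout (assumed target rate).
  return function (callback) {
    return root.setTimeout(function () { callback(Date.now()); }, 1000 / 60);
  };
}
```

In a page this would be used as `var requestFrame = getRequestFrame(window);`, with the render function calling `requestFrame(render)` at the end of each frame.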

18 of 56

General Performance Rules

Avoid get*/read* calls

  • Cause full flushes and block the GPU
  • Often incur expensive copies/allocations
  • Takeaway: cache state yourself in JavaScript
  • Takeaway: read back only what is required, and very carefully

getError

  • Never call in production! 
  • Not free anywhere
  • Multi-process renderers like Chrome can suffer greatly
    • getError blocks!
  • Takeaway: don't use webgl-debug.js outside of development

19 of 56

General Performance Rules

Avoid redundant calls

  • Best case: extra JavaScript overhead
  • Worst case: will cause GPU to block (changing state/etc)

Use WebGL Inspector to find redundant calls and identify where batching can be employed
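
Redundant calls can also be filtered on the JavaScript side by remembering the last value set. The wrapper below is a hypothetical sketch, not a WebGL API, and it assumes a single active texture unit; a fuller version would track bindings per unit and reset itself on context loss.

```javascript
// Hypothetical state cache: skips useProgram/bindTexture calls that
// would not change state, and records the current values so the app
// never needs a gl.get* call to ask what is bound.
function StateCache(gl) {
  this.gl = gl;
  this.program = null;
  this.textures = {};  // bound texture keyed by target enum (one unit assumed)
}
StateCache.prototype.useProgram = function (program) {
  if (this.program === program) return false;  // redundant, skipped
  this.gl.useProgram(program);
  this.program = program;
  return true;
};
StateCache.prototype.bindTexture = function (target, texture) {
  if (this.textures[target] === texture) return false;  // redundant, skipped
  this.gl.bindTexture(target, texture);
  this.textures[target] = texture;
  return true;
};
```

The boolean return value is only there to make skipped calls easy to count while profiling.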

20 of 56

General Performance Rules

Disable unused GL features

  • Blending, alpha testing, etc are not always free
  • ...but don't change state excessively

Link programs infrequently

  • Shader verification/translation can take a long time
    • Worse on Windows with ANGLE
  • Create/link programs as early as possible/on load
  • Balance program complexity vs. number of programs

Don't change Renderbuffers, change Framebuffers

  • Attaching Renderbuffers requires a lot of validation

     (note this is counter to iOS perf guidelines)

21 of 56

'Graphics Pipeline Performance'

 

alpha: false

22 of 56

Optimizing Drawing

23 of 56

Drawing scenes in Canvas

  sort objects by z-index
  for each object:
    draw object

[Figure: objects labeled z: 0 to z: 3]

24 of 56

Drawing scenes in WebGL

  sort objects by state, then depth
  for each state:
    for each object:
      draw object

The depth buffer preserves the order on the screen, so the results match the painter's algorithm while allowing states to be batched

[Figure: objects labeled z: 0 to z: 3]

25 of 56

Depth Buffers and Draw Order

  • Depth buffer automatically sorts geometry by depth per-pixel
    • Generally don't need to depth sort on the CPU!
  • Relatively cheap (not free, but often not a bottleneck)
  • WebGL depth buffers are usually 16-bit
    • Beware of precision issues!
    • Several well-known tricks for modifying z in vertex shaders to prevent z-fighting/jitter

  • Remember to attach a new depth buffer if doing render-to-texture/custom framebuffers
  • Remember to pass gl.DEPTH_BUFFER_BIT to gl.clear
    • Clearing is fast, ignore tips out there about inverting z
  • Depth pass (no fragment shaders) to enable better batching

Demo

26 of 56

Sorting by State

Draw objects ordered by:

  • Target framebuffer or context state
    • Blending, clipping, depth, etc
  • Program/Buffer/Texture
    • (Often) requires pipeline flush to switch
  • Uniforms/Samplers
    • Relatively cheap, modulo JavaScript overhead

Sort scene ahead of time, maintain as a sorted list if possible

  • Object hierarchy walks/sorts each frame can cancel gains from batching
  • Generate content (models/etc) such that they can be easily batched (merge buffers/textures/etc)

[Figure: sort hierarchy, from outermost to innermost: Framebuffer/Context State, Program/Arrays, Uniforms]

27 of 56

Batching Textures

Standard texture atlases/UV maps

  • Reduce number of server requests/load time
  • Better compression
  • Many draws can share the same texture state

Considerations: 

  • Mipmap! (note that this means power-of-two sizes)
  • Add border pixels between entries in the atlas (if using filtering) - otherwise neighboring entries bleed into each other
  • Keep sizes reasonable (no 8192x8192 textures!)
    • If 256x256, can use BYTE tex coords in vertices, etc
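
Texture coordinates for an atlas entry can be computed with a half-texel inset, which complements the border pixels mentioned above. This is a sketch under the assumption of a square atlas; `atlasUVs` and its parameters are illustrative names.

```javascript
// Hypothetical helper: texture coordinates for the entry occupying the
// pixel rectangle (x, y, w, h) in a square atlas of `atlasSize` pixels.
// Coordinates are inset by half a texel so bilinear filtering at the
// base mip level does not sample the neighboring entry.
function atlasUVs(atlasSize, x, y, w, h) {
  var half = 0.5 / atlasSize;
  return {
    u0: x / atlasSize + half,
    v0: y / atlasSize + half,
    u1: (x + w) / atlasSize - half,
    v1: (y + h) / atlasSize - half
  };
}
```

The inset only protects the base level; smaller mip levels still blend neighbors, which is why border pixels remain necessary when mipmapping.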

Demo

28 of 56

Frame Structure using Depth Buffers

  gl.enable(gl.DEPTH_TEST);
  gl.depthMask(true);
  gl.disable(gl.BLEND);
  // Draw opaque content
  gl.depthMask(false);
  gl.enable(gl.BLEND);
  // Draw translucent content
  gl.disable(gl.DEPTH_TEST);
  // Draw UI

Frame:

  DEPTH_TEST = T, DEPTH_WRITEMASK = T:  Draw opaque, Front to Back
  DEPTH_TEST = T, DEPTH_WRITEMASK = F:  Draw translucent, Back to Front
  DEPTH_TEST = F:                       Draw UI, Back to Front

29 of 56

Example Sort

  • Opaque
    •  Program
      • Buffer OR z (front to back)
        • Texture
  • Translucent
    • z (back to front)
      •  Program
        • Buffer
          • Texture

Ideally # opaque >> # translucent items

If small number of translucent, no need to sort more than z
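
The example sort can be expressed as two comparators. This is a sketch; the item fields (`translucent`, `programId`, `bufferId`, `textureId`, `z`) are illustrative names, larger `z` is assumed to mean farther from the camera, and where the slide offers "Buffer OR z" for opaque items the sketch picks buffer first.

```javascript
// Returns a new draw list: opaque items grouped by state, followed by
// translucent items ordered back to front.
function buildDrawList(items) {
  var opaque = items.filter(function (i) { return !i.translucent; });
  var translucent = items.filter(function (i) { return i.translucent; });
  opaque.sort(function (a, b) {
    return (a.programId - b.programId) ||
           (a.bufferId - b.bufferId) ||
           (a.textureId - b.textureId) ||
           (a.z - b.z);   // front to back within equal state
  });
  translucent.sort(function (a, b) {
    return (b.z - a.z) ||  // back to front comes first
           (a.programId - b.programId) ||
           (a.bufferId - b.bufferId) ||
           (a.textureId - b.textureId);
  });
  return opaque.concat(translucent);
}
```

If the scene changes little between frames, the resulting list can be kept and patched rather than rebuilt every frame.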

30 of 56

Depth Pass

Still shading bound? Can't get front-to-back order? Depth pass!

  • Draw all opaque objects
    • Depth write and test enabled
    • Any order (whatever is fastest on CPU)
    • Identity fragment shaders (color is ignored)
  • Clear COLOR_BUFFER only
  • Draw entire scene (opaque + translucent)
    • Depth write disabled
    • Whatever order enables best batching!

Requires use of the invariant keyword in GLSL, so that both passes compute identical positions and therefore identical depths

Enables optimal usage of depth buffer (not a single fragment executed that fails DEPTH_TEST)
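
The pass structure can be sketched as follows. `drawOpaque` and `drawTranslucent` are hypothetical application callbacks, and the choice of `gl.LEQUAL` is this sketch's assumption: the second pass redraws surfaces at exactly the stored depth, which the default `gl.LESS` test would reject.

```javascript
// drawOpaque(depthOnly): draws opaque geometry; when depthOnly is true
//   the app is expected to use its trivial depth-only program.
// drawTranslucent(): draws translucent geometry with blending set up.
function renderWithDepthPass(gl, drawOpaque, drawTranslucent) {
  gl.enable(gl.DEPTH_TEST);
  gl.depthFunc(gl.LEQUAL);  // assumption: let equal depths pass in pass 2
  gl.depthMask(true);       // depth writes must be on for the depth clear
  gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

  // Pass 1: lay down depth for opaque geometry, in any order.
  drawOpaque(true);

  // Discard any color written by pass 1; keep the depth buffer.
  gl.clear(gl.COLOR_BUFFER_BIT);

  // Pass 2: shade the whole scene with depth writes disabled.
  gl.depthMask(false);
  drawOpaque(false);
  drawTranslucent();
}
```

Because pass 2 no longer depends on draw order for correctness of opaque content, it can be ordered purely for batching.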

31 of 56

Draw Order Guarantees

It's a little-known fact that glDrawArrays and glDrawElements guarantee that the triangles in the batch are drawn in order, first to last

  • Enforced by the GPU's Render Output Unit (ROP)

Can use this fact to batch up independent translucent pieces of geometry, like sprites with an alpha channel, where order in batch determines z-order

Can also select one of multiple texture atlases in fragment shader to get better batching (beware dependent reads!)

See Sprite Engine prototype and readme
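
Relying on in-batch ordering, many sprites can go into a single draw call if their quads are written back to front. The builder below is a sketch; the sprite fields (`x`, `y`, `w`, `h`, `z`) are illustrative, and larger `z` is assumed to mean drawn later (on top).

```javascript
// Returns positions (2 floats per vertex, 4 vertices per sprite) and
// 16-bit indices (6 per sprite) with sprites ordered by ascending z.
// With Uint16 indices one batch holds at most 16384 sprites.
function buildSpriteBatch(sprites) {
  var ordered = sprites.slice().sort(function (a, b) { return a.z - b.z; });
  var positions = new Float32Array(ordered.length * 8);
  var indices = new Uint16Array(ordered.length * 6);
  ordered.forEach(function (s, i) {
    positions.set([s.x, s.y,        s.x + s.w, s.y,
                   s.x, s.y + s.h,  s.x + s.w, s.y + s.h], i * 8);
    var v = i * 4;  // first vertex of this sprite's quad
    indices.set([v, v + 1, v + 2,  v + 1, v + 3, v + 2], i * 6);
  });
  return { positions: positions, indices: indices };
}
```

The two arrays are uploaded once per update and drawn with a single drawElements call using gl.TRIANGLES.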

32 of 56

Optimizing Geometry

33 of 56

Vertex Buffer Structure

  • Reduce number of vertices
    • Use index buffers
  • Reduce per-vertex data
    • Faster to upload
    • Less data for the GPU to fetch
  • Use fewer components (XYZ, not XYZW)
  • Keep attributes aligned on natural 4-byte boundaries
  • Interleave arrays whenever possible

  • On mobile, use smaller data types
    • BYTE < SHORT < FLOAT, etc
    • Beware of performance pitfalls on certain desktop-class GPUs
    • Recommend to only do this if memory bound (mobile/netbook/etc)

34 of 56

Vertex Buffer Structure

Attributes:   Pos: X Y Z W    Color: R G B A    Tex: S T R Q

Interleaved:  XYZW RGBA STRQ | XYZW RGBA STRQ

Split:        XYZW XYZW ... | RGBA RGBA ... | STRQ STRQ ...

Shrunken:     XY RGBA ST | XY RGBA ST | ...

Aligned:      XY ST RGBA | XY ST RGBA | ...

iOS Programming Guide on Alignment
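
The aligned layout (X Y, S T, then R G B A) could be built and bound roughly as follows. This is a sketch that assumes float positions and texture coordinates with unsigned-byte colors, giving a 20-byte stride; the vertex object shape and function names are illustrative.

```javascript
var STRIDE = 20;  // bytes per vertex: 8 (x, y) + 8 (s, t) + 4 (r, g, b, a)

// `vertices` is assumed to be an array of { x, y, s, t, r, g, b, a }
// with r, g, b, a in the range 0..255.
function interleave(vertices) {
  var buffer = new ArrayBuffer(vertices.length * STRIDE);
  var floats = new Float32Array(buffer);
  var bytes = new Uint8Array(buffer);
  vertices.forEach(function (v, i) {
    var f = i * STRIDE / 4;  // index of this vertex in the float view
    floats[f] = v.x;
    floats[f + 1] = v.y;
    floats[f + 2] = v.s;
    floats[f + 3] = v.t;
    bytes.set([v.r, v.g, v.b, v.a], i * STRIDE + 16);
  });
  return buffer;
}

// Attribute pointers for the layout above. The location arguments would
// come from gl.getAttribLocation in a real program.
function setupAttributes(gl, posLoc, texLoc, colorLoc) {
  gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, STRIDE, 0);
  gl.vertexAttribPointer(texLoc, 2, gl.FLOAT, false, STRIDE, 8);
  gl.vertexAttribPointer(colorLoc, 4, gl.UNSIGNED_BYTE, true, STRIDE, 16);
}
```

Every attribute starts on a 4-byte boundary, and the color is normalized on fetch so the shader still sees values in 0..1. Each location also needs gl.enableVertexAttribArray, which the sketch leaves to the caller.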

35 of 56

Reusing Vertices with Index Buffers

gl.TRIANGLES with Vertex Buffer

gl.TRIANGLES with Vertex Buffer + Index Buffer

Index buffers enable additional GPU performance features - better caching behavior

Without index buffer:

  Vertices: 0 1 2 3 4 5

With index buffer:

  Vertices: 0 1 2 3
  Indices:  0 1 2 1 3 2
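
Converting a non-indexed triangle list into vertices plus indices is usually done offline when building models. A minimal sketch, assuming each vertex is an array of numbers and that exact equality is the right test for merging; `buildIndexed` is a hypothetical name.

```javascript
// `vertices` lists three vertices per triangle (the non-indexed form).
// Identical vertices are merged and referenced by index instead.
function buildIndexed(vertices) {
  var seen = {};     // vertex key -> index in `unique`
  var unique = [];
  var indices = [];
  vertices.forEach(function (v) {
    var key = v.join(',');
    if (!(key in seen)) {
      seen[key] = unique.length;
      unique.push(v);
    }
    indices.push(seen[key]);
  });
  return { vertices: unique, indices: new Uint16Array(indices) };
}
```

For a quad made of two triangles this turns six vertices into four vertices and six indices.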

36 of 56

Dynamic Buffers

If you need to update vertex attributes from the CPU, try to split array buffers based on update frequency

e.g., updating only position on sprites:

Attributes:  Pos: X Y Z W    Color: R G B A    Tex: S T R Q

Updated every frame (bufferData usage = STREAM_DRAW):

  XYZW XYZW ...

Updated infrequently (bufferData usage = STATIC_DRAW):

  RGBA RGBA ... | STRQ STRQ ...

Ensure appropriate usage in bufferData!

37 of 56

Dynamic Buffers

WebGL (currently) mandates that implementations validate indices during drawElements calls

Caches of index validation results are cleared if indices are modified

Avoid updating index buffers if at all possible
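
Splitting by update frequency could look roughly like this: one buffer for per-frame positions and one for attributes that rarely change. The function and property names are illustrative, and the use of bufferSubData for the per-frame update is this sketch's choice.

```javascript
// `positions` and `staticAttribs` are typed arrays prepared by the app.
function createSpriteBuffers(gl, positions, staticAttribs) {
  var positionBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STREAM_DRAW);

  var staticBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, staticBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, staticAttribs, gl.STATIC_DRAW);

  return { position: positionBuffer, statics: staticBuffer };
}

// Per frame: re-upload positions only. The static buffer and the index
// buffer are left untouched.
function updatePositions(gl, buffers, positions) {
  gl.bindBuffer(gl.ARRAY_BUFFER, buffers.position);
  gl.bufferSubData(gl.ARRAY_BUFFER, 0, positions);
}
```

Leaving the index buffer alone also keeps the browser's cached index validation results valid.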

38 of 56

Packing

  • If the range of the value is constrained, pack multiple values into single components
    • Pack RGB -> single float, unpack in vertex shader
    • Reduces upload/stream bandwidth at the cost of extra arithmetic
    • Can do processing offline when building models
    • Google 'gpu float packing' - lots of clever math tricks out there

Can also be used to output complex values from fragment shaders into RGBA 32-bit textures for readback
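
As one concrete instance of packing, three 8-bit channels fit in a single float because integers below 2^24 are exactly representable in 32-bit floating point. The sketch below shows the CPU-side pack and a JavaScript mirror of the arithmetic a vertex shader would perform to unpack; the function names are illustrative.

```javascript
// Packs three channels in 0..255 into one number below 2^24.
function packRGB(r, g, b) {
  return r + g * 256 + b * 65536;
}

// Mirror of the shader-side unpack: only division, floor, and
// subtraction, all of which are available in GLSL.
function unpackRGB(packed) {
  var b = Math.floor(packed / 65536);
  var g = Math.floor((packed - b * 65536) / 256);
  var r = packed - b * 65536 - g * 256;
  return [r, g, b];
}
```

The unpack needs full float precision in the shader, so this works with highp (always available in vertex shaders) but not with lower precisions.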

39 of 56

Optimizing Shaders

40 of 56

Compute Infrequently

Always ask yourself...

Can it be constant(ish)?

(...the answer may surprise you)

41 of 56

Compute Early

A world x viewProj matrix multiply in a vertex shader will limit your geometry stage, vs. a uniform worldViewProj matrix

A viewProj x normal vector multiply in a fragment shader will limit your shading stage, vs. a varying passed from the vertex shader

-> One JavaScript matrix multiply is better than 40k vertex shader multiplies (or 40k vertex vs. 2M fragment)!

per-draw

per-vertex

per-fragment
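
Hoisting the world x viewProj multiply to the CPU might look like this. It is a sketch assuming column-major 4x4 matrices stored in 16-element arrays, which is the layout uniformMatrix4fv consumes; the function names are illustrative.

```javascript
// Returns a * b for two column-major 4x4 matrices.
// Element (row, col) lives at index col * 4 + row.
function multiply4x4(a, b) {
  var out = new Float32Array(16);
  for (var col = 0; col < 4; col++) {
    for (var row = 0; row < 4; row++) {
      var sum = 0;
      for (var k = 0; k < 4; k++) {
        sum += a[k * 4 + row] * b[col * 4 + k];
      }
      out[col * 4 + row] = sum;
    }
  }
  return out;
}

// Once per draw call: combine on the CPU and upload a single uniform.
// `location` is assumed to come from gl.getUniformLocation.
function setWorldViewProj(gl, location, viewProj, world) {
  gl.uniformMatrix4fv(location, false, multiply4x4(viewProj, world));
}
```

The vertex shader then needs only one matrix-vector multiply per vertex instead of a matrix-matrix multiply followed by a matrix-vector multiply.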

42 of 56

Compute Inexactly

  • Use the lowest precision possible in a shader
    • Lower default precision
    • Lower varying precision
    • Go as low as comfortable/allowed
    • Desktop platforms may ignore precision, be careful
    • highp is optional in fragment shaders in OpenGL ES 2
  • Use multiple programs to provide LOD if shading bound
    • Fewer/no texture samples (constant color for distant objects)
    • No lighting, skinning, etc
  • Prefer math that works, not math that is 'correct'

43 of 56

Optimize Texture Sampling

  • Use mipmapping
    • Gains quality and performance
    • Marginal memory hit (a full mip chain adds about 33% on top of the base level, at most)
  • Load only resolutions you need (detect screen size, etc)
    • Improves load time, memory usage, etc
  • Sample predictably in indirect lookups
    • Exploit the (often small) GPU sampling cache
    • Reorganize texture layout to match sampling pattern
    • Tightly pack data in textures
  • Use the proposed compressed texture extensions, when available
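
The memory cost of mipmapping is easy to check with a little arithmetic: each level has a quarter of the texels of the one above it, so the whole chain approaches 4/3 of the base level. A small sketch, with `mipChainBytes` as an illustrative name:

```javascript
// Bytes used by a full mip chain for a width x height texture,
// halving each dimension (minimum 1) until the 1x1 level.
function mipChainBytes(width, height, bytesPerTexel) {
  var total = 0;
  while (true) {
    total += width * height * bytesPerTexel;
    if (width === 1 && height === 1) break;
    width = Math.max(1, width >> 1);
    height = Math.max(1, height >> 1);
  }
  return total;
}
```

For a 256x256 RGBA texture the base level is 262144 bytes and the full chain is 349524 bytes, just under one third more.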

44 of 56

Dependent Reads/Instructions

GPUs are good at parallelizing fragment shaders... unless you prevent them from doing so!

Dependent read:

  void main() {
    vec2 value = texture2D(s_lookupSampler, uv).st;
    // GPU stalled waiting for value...
    gl_FragColor = texture2D(s_textureSampler, value);
  }

  • Try to insert expensive math in between samples so the GPU is not idle
  • Move first sample to vertex shader if VTF supported
  • Avoid dependent reads altogether if possible

45 of 56

Cheating Fillrate Limitations

Utilize (free) browser compositor scaling

  • Decrease <canvas> width/height by some scale N
  • Increase CSS width/height by scale N

GPU fills 1/N² as many expensive pixels, and the compositor does cheap bilinear scaling up to the desired resolution

See WebGL Aquarium

<canvas> width/height

CSS width/height
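
Setting the two sizes independently could be sketched as follows. `setScaledCanvasSize` is a hypothetical helper; `scale` is the N from the text, and the sizes are in CSS pixels.

```javascript
// `canvas` is a <canvas> element (or any object with width, height, and
// a style object). The drawing buffer is made `scale` times smaller in
// each dimension than the on-page size.
function setScaledCanvasSize(canvas, cssWidth, cssHeight, scale) {
  canvas.width = Math.max(1, Math.floor(cssWidth / scale));
  canvas.height = Math.max(1, Math.floor(cssHeight / scale));
  canvas.style.width = cssWidth + 'px';
  canvas.style.height = cssHeight + 'px';
}
```

After resizing, the app still has to call gl.viewport with the new canvas.width and canvas.height, since the viewport does not follow the drawing buffer automatically.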

46 of 56

Data Flow

47 of 56

General Data Flow Rules

  • GPU hardware is massively parallel - exploit it!
    • Break out those computer architecture CS books:
    • Keep pipelines full
    • Avoid data dependencies that cause stalls
    • Avoid state changes
  • Many layers between user code and GPU
    • Assume latency, assume copies, assume conversion
    • Varies by platform/browser
  • Use the smallest data sizes possible (get clever!)
  • Read and write as little as possible
    • Hoist data from buffers up to program uniforms
    • Downsample reads
    • Use GPU filtering to write less

48 of 56

Throttling

Drivers (and certain browsers) have limited command buffer size

  • Upload too much -> spill buffer -> STALL!
  • Everything counts (textures, array buffers, commands)
  • A stall usually means a dropped frame

If loading at runtime/dynamically, limit buffer/texture uploads per frame

  • Limit on total size, not count
  • e.g., 1 1024x1024x4 texture = 16 256x256x4 textures
  • Remember that the buffer is not 100% reserved for you!
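
A per-frame byte budget for uploads can be enforced with a simple queue. This is a hypothetical sketch: `UploadQueue` is not a WebGL feature, the budget value is up to the app, and each job's `run` callback is assumed to perform the actual texImage2D or bufferData call.

```javascript
function UploadQueue(bytesPerFrame) {
  this.bytesPerFrame = bytesPerFrame;
  this.jobs = [];
}
// `bytes` is the size of the upload; `run` performs it.
UploadQueue.prototype.push = function (bytes, run) {
  this.jobs.push({ bytes: bytes, run: run });
};
// Call once per frame. Runs jobs in order until the budget is spent.
// A job larger than the budget still runs when it is first in line,
// so it cannot block the queue forever.
UploadQueue.prototype.flushFrame = function () {
  var spent = 0;
  while (this.jobs.length > 0) {
    var job = this.jobs[0];
    if (spent > 0 && spent + job.bytes > this.bytesPerFrame) break;
    this.jobs.shift();
    job.run();
    spent += job.bytes;
  }
  return spent;
};
```

Because the limit is on total bytes, one 1024x1024x4 texture consumes the same budget as sixteen 256x256x4 textures.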

49 of 56

Data Dependencies

Separate writes from reads by as many calls as possible

  • Writes: bufferData, texImage2D, render-to-texture

[Figure: JavaScript and GPU/Browser timelines of Upload and Draw operations for two cases: sequential upload/draw, which includes a (waiting) gap, and overlapped upload/draw]

50 of 56

Data Dependencies

Double/triple-buffer

  • O(N) memory usage
  • Buffer inter-frame or, if needed, intra-frame
  • Decouple CPU from GPU
    • CPU can't feed GPU fast enough? Reuse old GPU data!
  • Adjust timestamps when computing data for future frames
    • Demo

[Figure: JavaScript and GPU timelines of Upload and Draw operations]
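
The bookkeeping behind double or triple buffering is a small ring of buffers. In this sketch `BufferRing` is a hypothetical helper and `buffers` is an array of WebGLBuffer objects created by the app; data uploaded in one frame is drawn in the next.

```javascript
function BufferRing(buffers) {
  this.buffers = buffers;
  this.writeIndex = 0;
}
// Buffer to upload this frame's freshly computed data into.
BufferRing.prototype.writeBuffer = function () {
  return this.buffers[this.writeIndex];
};
// Buffer that was written on the previous frame; draw from this one.
BufferRing.prototype.readBuffer = function () {
  var n = this.buffers.length;
  return this.buffers[(this.writeIndex + n - 1) % n];
};
// Call at the end of each frame to rotate the ring.
BufferRing.prototype.advance = function () {
  this.writeIndex = (this.writeIndex + 1) % this.buffers.length;
};
```

Since the data being drawn is always one frame old, anything computed for the write buffer should use the timestamp of the frame in which it will be displayed.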

51 of 56

Readback

  • Avoid readback every frame
  • Read smaller resolutions if possible
    • Less time spent shipping buffers from GPU->JS
    • Less processing time spent in JS
  • Double-buffer framebuffer for read-before-write flow
    • Prevents stalls waiting for previous draws to finish

Absolutely have to do a lot of reads? Request an async readPixels extension on the mailing list!

[Figure: JavaScript and GPU timelines of Read and Draw operations, with (waiting) gaps]

52 of 56

Conclusion

53 of 56

Conclusion

A lot of information for just one talk... but you will use it!

Can be tricky to extract maximum performance

Not all tips are applicable to every scenario; the trick is figuring out which ones are

Start simple; test and debug continuously

Don't be afraid to experiment!

54 of 56

Conclusion

Please provide feedback on how tools and APIs can improve

Slides and demos available:

http://webglsamples.googlecode.com/hg/newgame/2011/index.html

55 of 56

Q&A

56 of 56

Fin