Blink’s Rendering Pipeline

Eric Seidel, Emil A. Eklund, James Robinson, Elliott Sprehn

[please add your name if you’d like folks to bug you about what’s up in rendering]

Status as of June 10, 2013: Published to blink-dev, under discussion

Objective

Complicated web applications and large pages can feel slow in Blink, particularly on mobile.  Many of these issues can be traced to long/frequent blocks of time spent on the main thread calculating style or layout in Blink.

This document seeks to lay out both a long-term vision for how better/faster Rendering in Blink might come to look as well as short-term steps for getting there.

Problems in Rendering

Goal

Our primary goal is performance.  Right now our telemetry-based top25-sites “loading_benchmark” shows rendering accounting for on average 16% of main-thread time (and sometimes much more than that).  If we drive that to 1.6% of main-thread time via this effort, I’ll consider this effort a success (and likely move on to other problems).

As part of this effort, I believe we should take this opportunity to clarify our rendering architecture to enable not only this performance improvement, but pave the way for improvements for years to come.

Rendering Today

Our Rendering system today looks something like this:

It’s a bit awkward to draw today’s rendering as a pipeline (as above).  Currently the “phases” of rendering operate on shared mutable data structures, so it’s likely more useful to think of current rendering as a set of passes over the same data structure, but I’ve drawn it as a pipeline for easier comparison with the proposal below.

See Phases Of Rendering or the WebKit Technical Articles for more details on current architecture.

Making a Pipeline

Ideally Blink’s rendering code should operate with many explicit and independent phases (similar to other graphics pipelines).  Each phase should consume the output from the previous phase and produce output for the next.  These outputs could be independently cached.
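As a rough illustration of phases with independently cached outputs, here is a minimal sketch.  Everything in it is hypothetical: `CachedPhase`, `StyleResult`, `LayoutResult`, `resolveStyle` and `layout` are invented for this example and are not Blink classes, and the version number stands in for whatever invalidation signal a real upstream phase would provide.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical wrapper: a phase maps an Input to an Output and remembers the
// last result, keyed by a version number supplied by the upstream phase.
// Output must be default-constructible in this sketch.
template <typename Input, typename Output>
class CachedPhase {
public:
    explicit CachedPhase(std::function<Output(const Input&)> compute)
        : m_compute(std::move(compute)) {}

    // Recomputes only when the upstream version changed since the last run.
    const Output& run(const Input& input, unsigned inputVersion) {
        if (!m_hasOutput || m_cachedVersion != inputVersion) {
            m_output = m_compute(input);
            m_cachedVersion = inputVersion;
            m_hasOutput = true;
            ++m_runCount;
        }
        return m_output;
    }

    // Number of times the phase actually recomputed (cache misses).
    unsigned runCount() const { return m_runCount; }

private:
    std::function<Output(const Input&)> m_compute;
    Output m_output{};
    unsigned m_cachedVersion = 0;
    unsigned m_runCount = 0;
    bool m_hasOutput = false;
};

// Toy stand-ins for phase outputs; the real structures are far richer.
struct StyleResult { int fontSizePx; };
struct LayoutResult { int heightPx; };

inline StyleResult resolveStyle(const std::string& markup) {
    return StyleResult{markup.empty() ? 0 : 16};
}

inline LayoutResult layout(const StyleResult& style) {
    return LayoutResult{style.fontSizePx * 2};
}
```

A style phase would be declared as `CachedPhase<std::string, StyleResult> stylePhase(resolveStyle);` and its result passed to a `CachedPhase<StyleResult, LayoutResult>`.  Running either phase again with an unchanged version returns the cached output without recomputing.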

We’ve had success with a similar pipeline-based approach in both the Loader and HTML Parser:

(Before decoupling HTML Tokenization from the main thread, it accounted for approximately 5% of total time.  Once we moved it to a separate thread we saw as much as 10% gains on page cyclers.  As noted above, style resolve currently accounts for 16% of main-thread time on Telemetry’s loading_benchmark; it’s possible that we will see larger than 16% gains if we succeed at moving Style off of the main thread.)

We can similarly re-imagine Rendering in Blink as a set of independent phases.

A hypothetical (and incomplete) Rendering phase diagram:

The major difference in that diagram is breaking existing phases into many pieces.  Unfortunately, as drawn, the outputs of those phases need to modify their inputs (e.g.  RenderLayers set on the input RenderObjects).  However, breaking phases into smaller pieces can allow us to control when these mutations occur.

Initially, a careful phase-based approach could allow us to execute style and layout concurrently.  Longer-term (as we scale to handle even larger web applications) we may add the ability to shard large style/layout jobs across many copies of the same phase.

Getting from here to there

Our biggest obstacle between today and the dreams of this document is modularization.  The Rendering system in Blink needs to be abstracted from other systems, and the phases within rendering need to be abstracted from one another.

Possible paths to better modularization may include:

A second obstacle is making the individual phases interruptible/resumable.  Right now style recalc and layout always operate on the entire document regardless of how long it takes.  (Creating a partial-tree version of style-recalc would be enough to speed up some clients such as getComputedStyle.)
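To make “interruptible/resumable” concrete, here is a minimal sketch of a tree walk that processes a bounded number of nodes per call and picks up where it left off.  `Node` and `ResumableStyleWalk` are hypothetical names, not Blink types, and the sketch assumes the tree is not mutated between steps, which a real implementation would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type; not Blink's Node or RenderObject.
struct Node {
    bool styled = false;
    std::vector<Node*> children;
};

// Walks a tree depth-first with an explicit stack, so the walk can stop after
// a fixed number of nodes and continue later from where it left off.
class ResumableStyleWalk {
public:
    explicit ResumableStyleWalk(Node* root) {
        if (root)
            m_pending.push_back(root);
    }

    // Processes at most `budget` nodes. Returns true once the walk is done.
    bool step(std::size_t budget) {
        while (budget > 0 && !m_pending.empty()) {
            Node* node = m_pending.back();
            m_pending.pop_back();
            node->styled = true;  // Placeholder for real per-node style work.
            for (Node* child : node->children)
                m_pending.push_back(child);
            --budget;
        }
        return m_pending.empty();
    }

private:
    std::vector<Node*> m_pending;  // Nodes discovered but not yet processed.
};
```

A caller could invoke `step()` with a small budget between other main-thread tasks until it returns true.  Starting the walk at a subtree root instead of the document root gives the partial-tree behavior mentioned above.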

A third task is to break our two monolithic phases (style, layout) into smaller pieces.  Right now we have hackish “post attach callbacks” and actions such as “focus update” which, instead of being hacked into Document::recalcStyle(), could be viewed as entirely separate phases through a generalized state-machine.

Finally, real parallelism is blocked on making individual phases idempotent and their outputs immutable.  Until Style Recalc and Layout stop sharing the same output structure (the rendering tree) we can’t make them run in parallel with one another.  Style Recalc is probably the best place to start here, as it would be possible to separate the creation of the styles from assigning them into the tree.
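The split between creating styles and assigning them into the tree could look roughly like the sketch below.  All names (`ComputedStyle`, `Element`, `StyleMap`, `computeStyles`, `commitStyles`) are hypothetical and much simpler than anything in Blink; the point is only that the compute step reads its input without writing to it, and that mutation is confined to a separate commit step.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

struct ComputedStyle { int fontSizePx; };

struct Element {
    int id;
    int requestedFontSizePx;                     // Toy stand-in for matched CSS rules.
    std::shared_ptr<const ComputedStyle> style;  // Written only by commitStyles().
};

using StyleMap = std::map<int, std::shared_ptr<const ComputedStyle>>;

// Step 1: reads the elements and writes nothing to them. Running it twice on
// the same input yields equivalent output, and because the input is left
// untouched this step could in principle run off the main thread.
StyleMap computeStyles(const std::vector<Element>& elements) {
    StyleMap result;
    for (const Element& e : elements)
        result[e.id] = std::make_shared<ComputedStyle>(ComputedStyle{e.requestedFontSizePx});
    return result;
}

// Step 2: the only place the tree is mutated.
void commitStyles(std::vector<Element>& elements, const StyleMap& styles) {
    for (Element& e : elements) {
        auto it = styles.find(e.id);
        if (it != styles.end())
            e.style = it->second;
    }
}
```

Because the computed styles are held through pointers to const, later phases can read them but not modify them after the commit.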

An Alternate Approach: Just Double Buffer

This document focuses on updating each of the individual phases in Rendering to be idempotent and parallelization-ready.  That’s probably useful in the long term regardless (for architectural cleanliness, if nothing else), but may not be our fastest path to victory.  Alternatively we could keep all of the existing (shared) data structures, but instead keep two copies and atomically switch between them like a graphics system might switch between buffers.  I’m not advocating this approach, but it should be considered as a possible way to trade memory for a quicker parallel-rendering solution.
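For reference, a minimal double-buffering sketch follows.  `DoubleBuffered` is a hypothetical helper, not an existing Blink class.  It assumes a single writer, readers that take a snapshot copy under a lock, and a copyable `T`; after each flip it re-synchronizes the stale buffer by copying, which is where the memory and copy cost of this approach shows up.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper: readers see the "front" copy while a single writer
// mutates the "back" copy, then publishes it by flipping which one is front.
template <typename T>
class DoubleBuffered {
public:
    explicit DoubleBuffered(const T& initial) : m_buffers{initial, initial} {}

    // Writer side: mutate the back copy; readers never observe it.
    // Must only be called from the single writer thread.
    T& back() { return m_buffers[1 - m_frontIndex]; }

    // Reader side: returns a snapshot copy of the published state.
    T readFront() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_buffers[m_frontIndex];
    }

    // Flips front and back, then copies the new front into the new back so
    // the writer continues from the published state.
    void publish() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_frontIndex = 1 - m_frontIndex;
        m_buffers[1 - m_frontIndex] = m_buffers[m_frontIndex];
    }

private:
    mutable std::mutex m_mutex;
    T m_buffers[2];
    int m_frontIndex = 0;
};
```

With `DoubleBuffered<int> height(10);`, writing `height.back() = 25;` leaves `height.readFront()` at 10 until `height.publish()` is called.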

Testing

Near-term Tasks

Speculative Tasks

These are higher-risk, possibly high-reward tasks.  It is less clear which of them are on the critical path:

Future Considerations

Versioned Caching / Partial Updates

It’s possible to imagine a world in which there is versioned caching between phases, to allow parallel construction/access to phased data.  The simplest version of this model involves only two versions, and is akin to the double-buffering noted above.  (One version is being written, the other is being read from.)

If we view all of the cached data as immutable then we can do much more interesting things with our caches, including versioning.

Take, for example, a style change to a single element.  That would require construction of a new version chain of caches starting at the style phase.  We could start by computing the new style for that element, and recomputing the layout for its ancestor tree.  But we would not need to relayout the entire tree, and we would only need to synchronously (with a mutex) graft in the new versioned subtree once its asynchronous styling/layout updates completed.
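One way to picture such a versioned subtree is path copying over an immutable tree, sketched below.  This is an illustration under assumptions, not a proposed design: `LayoutNode`, `NodeRef` and `withWidth` are hypothetical, a node’s width stands in for all of its layout output, and the final graft (swapping the root pointer under a mutex) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct LayoutNode;
using NodeRef = std::shared_ptr<const LayoutNode>;

// Treated as immutable once published; new versions are made by copying.
struct LayoutNode {
    int width;
    std::vector<NodeRef> children;
};

// Returns a new root in which the node reached by following `path` (a list of
// child indices starting at the root) has `newWidth`. Only the nodes on that
// path are copied; every other subtree is shared with the old version.
NodeRef withWidth(const NodeRef& node, const std::vector<std::size_t>& path,
                  std::size_t depth, int newWidth) {
    auto copy = std::make_shared<LayoutNode>(*node);
    if (depth == path.size()) {
        copy->width = newWidth;
    } else {
        std::size_t index = path[depth];
        copy->children[index] =
            withWidth(node->children[index], path, depth + 1, newWidth);
    }
    return copy;
}
```

The old root stays valid for any reader still holding it, while the new root shares every untouched subtree with it, so the cost of an update is proportional to the depth of the changed element rather than the size of the tree.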

We could transition to the example “two version” world once we have an API onto Rendering, using the wait-for-cache-refresh version everywhere first and slowly moving more and more of Blink to use the cached-result APIs or asynchronous notification APIs where appropriate.

Application Primitives

JavaScript applications which repeatedly invalidate one of these phases (say, by updating lots of rows of a table in a loop) could cause pathological behavior in our speculative update mechanisms.  We would need heuristics to allow us to update these phase caches in parallel with running JavaScript, while delaying updates to make sure that we are not repeatedly discarding our speculations.
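One very simple heuristic of this kind, offered purely as an assumption for illustration, is to wait for a quiet period before speculating.  `SpeculationGate` and its tick-based notion of time are hypothetical; a real heuristic would likely need to consider far more than this.

```cpp
#include <cassert>

// Hypothetical gate: speculative work starts only after the phase has gone
// `quietTicks` consecutive ticks without being invalidated again.
class SpeculationGate {
public:
    explicit SpeculationGate(unsigned quietTicks) : m_quietTicks(quietTicks) {}

    // Called whenever script invalidates the phase's cached output.
    void invalidate() {
        m_ticksSinceInvalidation = 0;
        m_dirty = true;
    }

    // Called once per scheduling tick (for example, per task or per frame).
    // Returns true when a speculative update should start now.
    bool tick() {
        if (!m_dirty)
            return false;
        if (++m_ticksSinceInvalidation < m_quietTicks)
            return false;
        m_dirty = false;
        return true;
    }

private:
    unsigned m_quietTicks;
    unsigned m_ticksSinceInvalidation = 0;
    bool m_dirty = false;
};
```

A loop that invalidates on every iteration keeps resetting the counter, so no speculative update starts until the loop has finished and the quiet period has elapsed.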

It may be possible to use JavaScript/Application primitives to help guide optimizations relating to updating these versioned subtrees.

Rendering Tree Alternatives

Once Rendering is isolated from the rest of Core by an API, there are many improvements we can make.  We can even consider replacing the core data structure “the rendering tree”.

Scene Graph

The Rendering Tree and Layer Trees are O(N) or worse for hit-testing.  They also have no (built-in) concept for occlusion mapping.  There are more advanced data structures we could add into our rendering pipeline to provide these memory/performance tradeoffs.

Retained Mode

The Rendering Tree serves (too) many purposes.  One of them is to translate position and style information into immediate-mode drawing commands.  We could be more efficient in painting if we used a retained-mode drawing API.  This would require caching of (opaque) retained-mode objects from Skia on the renderers.  Once we have a tree like this, it’s not necessarily even clear that we need to keep our own tree.  (In the same way that V8 owns the underlying V8 objects and we hang data from Blink off of them, it’s not clear that we need to “own” our “rendering tree”; it’s possible that we should just hang data off of the compositor’s or Skia’s tree.)

More Information

https://bugs.webkit.org/show_bug.cgi?id=111644 is an old meta-bug from webkit.org