The 1000/100/6 model sets clear budgets for investigating performance concerns: 1000ms to load, 100ms of input latency, and 6ms of work per animation or scroll frame. By focusing on load, input latency, and the fidelity of animation and scrolling, we drill into the aspects that most directly impact the end user experience. In the audits below, we’ll investigate a few sites’ performance from this POV.
For a start-to-finish guide to using DevTools in your own perf audit, see pfeldman’s sfgate.com DevTools walkthrough.
Feb 2015, Paul Irish. This doc is public.
Protip for reading this doc: Turn off Print Layout 
This document is fairly long (but worth it!), so a brief table of contents…
Read on!
CNET Side-nav Slide-out
Profiled Feb 18th, Paul Irish
Open up CNet, and tap on the hamburger menu to enjoy a really painful jank during the nav slideout:

Up front is a lot of JS cost adding latency before we see any effect. Then a 200ms-long animation that appears to do layout in every frame.

Clearly, we’ve blown our Midnight Train budgets of 100ms for input latency and 6ms of work per animation frame.
Let's dig in and see what's up!
Input latency

Let's zoom to the first chunk:

We have NewRelic wrapping BugSnag wrapping setTimeout.
It's Google Adsense doing 24ms of work, every 1000ms. But in this case it fired right as my finger hit the hamburger icon.
Next, we have some mousedown handlers:

Looking closer, one of the handlers is some event delegation looking for a very specific selector ([data-user-show], which I've never spotted in their DOM). But obviously, handling this mousedown delays every touch.

Finally we get to the click handler that does the work:

The revised flame chart coloring (by file) makes it easy to see our code vs jquery.js.
Clicking into the layout we see that our layout thrashing comes from the same source line (118):

Here it is:
this.$element.on("click", function() {
  e("body").toggleClass("m-view-nav"); // `e` is jQuery's minified alias
  window.scrollTo(0, 0);
})
They're using a class on the <body> to trigger the effect. Obviously this invalidates the style & layout for the entire page. :/
Then, they curiously call scrollTo(0,0). This jumps the page back to the top of the screen, because the nav, at rest, is positioned up there:

However this scrollTo(0,0) ends up forcing a 25ms layout they probably didn't need.
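Since the page is often already at the top when the nav opens, a cheap guard would skip the unneeded scrollTo entirely. Here is a minimal sketch; the function name is ours, and `win` stands in for `window` so the logic is testable:

```javascript
// Sketch: skip scrollTo(0, 0) when the page is already at the top, avoiding
// the forced layout seen in the trace. Reading scrollX/scrollY is typically
// cheaper than an unconditional scrollTo.
function scrollToTopIfNeeded(win) {
  if (win.scrollX === 0 && win.scrollY === 0) return false; // already there: no-op
  win.scrollTo(0, 0);
  return true;
}
```

The second Blink-facing insight below suggests the engine itself could make the no-op case free, but until then the guard belongs in page code.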
Layout animation
Once the "m-view-nav" class is in place, the animation gets triggered.
The nav panel gets a CSS transition on layout property left:

And the content-panel also shimmies to the left as well:

So they have two elements, independently animating with the same duration, and hopefully it looks like they move in concert.
Watch the full 60fps video of the 15fps sidenav slide.
Insights
Developer-facing insights
- Don't use event delegation for any events that can add to input latency (touchstart, touchmove, touchend, mousedown, click).
- Take inventory of all event handlers and determine whether each actually needs to run at that point.
- Don't animate elem.style.left. Animate transforms.
- Don't use something as global as document.body.className for nav toggling, as it invalidates the world
DevTools-facing insights
- Need to see when the finger lands to evaluate latency easier.
- Could use easier mechanism to match the “m-view-nav” toggleClass call with styles/elements affected.
- Could use easier mechanism to investigate JS stacks and sources repeatedly.
- We know when layout animations happen. We know when `left` is used in css transitions. We need to warn the developer.
Blink-facing insights
- How would they achieve this effect the right way?
- css transition on transform:translate on a container that's around both the content & nav
- position:fixed to keep the nav next to the content regardless of scroll position
- (added) scrollTo(0,0) should be a no-op if we’re already at 0,0
- (added) scrollTo shouldn’t force layout
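The "right way" bullets above could look something like this sketch. The class names, the 250px offset, and the CSS are invented for illustration; the point is one container around both nav and content, slid with a transform transition:

```javascript
// Sketch of the compositor-friendly approach: transition transform on a single
// wrapper around both the content and the nav, instead of animating `left` on
// two elements independently. All names here are illustrative.
const SLIDE_CSS = `
  .page-wrapper { transition: transform 200ms ease-out; }
  .page-wrapper.nav-open { transform: translateX(250px); }
  .side-nav { position: fixed; left: -250px; top: 0; } /* stays beside content regardless of scroll */
`;

function toggleNav(wrapper) {
  // classList.toggle returns true when the class is now present
  return wrapper.classList.toggle('nav-open');
}
```

Because only transform changes, the style/layout invalidation stays scoped and each animation frame is compositor work rather than layout.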
CNet Scrolling
Profiled Feb 24th, Paul Irish
Test page: www.cnet.com/news/netpicks-whats-new-on-netflix-and-amazon-online-for-march-2015/
Fling it to scroll it. We'll scroll at maybe 13 FPS, if we're lucky.

We have some touch handlers that push us into slow scrolling mode.
touchstart
- [data-user-show] returns! querySelectorAll("[data-user-show]") is checked during touchstart. It takes a decent 8ms to look up.
- And it’s… checked during all of these events as well:

- jQuery mobile binds to touchstart. They’re greedy with their binds

- They don’t do any work during the handler, so it’s a noop.
- Youtube Instream videos records that a finger has touched the screen.
touchmove
- jQuery mobile tracks the distance moved by the finger.
touchend
- jQuery mobile tracks the finger has left the screen and issues a virtual mouseup event
That’s plenty, plus a lot of handlers on scroll itself. The scroll handler below was on the shorter side (21ms) of the ones I’m seeing (avg 35ms).

Scroll handler walkthrough
I investigated each handler to see what they’re up to:
- jQuery scrollstart catches first, installs a timer and bails
- jQuery lazyload plugin next.
- Luckily in this case it’s disabled.
- When it’s enabled it heads down this path:

- The :visible selector check requires a getComputedStyle, and the other metrics look at scrollTop() and elem.offsetTop repeatedly
- YouTube Instream Video Ads. Runs 6ms of script.
- 6ms is a lot of time to spend in V8, and all it really winds up doing is… this.keyValues['touch'] = 0 + 1;
- Google Adsense is the big poppa of the bunch, and it wants to know about every scroll, too.
- They collect a bunch of metrics then postMessage it to their iframe.
- Takes ~25ms on my phone.
- Yahoo Ads
- Checks to see if an element is within the viewport. Asks for pageXOffset and innerWidth
- YouTube Instream video ads. Back again
All of these run on every scroll event. It appears that the scroll is blocked on completion of these handlers.
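One mitigation any of these scripts could apply: coalesce their per-scroll work so it runs at most once per interval rather than on every event. A minimal time-based throttle sketch (a real one would also want a trailing invocation, as lodash's `_.throttle` provides):

```javascript
// Sketch: limit how often a scroll handler's body actually runs. `now` is
// injectable (defaults to Date.now) so the behavior is testable.
function throttle(fn, waitMs, now = Date.now) {
  let last = -Infinity;
  return function (...args) {
    if (now() - last >= waitMs) {
      last = now();
      fn(...args);
    }
  };
}

// usage sketch: window.addEventListener('scroll', throttle(collectMetrics, 250));
```

This doesn't unblock the scroll itself (the handler still fires synchronously), but it caps how much work each firing does.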

In-page JS profiling
NewRelic wraps around every event handler in the page and grabs Date.now() on both sides of each.
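The shape of that wrapper is roughly this (a sketch; function names are ours, not NewRelic's). Note the wrapper itself adds a little overhead to every single event:

```javascript
// Sketch of handler instrumentation: sample Date.now() on both sides of each
// handler and record the delta.
function instrument(handler, record) {
  return function (...args) {
    const start = Date.now();
    try {
      return handler.apply(this, args);
    } finally {
      record(Date.now() - start); // runs even if the handler throws
    }
  };
}
```

Wrapping every handler this way is why the insight below suggests sampling only a small percentage of users.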
setInterval costs
Every 500ms, Yahoo Ads runs this, querying scroll and size metrics of their iframe.
var b = a.iframeRelativeScrollPosition(d);
a.sendPostMessage({
    action: "scroll",
    scrollTop: b[0],
    scrollLeft: b[1],
    height: a.iframeRelativeHeight(d),
    width: a.iframeRelativeWidth(d)
});
Every 1000ms, Google Adsense runs 30-40ms of JS.
Every 1000ms, another Google Ad network runs a good amount of JS.
My favorite: a loading spinner, that’s a canvas element, is rotated with css transforms every 42ms, via setInterval.


Obviously 42.
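A 42ms setInterval is out of step with the display; requestAnimationFrame would keep the rotation aligned with real frames and pause it when the tab is hidden. A hedged sketch (names are illustrative, and `raf` is injectable so the loop runs outside a browser):

```javascript
// Sketch: drive the spinner from requestAnimationFrame instead of setInterval.
// `applyRotation` would set e.g. el.style.transform in a page.
function startSpinner(applyRotation, raf) {
  let angle = 0, running = true;
  function frame() {
    if (!running) return;
    angle = (angle + 6) % 360;            // ~6 degrees per frame
    applyRotation(`rotate(${angle}deg)`);
    raf(frame);                           // schedule the next frame
  }
  raf(frame);
  return () => { running = false; };      // stop function
}

// usage sketch: const stop = startSpinner(t => el.style.transform = t, requestAnimationFrame);
```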
Insights
Developer-facing insights
- Tell Google Adsense this is unacceptable. :)
- Take inventory of extra handlers and remove unnecessary ones (scrollstart, Lazyload)
- Change New Relic configuration to only add an instrumented profiler to 1% of users. (Ideally)
DevTools-facing insights
- Get developers’ attention when scrolling performance is poor and nudge them towards what’s adding most of the cost
- Summarize event handler cost by file, to easily blame 3rd party scripts
Blink-facing insights
- Element visibility API?
- Synchronous, cancelable, blocking event handlers are the worst, amirite?
- We appear to be handling the scroll event synchronously. Why?
- What else?
Time.com Scrolling
Profiled Feb 2015, Paul Irish & Pavel Feldman
Scrolling time.com/3703410/uma-thurman-face-red-carpet/ during loading on Clank is a great devil case. I regularly see it paused for 5 seconds.
I recommend you turn on “Show Touches” in Developer Options. You can leave it on permanently.

It helps you evaluate latency between Android OS receiving your touch and the result.
For this audit we won’t look at scrolling during load, but just a scroll fling, well after the page has loaded.

Looks like a lot of JS. Let’s validate that it’s the JS causing the problem. Turn on Show Scrolling Bottlenecks:


We’ve got a mousewheel handler somewhere. Hard to see exactly where, though.
We can actually look a little deeper here.
If I record a Timeline with the Paint checkbox checked, we can inspect the layer tree and investigate whether any layers have slow scrolling behavior:

It’s telling us we’ve got at least a gallery lightbox with a touch handler. And all that pink indicates more than just that element has a slow scrolling reason.
So, let’s see what’s going on in our blocking event handlers

6 touchstart/touchmove handlers
- evaluating visibility of the article, likely to tell whether it was read
- getBoundingClientRect() call within each
- First, setActiveArticle(), then record it in analytics and conditionally place an ad if things are onscreen or off.
- evaluating visibility of an ad rotator
- evaluating visibility of some other things.
- seeing if the user is active. (bound to ["resize", "scroll", "touchmove"])
- call maybeLoadHiddenImages() if it’s time to lazy-load images
- 30ms to do a very expensive jQuery DOM traversal: $("body").find("img").not("[data-loaded]")
- on every touchstart, yes.
3 scroll handlers
- exact same article visibility check.
- infinite scroll checks on viewport size and element positions to potentially load new content
- the scroll handlers are the most costly but sometimes fire before touchmove and appear to block
1 mousewheel handler
- maybeLoadHiddenImages() again.
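That 30ms `$("body").find("img").not("[data-loaded]")` traversal repeats on every touchstart, even though the set of pending images only ever shrinks. A sketch of the cheaper shape: query once, then work through a shrinking list (all names here are invented; `isOnscreen` and `load` stand in for the site's own visibility check and loader):

```javascript
// Sketch: keep a shrinking list of still-unloaded images instead of
// re-traversing the whole DOM on every touchstart.
function makeLazyLoader(pendingImages, isOnscreen, load) {
  return function maybeLoadHiddenImages() {
    // iterate backwards so splicing doesn't skip entries
    for (let i = pendingImages.length - 1; i >= 0; i--) {
      if (isOnscreen(pendingImages[i])) {
        load(pendingImages[i]);
        pendingImages.splice(i, 1); // loaded images never get re-examined
      }
    }
    return pendingImages.length; // how many remain
  };
}
```

Each call does less work than the last, and once the list is empty the handler is effectively free.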
Sayonara synchronous handlers
I wasn’t happy with all these handlers so I nuked them all to see a difference.
You can navigate all event handlers via the listing per element or use the Console to grab them and trash them.
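(In the DevTools Console, Chrome's console-only `getEventListeners(window)` helper is what makes the grab-and-trash experiment quick.) For a page you control, the same teardown is easiest when listeners are registered through one place. A sketch of such a registry (names are ours):

```javascript
// Sketch: a tiny listener registry so "remove every handler" is one call.
function makeListenerRegistry(target) {
  const entries = [];
  return {
    on(type, fn, opts) {
      target.addEventListener(type, fn, opts);
      entries.push({ type, fn, opts });
    },
    nukeAll() {
      for (const { type, fn, opts } of entries) {
        target.removeEventListener(type, fn, opts);
      }
      const n = entries.length;
      entries.length = 0;
      return n; // how many listeners were removed
    },
  };
}
```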


After looking at the yellow haze of unclean slow scrolling for a while,
the event handlers are gone and the content emerges.


Before and After Heavy Input Handlers
Scrolling the page gently, the start of the touch, visible on Uma, is synchronized.
Here, we can see the impact of those handlers more clearly.
The same scroll now in Timeline…
Before:

After:

This is already at least a 2X improvement. Great stuff, but towards the end some JS comes and ruins the 60 FPS party.
Digging into these frames, we have a 23ms scroll handler and a 6ms rAF.

Are these long scroll handlers busting our frame budget repeatedly?
For simplicity I’ll remove these handlers as well:

Any better?

Not really.
There’s a lot of timer-based JS running.
On a hunch, I recorded the page at idle, not touching anything.

busy busy busy!
All this work comes from some 3rd party code, churning in the background:

And the recurring rAF is a Flash → HTML5 runtime, somewhere on this page.

Paint & Compositing
But let’s back up to all that green paint work I saw during scrolling.
Let’s see what it’s for.
Flip on Paint Profiler to see what we were so busy with:

We’re repainting this one ad a few frames in a row.
It’s not terribly expensive, just 3.5ms each time, but we’re paint storming on it. Not sure why.
But in addition to these paints, we spend a lot of time Compositing.

Why so much compositing cost?

Layers. Layers like mad.
The side nav that’s not on screen has 28 layers. Youch.
Finishing up
We can still do better. Let’s get to our goals.
There was one ad network firing a visibility checker every 100ms. I ripped it out and tried my scroll again:

Oh yes.
And as a last-ditch effort, I display:none’d the two layer-heavy DOM subtrees to see if I could reduce the compositing costs.

Yup. Nice reduction in my green jank. That’ll do.
So, looking at our grand before/after…
No changes, as it sits live today:

Our improvements applied:
Obviously, there’s a lot less work happening in each frame. We haven’t disabled all javascript, just removed our handlers for touch, wheel and scroll. And nuked a few very costly setIntervals.
But the important thing is we’ve finally hit our Midnight Train goal of 60 FPS, and believe me, the scrolling experience feels so so good.
Update (March 18th):
The inability to scroll at the start isn’t fully an input latency concern.
The full page’s content isn’t available. Below, a 4s gap between these paints.


Inside these two frames is a bunch of JS taking up 4200ms:

The JS ends up being Backbone doing clientside templating of ads and navigation menus. Luckily, the primary article’s content was already delivered in the HTML.
Insights
Developer-facing insights
- Costly event handlers introduce latency to scroll and reduce the framerate post-fling.
- Reduce layer count if possible.
- Any flash ad that’s being transpiled to HTML5 clientside is more costly than it should be. Ask your advertisers to do it right.
Blink-facing questions/insights
- What is the best approach for handling this long-article infinite scroll setup?
- What API should ad networks polling for user activity use?
- How best to load images lazily, so they don’t influence pageload but don’t require touch/scroll handlers?
- ViewportObserver?
- Flush the document to render before issuing DOMContentLoaded?
- …?
Wikipedia Webapp Startup
We investigated the desktop experience of the wikipedia visual editor along with their engineers.
Test page: https://en.wikipedia.org/wiki/Barack_Obama?veaction=edit
Let's take a timeline.

full size
The result gives us about three equal size sections of activity. Let's look at each third as an act.
First Act
- avoid $.getVisibleText. It requires a very heavy recalculate style because it’s using the :hidden selector.
- Recommendation: Use your own logic to compute visibility that doesn't rely on getComputedStyle, or defer the work.
- jQuery.filter is called via ve.dm.MWBlockImageNode.static.toDataElement. It is hugely hugely expensive.
- A closer look and it's this:
$imgWrapper = $figure.children('a, span').eq(0),
$img = $imgWrapper.children('img').eq(0),
$caption = $figure.children('figcaption').eq(0),
- DOM size in general is going to be a big overarching factor here.
- Recommendation: Kill any superfluous spans/divs you don't absolutely need
- Sizzle is repeating a support.getById call. Probably a bug. If you go straight to qSA you'll avoid a lot of cost
- We're generating a lot of garbage so the GCs are huge. Lower-priority, but something to eventually chase down
- Don't use jquery.html() inside of ve.dm.InternalList.convertToData. You don't need it, and innerHTML is massively faster.
- Recommendation: Use innerHTML just once instead of stamping out HTML into the DOM many, many times. (Not 100% sure you are doing it multiple times, though…) Either way, it’s taking forever.
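The recommendations above boil down to: build the markup first, mutate the DOM once. A hedged sketch of the shape (the render function is illustrative, not VisualEditor's actual code):

```javascript
// Sketch: one string build, one innerHTML assignment, instead of many
// incremental $(el).html(str) calls during initialization.
function renderList(container, items) {
  const html = items
    .map((item) => `<li>${item}</li>`) // assumes items are already escaped
    .join('');
  container.innerHTML = `<ul>${html}</ul>`; // one parse, one DOM mutation
}
```

Each incremental insertion can invalidate style/layout; a single assignment gives the engine one batch to process.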
Second Act
- The next 1/3 of editor initialization is mostly time inside of ve.init.Target.createSurface calls. Treewalking?
- Use console.profile() / profileEnd() to capture just this section and view in JS Profiler (heavy)
- More massive recalc styles.
- Lots of $(elem).hide() and some $(elem).css(property) coming from VeUIContext and OO.ui.ListToolGroup.
- Triggering recalcs that are 100ms each

- jQuery animations triggering recalc style
- More $(elem).hide()
- jQuery asks for computedstyle when you tell it to hide an element. It’s a problem.
- Wikipedia feedback: "yeah, there are a lot of $.show / $.hide calls that are wasteful because they're toggling the visibility of something we know is hidden or visible"
- oo.copy does a lot of JS-only work. It’s called enough to warrant micro-optimizations
- Wikipedia feedback: "oo.copy is used to deep-copy the entire VisualEditor DOM so we can round-trip it once to confirm that the editor can handle it without introducing corruption. I'm working on eliminating that entirely."
- another style recalc forced by ve.ce.Surface.showSelection, because of this.nativeSelection.addRange(nativeRange);
Third Act
- More of this ve.dm.MWReferenceNode.static.toDomElements action
- $(el).html(str) non-stop. Use innerHTML instead, try to add it to the DOM once.
- at the very end, OO.ui.PopupWidget.toggle ends up triggering another massive recalc style.
- you need to determine visibility on your own so that you don't rely on getComputedStyle for it
Summary for Wikipedia developers
- jQuery is not your friend here.
- Eradicate :hidden , hide() and toggle() from the codebase.
- jQuery is doing too much magic around querySelectorAll. Just use it alone.
- Don't use $(elem).html(str) for DOM insertion; use innerHTML or DOM methods. Use document fragments if necessary. Try to add to the page all at once.
- Way too much forced style recalc and layout. Track down all the reasons why.
- Track visibility on your own, do not use getComputedStyle to ask.
- Don't build the DOM just to hide() it.
- DOM size is too big.
- If you can't kill the recalc styles, you need to isolate the recalc costs better. Iframes? :/
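"Track visibility on your own" can be as simple as bookkeeping alongside your own hide/show helpers, so an `isShown` check never touches computed style. A minimal sketch (helper names are ours, not VisualEditor's):

```javascript
// Sketch: a WeakMap remembers what we last did to each element, so visibility
// queries are pure JS with no style recalculate (unlike jQuery's :hidden).
const visibility = new WeakMap();

function hide(el) {
  el.style.display = 'none';
  visibility.set(el, false);
}
function show(el) {
  el.style.display = '';
  visibility.set(el, true);
}
function isShown(el) {
  // default to visible if we've never touched the element
  return visibility.has(el) ? visibility.get(el) : true;
}
```

The trade-off: this only knows about visibility changes made through these helpers, which is exactly the discipline the summary asks for.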
Update (Feb 25th), Wikipedia dev team hard at work:

Here, Wikipedia is annotating the Timeline recording's phases of load with post-its and markers.
How does one print out the timeline? With Wikipedia, we found a creative solution: Inspect the inspector, use Device Mode in desktop mode, stretch viewport to 9000px wide, grab the Timeline's <canvas>, toDataURL() it, set to img[src], download, print.
Update (March 19), Plenty of fixes are in
https://phabricator.wikimedia.org/project/board/831/query/JnI8_pet.gGL/ is tracking the eng team’s progress. They’ve already made large and substantial improvements. I’ve seen the performance speed up by over 2x already.
Addy Osmani’s recent talk from jQuery UK covers this story as well.
DevTools Insights
Working through this example reveals a lot of ways in which we could help to communicate everything above without an expert walking through the timeline.
Communicating top costs
- We need a view of the overall performance so we don't go to micro immediately.
- Need a summary similar to the timeline summary tooltip
- Developers have JSPerf for the micro picture. If they're profiling, they want the big picture
Connecting back to source
- We get big callstacks thanks to libraries & frameworks.
- fade blackboxed frames in flame chart. easier view in callstack
- Need better association between details frame callstack and the flame visualization.
- We need range selection for Heavy profiler view.
- highlight on hover
- Things like $figure.children('a, span') don't feel expensive but we need to show they are.
- Show original source lines of the key frame from callstack?
- While hovering callstack, show the relevant line of source
- Heatmap potentially useful during authoring. Use previously collected perf data to annotate.
Profiler experience
- What is the "Heavy" equivalent of timeline?
- Forced style recalcs: 30% of main thread time
- Pure JS (not ending in purple/blue): 30% of main thread time
- Event handlers (blocking): wheel/touch.
- Event handlers (non-blocking): Mutation events
- Stack view for each of these, blackboxing jQuery etc.
- Gather all stacks for Recalc/Layout/etc , reverse them, magically match them, Then summarize with total amount of work caused by each call.
- Prioritize forced stack frames. Guide developers to address layout thrashing. Only draw them towards invalidation if they need it to re-order.
Buggy stuff
- Bug identified: insertBefore call not part of our invalidation call stack
- Timeline Frames view gets confused. When we're not frequently spitting out frames, each vertical frame bar can capture 1-2s of work.
- We should use frames view when reasonable (rAF, animation or scrolling) and use regular otherwise.
- Could they be on the same line?
Additional feedback from Wikipedia (March 18)
- V8 deopts are loud and demand priority; however, they rarely deliver significant improvements in overall performance.
- DevTools Audit panel is not good.
- It recommended "3 inline script blocks were found in the head between an external CSS file and another resource. To allow parallel downloading, move the inline script before the external CSS file, or after the next resource. "
- Wikipedia followed the advice with no improvement to be found.
- Printing out the timeline to investigate was incredibly effective.
Want More?
Other recent perf audits:
Angular app input latency rendering form fields
Angular’s dominance of the runloop means the developer has few options, aside from large refactors.
Not only does it load in 80 scripts, connect to 80 origins, create 70 frames, and load 5MB of content… but, as you can imagine, it thrashes the main thread in no particular pattern, so there's not much fruit hanging low on these sfgate trees.
Greg Simon said this destroyed his Chrome Pixel. Indeed, this particular site is a great stress-test for our entire platform.
Yelp published a blog post, Animating the Mobile Web, describing how they got to 60 FPS.
We dug in to look a bit closer.
Bonus audit!
Scrolling Jank in Google Play Movies Infinite Scroll
Profiled by Paul Irish & Paul Lewis, Feb 27th.
Viewing https://play.google.com/store/movies/category/1/collection/topselling_paid on desktop.
1-second-long scrolling janks occur when the infinite scroll XHR returns and generates DOM.

The XHR handler should definitely yield instead of creating all their DOM immediately.
But while they do their work in one go, the mousewheel handler means the user scroll is blocked.
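The yielding fix can be sketched like so. `schedule` is injectable (in a page it would be `setTimeout` or `requestAnimationFrame`), and the names are ours, not the Play Store's:

```javascript
// Sketch: append new cards in small chunks, yielding to the event loop between
// chunks so a user scroll isn't blocked for the whole XHR handler.
function appendInChunks(items, appendOne, chunkSize, schedule) {
  let i = 0;
  function work() {
    const end = Math.min(i + chunkSize, items.length);
    for (; i < end; i++) appendOne(items[i]);
    if (i < items.length) schedule(work); // yield, then continue
  }
  work();
}

// usage sketch: appendInChunks(cards, addCardToDOM, 4, requestAnimationFrame);
```

Input events get a chance to run between chunks, at the cost of the full list taking a few frames to land.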
Mystery cost, post XHR
At the end of our Timeline recording are these gray towers.

This points to work that DevTools doesn’t have instrumented. But tracing likely does…

The culprit here: CompositingRequirementsUpdater::updateRecursive

As a hunch, we thought this cost was ballooning with layer count.
Layers
First you open up the Layers panel and you’re thinking… oh, a layer per card. That seems excessive.
But then you tilt it and see we have 3 unique layers per card:
- 2 at the card location, one of which is manually promoted with backface-visibility, the other the layerForSquashingContents
- A third layer per card is the full width and height of the document (layerForAncestorClip)

Wheel handler detour
While we’re here, that red-looking layer is the one with the wheel handler. But oddly it’s showing up behind the root layer, which probably shouldn’t be possible. In fact, if we look at google.com:

So let’s park this; it’s just how Chrome’s internals are set up that there’s a wheel handler.
Let’s nuke those layers
If we drop backface-visibility from the card, our layer count drops to 1.
And we’ve vanquished the timeline’s gray towers of layer management.

However, we have a new problem.

Every time we render a new card to the page we need to re-rasterize the entire viewport.
Given that, it’s no surprise Painting 1450 x 12,000 pixels takes 80ms each frame.
For fun, I used DevTools LiveEdit to edit the XHR handler’s JavaScript live and put a wrapper around each new “page” of elements, which I then promoted with will-change:transform.
Now our layer creation and repaint is synchronized with the DOM changes.
It ended up making a great improvement:

So did we improve the jank, post XHR?
| | 3 layers per card (original) | a single layer | a few layers to group card additions |
| Mega Paints (length of large paint operations) | 130ms | 121ms | 30ms |
| Post-load frame times (ms) | 220, 220, 230, 200 | 310, 210, 110, 100 | 200, 75, 70, 50 |
Looks like a win.
XHR
Remember that big XHR? Upon further inspection, it’s interesting:

Developer Insights
- Don’t ask for element/window metrics while/after adding new content to DOM
- Reduce layer count to minimize overhead
DevTools Insights
- Need a global layer count
- Need to indicate what actions to take given a particular big cost
- e.g. Composite Layers == reduce layer count. Investigate current ones via frame viewer and evaluate compositing reasons
Blink Insights
- scrollTop is asked for often and requires significant work. Do we NEED to force a layout for all cases?
- Why are we generating two superfluous layers per movie card?
- Optional: infinite scrolling with single layer can add up to significant re-rasterize costs. Can we chunk out the primary layer so the cost isn’t so large?