WebPerfWG call - September 24th 2020

Participants

Nic Jansma, Yoav Weiss, Peter Perlepes, Neil Craig, Noam Helfman, Gilles Dubuc, Marcel Duran, Steven Bougon, Nicolás Peña Moreno, Timo Tijhof, Michal Mocny, Thomas Kelly, Sean Feng, Carine Bournez, Alex Christensen, Annie Sullivan, Benjamin de Kosnik

Next Call: October 8th @ 10am PST / 1pm EST

TPAC

Is coming!
Agenda coming soon
Please send your presentation ideas to Nic or Yoav

Minutes

worker start needs to be added to diagram

related: workerStart should be clearly defined as applicable to the last SW · Issue #100 · w3c/navigation-timing
Nic: a result of the hackathon that turned out bigger than expected
… PR - WIP: workerStart and redirects · Issue #131 · w3c/navigation-timing
… Got a request to add workerStart to the navigation timing diagram
… Service worker start up time happens before the fetch of the last resource
… Found a few things that I wanted to make sure everyone agrees with
… SW start up time happens right before the fetch
… Need to make that more obvious in the diagram, maybe add another phase
Noam: That implies that workerStart happens before the fetch?
Nic: yeah, and the time between them represents the time the worker took to start up
… that one is low controversy
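
As a rough illustration of the point above, the startup time can be read as the difference between the two timestamps. This is only a sketch: `WorkerTimingLike` and `workerStartupTime` are hypothetical names, and it assumes `workerStart` is 0 when no service worker was involved.

```typescript
// Minimal structural type; a PerformanceNavigationTiming entry has both fields.
interface WorkerTimingLike {
  workerStart: number; // assumed to be 0 when no service worker was involved
  fetchStart: number;
}

// Hypothetical helper: milliseconds between workerStart and fetchStart,
// or null when the value is missing or the timestamps are inconsistent.
function workerStartupTime(entry: WorkerTimingLike): number | null {
  if (entry.workerStart <= 0 || entry.fetchStart < entry.workerStart) {
    return null;
  }
  return entry.fetchStart - entry.workerStart;
}
```

In a browser, the entry returned by performance.getEntriesByType("navigation")[0] could be passed in. A large value would suggest the worker had to be spun up, a value near 0 that it was already running.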
… There’s a bug in the processing model where in case of redirects workerStart is reflecting the time of the cross-origin worker if one existed. This PR fixes that
… Also workerStart itself is not same-origin protected, because it’s part of the final origin you land on
Yoav: One tricky case: if we’re going from A => B => C => B, where B has a SW, would we expose B’s SW start time, and hence details on C?
Nic: For multiple redirects, the SW start up time will always be the first request of the last same-origin redirect chain. In your example it’d probably be a 0ms startup time.
Yoav: I agree that it should be 0.
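
The rule Nic describes ("the first request of the last same-origin redirect chain") can be sketched as a scan backwards from the final request. This is an illustration of the proposal as discussed here, not the spec's processing model. The function name and the use of plain origin strings are assumptions.

```typescript
// Hypothetical helper: given the origin of each request in a navigation's
// redirect chain (in order), return the index of the first request in the
// trailing run of requests that share the final request's origin.
// Returns -1 for an empty chain.
function firstIndexOfLastSameOriginChain(origins: string[]): number {
  if (origins.length === 0) {
    return -1;
  }
  const finalOrigin = origins[origins.length - 1];
  let index = origins.length - 1;
  // Walk backwards while the previous request is on the same origin.
  while (index > 0 && origins[index - 1] === finalOrigin) {
    index--;
  }
  return index;
}

// Yoav's example, A => B => C => B: only the final B qualifies.
firstIndexOfLastSameOriginChain(["A", "B", "C", "B"]); // 3
// A => B => B: the chain starts at the first B.
firstIndexOfLastSameOriginChain(["A", "B", "B"]); // 1
```

In the A => B => C => B case, workerStart would be taken from the final request only, so nothing about the time spent on C is exposed.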
Timo: The redirect start and end time exposes something similar, right?
Nic: The redirects are protected by the same-origin check. Any off-origin redirect zeroes them out
Timo: So same-origin redirects in the chain won’t be visible if a cross-origin redirect exists?
Nic: Correct
Timo: but not for SW?
Nic: that’s the proposal here.
... If we set redirect start for the same origin chain, you could infer the time the cross-origin redirects took. fetchStart enables some inference but with less confidence, because you’re not aware of the redirects.
Timo: Should we have a second box in the diagram? That would be helpful. At Wikimedia we added an extra box called “gaps”.
Marcel: question on the box between “redirect” and “AppCache”. workerStart could mean different things. Does the state of the SW matter in the redirect chains?
... In theory the fetchStart could be calculated by getting the timestamp from the fetch event, no?
Nic: For Navigation Timing we don’t have access to the fetch() API for those requests.
Marcel: If the fetch start should be the navigation timing, the SW is triggering a fetch event which would be the time here, at least inside the SW context
… workerStart always had a huge gap between “worker already running” and “worker is starting” times and they mean different things.
Nic: There’s a “worker start time” and “worker about to fetch” time and if the difference between them is very large the worker is being spun up, but we’re not explicitly exposing that.
… Are you arguing for more clarity?
Marcel: If we introduce another box in between, what’s the point of it? It’s a single metric without an end
Nic: I imagined fetchStart to be the end of that box
Marcel: OK, related to the redirect chain, if all the SW’s involved don’t fetch anything, should we only count the last one?
Nic: In the same origin case, workerStart would be the beginning of the first request to the same origin and fetchStart would be the beginning of the last request.
… In that case you’d also have redirect start and end and your workerStart time would also include redirects.
… So one downside here is that we won’t be able to separate SW start time from redirects.
… So if your redirect takes a second, that would be included in the time
Marcel: What would be the box, just workerStart?
Nic: workerStart at the beginning.
Timo: If we don’t expose redirects, but do expose workerStart that may be confusing. I don’t know if we regret not exposing same-origin redirects, but we should make it consistent. Not expose one without the other.
Nic: So for consistency, it should be the worker starting time of only the last request? So if you have a series of redirects, you won’t be able to measure the SW startup time. But the other way you wouldn’t be able to either, because the redirect would be included in the reported time.
Yoav: But in that scenario the SW will delay the request. Unless we’re talking about navigation preload, but otherwise the SW is in the critical path
Nic: It’s in the critical path, but we can’t measure what it is if there are redirects. Even if it’s just same-origin redirects, workerStart will be at the beginning of that phase.
… We don’t really have workerEnd here
Nicolás: Isn’t fetch start like workerEnd for this?
Nic: For the second request, but for the first one workerStart will be missing as it’s in between the redirects. Unless we have an actual workerEnd.
Yoav: During the hackathon we talked about maybe splitting the redirect phase into cross-origin and a full chain of same-origin phases. But that won’t help in case there’s a same-origin request as part of the cross-origin redirect chain.
… But maybe that’s fine. Not sure how popular those flows are compared to same origin redirects.
Nic: Seems like a common pattern for user-click reporting.
... Now that we talked about it, it’d be sad if workerStart would be useless in the case of redirects
Timo: Why is there no workerEnd?
Nic: Not sure. The work was done 4-5 years ago
Timo: If there aren’t security implications it could represent the first fetch in the last same-origin chain.
Timo: Question in chat to talk more about gaps mentioned earlier - there are all these start and end ranges, but the steps don’t add up to the total, there’s typically a 10ms gap. So we attribute some steps to those gaps between the boxes.
Nic: mPulse deals with this as well. We commonly call them different names: HOL blocking, etc
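
One way to surface the gaps Timo describes is to sort the known phases and report any stretch of time that no phase covers. A minimal sketch, in which the `Phase` shape, the `findGaps` name and the gap labels are all assumptions made for illustration:

```typescript
interface Phase {
  name: string;
  start: number; // ms
  end: number;   // ms
}

// Hypothetical helper: returns the uncovered intervals between phases.
// Overlapping phases are handled by tracking how far time is covered so far.
function findGaps(phases: Phase[]): Phase[] {
  const sorted = phases.slice().sort((a, b) => a.start - b.start);
  const gaps: Phase[] = [];
  if (sorted.length === 0) {
    return gaps;
  }
  let coveredUntil = sorted[0].end;
  let lastName = sorted[0].name;
  for (let i = 1; i < sorted.length; i++) {
    const phase = sorted[i];
    if (phase.start > coveredUntil) {
      gaps.push({ name: "gap after " + lastName, start: coveredUntil, end: phase.start });
    }
    if (phase.end > coveredUntil) {
      coveredUntil = phase.end;
      lastName = phase.name;
    }
  }
  return gaps;
}
```

The input phases would be built from the start/end pairs of a timing entry (for example domainLookupStart/domainLookupEnd). Each reported gap can then be charted as its own box rather than silently dropped from the total.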
Yoav: AI to everyone to find current users and try to make them part of this conversation
Nicolás: Also need to make sure we’re not breaking existing users too much and all UAs are ok with that
Benjamin: maybe inching towards “worker startup time” instead of workerStart
Nic: That’s also something we’re trying to expose in mPulse as well, as we’ve seen it impacting our customers
Nicolás: There’s a RT issue #119 about workers as well
Nic: I’ll keep it in mind

Hard to feature-detect observe() parameters

Nicolás: We've added some parameters to .observe() of PO, using "buffered" flag or "type"
... It's not easy to feature detect whether the buffered flag is supported in a UA
... You could do a try/catch
... Bad ergonomics, and requires exception handling to detect
... Some developers not comfortable doing this in production code
… If anyone used it, any feedback?
Nic: A current UA would throw an UnsupportedArgument (or whatever) exception?
Nicolás: Yes, I would expect it would throw, but haven’t verified
Yoav: If we were to add some feature detection, we'd do some sort of "supported" pattern for the supported parameters
Nicolás: Yeah. Right now parameters that don’t mean anything are ignored. It’s throwing because it doesn’t have entryTypes.
Peter: The try/catch way is also one of the ways proposed for feature detection, but there's existing discussion for passive event listeners

Yoav: That seems super relevant. Seems like someone needs to read through the issue.
Timo: Anecdotally, we had a try/catch on the buffered version and didn't have it on the non-buffered version, nested try/catch made the ergonomics bad
Yoav: Unfortunately even if newer browsers add the new supported way of detecting support, the older browser would still need multiple levels of detection since they wouldn't support that way of detecting
Timo: Also, by the time browser support for that would happen, we would probably drop support for the oldest browsers that require this nesting.
Nicolás: Will take a look at the open issue and comment on our needs
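
The nested try/catch pattern discussed above might look roughly like the following. It is a sketch, not a recommended implementation: it assumes, as suggested in the discussion (and not verified there), that a UA without support for "type" throws because entryTypes is missing. `ObserverLike`, `ObserveMode` and `observeWithFallback` are hypothetical names. The observer is passed in through a small structural interface so the logic can be exercised without a browser.

```typescript
interface ObserveOptions {
  type?: string;
  entryTypes?: string[];
  buffered?: boolean;
}

// Structural stand-in for PerformanceObserver (only the method used here).
interface ObserverLike {
  observe(options: ObserveOptions): void;
}

type ObserveMode = "type+buffered" | "entryTypes" | "unsupported";

// Hypothetical helper: try the newer { type, buffered } form first, then
// fall back to the older { entryTypes } form, and report which one worked.
function observeWithFallback(observer: ObserverLike, type: string): ObserveMode {
  try {
    observer.observe({ type: type, buffered: true });
    return "type+buffered";
  } catch (outerError) {
    try {
      observer.observe({ entryTypes: [type] });
      return "entryTypes";
    } catch (innerError) {
      return "unsupported";
    }
  }
}
```

The two levels of exception handling are exactly the ergonomics problem raised in the discussion. Note also that a UA that silently ignores an unknown "buffered" flag would still report "type+buffered", which is part of why detection is hard.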

performance-timeline #168 - Reset the observers

Yoav: We don't have a good way to measure SPAs other than UserTiming, so asking for a way to reset the observers
Michal: Layout Instability already works well for SPAs
... For things like InputDelay there's room to make it work
... For FP and FCP, painting after page load (from a blank canvas) is very different from painting from an arbitrary point
… There are security and performance reasons as to why this is hard, but also, the metrics won’t be equivalent.
... For LCP should it be just the portion of the DOM that changed, or related to the whole page?
… So, IMO, it doesn’t make sense to just reset metrics
... FirstInputDelay and EventTiming, there's a minimum threshold for input events, whereas for SPA events if you reset you may want a 0 threshold to make sure it gets reported
Yoav: Right. Because FID is always reported, but other events are reported only when they are slow. There’s also a security question of whether we can enable that.
Yoav: We may want to fold this into a broader SPA reporting issue
... Maybe we can have a joint label to track them all
... AI to add label

hasDroppedEntry

Nicolás: Was doing a TAG review for hasDroppedEntry
... Parameter for callback of PO
... Tells you whether the observer has been observing an entry type where at least one entry was dropped from the buffer. Indicates that you lost some data on the observer
... TAG didn't like the name "hasDroppedEntry", proposed "hasBufferOverflow"
Yoav: Only concern is it's not the bestest English, hasBufferOverflown?
Nicolás: Unrelated to naming - would there be compat issues for a callback that now has an extra parameter?
Yoav: It’s theoretically observable through the args, but it will only break if someone explicitly checked that this argument is undefined before. So, I wouldn't expect breakage
Noam: Back to naming: concern around association with security term "buffer overflow"
Timo: "isFull"
Nicolás: But we mean that it was full and then at least another entry was added.
Timo: "wasFullAndThenSome" :)
Nicolás: Will follow up with TAG for naming
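
On the compat question raised above, the following sketch illustrates why an extra callback argument is unlikely to break existing code: a callback that declares only two parameters never sees the new one. The positional third argument and every name here (`Callback`, `deliver`, `hasDroppedEntry` as a boolean) are assumptions for illustration. The minutes do not fix the final shape or name.

```typescript
// Simplified stand-ins for the entry list and the observer.
type Entries = string[];
type Callback = (entries: Entries, observer: object, hasDroppedEntry?: boolean) => void;

// Hypothetical dispatcher: always passes the extra argument.
function deliver(callback: Callback, entries: Entries, dropped: boolean): void {
  callback(entries, {}, dropped);
}

// Existing-style callback: declares two parameters and ignores the third.
let legacySeen = 0;
const legacyCallback = (entries: Entries, _observer: object): void => {
  legacySeen += entries.length;
};

// New-style callback: opts in to the extra parameter.
let sawDrop = false;
const newCallback: Callback = (_entries, _observer, hasDroppedEntry) => {
  sawDrop = hasDroppedEntry === true;
};

deliver(legacyCallback, ["a", "b"], true); // legacySeen is now 2
deliver(newCallback, ["a"], true);         // sawDrop is now true
```

Breakage would require an existing callback that inspects its argument list explicitly (for example, checking that a third argument is undefined), which matches Yoav's expectation that this is rare.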