1 of 49

Caching Doesn’t Improve Mobile Web Performance^*

Jamshed Vesuna^† Colin Scott^†

Michael Buettner^∆ Michael Piatek^∆

Arvind Krishnamurthy^* Scott Shenker^†‡

^†UC Berkeley ^∆Google ^*University of Washington ^‡ICSI

Special thanks to our shepherd Dan Tsafrir

^*Much

2 of 49

Flywheel NSDI’15 Results

Increasing the cache hit ratio of their proxy from 22% to 32% resulted in only a 1-2% reduction in median mobile page load time

4 of 49

Goal:

Understand the effects of caching on mobile web performance

5 of 49

Outline

Motivation
Background
Model (Estimating Page Load Time)
Methodology for empirical results
Corroborating model with empirical results
Conclusion

6 of 49

Background - Loading a Web Page

7 of 49

Background - Critical Path

Critical Path: the longest chain of dependent browser tasks

Fetch Delay = Network Delay

Render Delay = Computational Delay
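
The critical path definition above can be sketched as a longest-path search over a dependency graph. This is a minimal illustration, assuming the dependencies form a DAG; the task names, delays, and the `critical_path` helper are hypothetical and not taken from any measurement or tool used in this work.

```python
from functools import lru_cache


def critical_path(durations, deps):
    """Return (total_delay_ms, path) for the longest chain of dependent tasks.

    durations: task name -> delay in ms (fetch or render delay)
    deps:      task name -> list of tasks that must finish first
    Assumes the dependency graph is acyclic.
    """
    @lru_cache(maxsize=None)
    def longest_to(task):
        # Longest chain ending at `task`: its own delay plus the
        # longest chain ending at any of its prerequisites.
        best_total, best_path = 0.0, ()
        for dep in deps.get(task, ()):
            total, path = longest_to(dep)
            if total > best_total:
                best_total, best_path = total, path
        return best_total + durations[task], best_path + (task,)

    return max((longest_to(t) for t in durations), key=lambda r: r[0])


# Hypothetical page load (delays in ms, not measured data).
durations = {"fetch_html": 100, "parse_html": 30, "fetch_css": 80,
             "fetch_js": 120, "eval_js": 60, "render": 40}
deps = {"parse_html": ["fetch_html"], "fetch_css": ["parse_html"],
        "fetch_js": ["parse_html"], "eval_js": ["fetch_js"],
        "render": ["fetch_css", "eval_js"]}

total, path = critical_path(durations, deps)
print(total, " -> ".join(path))
```

In this toy graph the CSS fetch is off the critical path, so caching it would not shorten the page load at all.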

8 of 49

Background - Page Load Time (PLT)

10 of 49

Performance Model - Estimating PLT

C - computational delays

N - network delays

K - fraction of objects on the critical path that are cacheable

X - cache hit ratio (out of all objects)

f() - overlap of C and N on the critical path

E_PLT[X] = C + N·(1 − K·X) − f(X)

Here, we estimate PLT as a function of the cache hit ratio X.

C - the sum of computational delays for all objects on the critical path for a cold (X = 0) page load.

N - similarly, the sum of network fetch delays for all objects on the critical path for a cold (X = 0) page load.

K - the fraction of objects on the critical path that are cacheable.

X - the cache hit ratio. Note that the objects that hit the cache are a subset of the cacheable objects in the web page.

(1 − K·X) - the fraction of objects on the critical path that are not served from the cache.

f() - accounts for the overlap of computational and network delays on the critical path. There are a few cases where the browser can load dependent objects concurrently, but for the average case we can ignore this term.

In short, the model sums the computational delays on the critical path and the network delays of the critical-path objects that are not cached.
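
As a rough sketch, the model can be written as a single function. The function name `estimate_plt`, the `overlap` parameter (standing in for f(X)), and the example delays are assumptions made for illustration; they are not values from the study.

```python
def estimate_plt(c, n, k, x, overlap=0.0):
    """E_PLT[X] = C + N*(1 - K*X) - f(X).

    c: computational delay on the critical path (cold load)
    n: network delay on the critical path (cold load)
    k: fraction of critical-path objects that are cacheable
    x: cache hit ratio
    overlap: value of f(X); 0 when the overlap term is ignored
    """
    if not (0.0 <= k <= 1.0 and 0.0 <= x <= 1.0):
        raise ValueError("k and x must be fractions in [0, 1]")
    return c + n * (1.0 - k * x) - overlap


# Hypothetical delays in ms.
cold = estimate_plt(c=2000, n=3000, k=0.2, x=0.0)  # reduces to C + N
warm = estimate_plt(c=2000, n=3000, k=0.2, x=1.0)  # perfect cache
print(cold, warm)
```

Only the N term shrinks as X grows; C is untouched, which is why a large C caps the benefit.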

11 of 49

Performance Model - Building an Intuition

Cold cache (X = 0):

Original Page Load Time = C + N

Perfect cache for a “perfectly cacheable page”

X = 1, K = 1
Strict upper bound on the improvement (the best achievable page load time):

E_PLT[1] = C

E_PLT[X] = C + N·(1 − K·X)

12 of 49

Performance Model - Fitting K

In practice, K ~ 0.2 = ⅕^*

E_PLT[max] ≤ C + ⅘N

*Demystifying Page Load Performance with WProf. NSDI ’13

13 of 49

Prediction: Upper Bound on Caching Benefits

C:N ~ ⅔ for mobile devices

PLT^o = E_PLT[0] ≤ C+N = 5/2 C

E_PLT[max] ≤ 11/5 C

Reduction in PLT: (PLT^o − E_PLT[max]) / PLT^o

≤ 3/25 (12% with a perfect cache!)
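
The arithmetic on this slide can be checked with exact fractions. This sketch takes C as the unit of time, uses K = ⅕ and C:N = ⅔ as stated on the slides, and ignores the overlap term f; the variable names are illustrative only.

```python
from fractions import Fraction

K = Fraction(1, 5)          # fraction of critical-path objects that are cacheable
C = Fraction(1)             # measure everything in units of C
N = C / Fraction(2, 3)      # C:N ~ 2/3  =>  N = 3/2 C

plt_cold = C + N            # E_PLT[0], overlap term ignored
plt_best = C + N * (1 - K)  # perfect cache (X = 1)
reduction = (plt_cold - plt_best) / plt_cold

print(plt_cold, plt_best, reduction)  # 5/2 11/5 3/25
```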

14 of 49

Prediction: Desktop Benefits from Caching

C:N ~ ⅙ for fast desktop devices

PLT^o = E_PLT[0] ≤ C+N = 7 C

E_PLT[max] ≤ 21/5 C

Reduction in PLT: (PLT^o − E_PLT[max]) / PLT^o

≤ 2/5 (40% with a perfect cache!)

15 of 49

Explanation: C is Small for Desktop

C:N ~ ⅕ for 2GHz CPU^*

*Demystifying Page Load Performance with WProf. NSDI ’13

19 of 49

Explanation: C is Larger for Mobile

C:N ~ ⅔ for 1GHz CPU

26 of 49

Measurement Methodology

Record the original page
Then, replay with either:

A “perfect cache”
A “partial cache”

Repeat

28 of 49

Workload Characteristics

31 of 49

Increasing Cache Hits - Flywheel Result

Increased cache hit ratio from 20% to 30%

→ 1-2% reduction in page load time

32 of 49

Desktop vs Mobile, Perfect Cache

Reduction Defined As:

(Original PLT - PLT with a perfect cache) / (Original PLT)
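
As a small sketch of how this metric could be computed per page and summarized by its median: the PLT pairs below are made-up numbers for illustration, not results from the study, and `plt_reduction` is a hypothetical helper name.

```python
from statistics import median


def plt_reduction(original_plt, cached_plt):
    """(Original PLT - PLT with a perfect cache) / (Original PLT)."""
    if original_plt <= 0:
        raise ValueError("original PLT must be positive")
    return (original_plt - cached_plt) / original_plt


# Hypothetical (original, perfect-cache) PLT pairs in ms, one per page.
measurements = [(4000, 3600), (2500, 2000), (6000, 5400)]

reductions = [plt_reduction(orig, cached) for orig, cached in measurements]
median_reduction = median(reductions)
print(f"median reduction: {median_reduction:.0%}")
```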

33 of 49

Desktop vs Mobile, Perfect Cache

Median reduction in PLT for 3.2 GHz desktop is 34%

34 of 49

Desktop vs Mobile, Perfect Cache

Median reduction in PLT for mobile is 13%

35 of 49

Isolating the Bottleneck Resource

Constrained CPU similar to Mobile

36 of 49

Isolating the Bottleneck Resource

Constrained RAM similar to Desktop

37 of 49

Isolating the Bottleneck Resource

CPU is the key difference, not RAM

38 of 49

Slower CPUs Show Reduced Improvements

As CPU is throttled, caching has a reduced impact on PLT

40 of 49

Caching Benefits are Limited by Slow CPUs

We know: slower CPUs increase computational delays (C)
For desktop, network delay (N) dominates computational delay (C)
For mobile*, N is comparable to C (N:C ≈ 3:2)
Caching only reduces N

→ Mobile devices benefit less from web caching

*Assumption: “All else being equal” (including bandwidth)

41 of 49

Implications

Content providers:

Stop paying for CDNs* [for mobile users]

*If you only care about end user latency

Analyze what’s on the critical path

Cache critical path items
Make use of SPDY or HTTP/2 prioritization levels

42 of 49

Conclusion

Caching doesn’t decrease mobile PLT much

Items on the critical path are often not cacheable*
CPU is the key bottleneck resource on mobile

Key contribution: predictive performance model

jamshed.vesuna@gmail.com cs@cs.berkeley.edu

This Presentation: https://goo.gl/plH4HE

PLT Analysis: https://github.com/colin-scott/page_load_time

Open Source Tools: https://github.com/JamshedVesuna/telemetry

*Demystifying Page Load Performance with WProf. NSDI ’13

43 of 49

Backup Slides

44 of 49

Data Validation

Compared status codes of replays
Checked PLT variance
Enforced PLT without caching ≥ PLT with caching
Removed web pages with high non-determinism (9%)

Sanity Checks: https://github.com/colin-scott/page_load_time/tree/master/telemetry/sanity_checks
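
A rough sketch of what such a filter might look like, for readers who want the shape of the checks without opening the repository. Everything here is an assumption made for illustration: the `keep_page` helper, the coefficient-of-variation test, the 0.25 threshold, and the sample data are hypothetical and are not taken from the linked sanity checks.

```python
from statistics import mean, pstdev


def keep_page(cold_plts, cached_plts, max_cv=0.25):
    """Decide whether a page's measurements look trustworthy.

    cold_plts / cached_plts: repeated PLT samples (ms) without / with caching.
    max_cv: hypothetical cutoff on the coefficient of variation.
    """
    def cv(samples):
        return pstdev(samples) / mean(samples)

    # A cached load should not be slower than an uncached one.
    if min(cold_plts) < min(cached_plts):
        return False
    # Drop pages whose repeated loads vary too much (non-determinism).
    return cv(cold_plts) <= max_cv and cv(cached_plts) <= max_cv


# Hypothetical samples in ms; the hostnames are placeholders.
pages = {
    "a.example": ([1000, 1020, 990, 1010], [900, 880, 910, 905]),
    "b.example": ([1000, 2500, 900, 3100], [800, 820, 790, 810]),
    "c.example": ([700, 710, 705, 695], [750, 760, 740, 755]),
}
kept = [url for url, (cold, cached) in pages.items() if keep_page(cold, cached)]
print(kept)
```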

45 of 49

Bandwidth vs Latency

85% of web pages are latency bound*
Most web pages are relatively small
For large web pages, our model doesn’t necessarily hold

* Flywheel data

46 of 49

Lots of Related Work

Many papers have looked at this

Demystifying Page Load Performance with WProf (Wang et al.)
An In-depth study of Mobile Browser Performance (Nejati et al.)
What slows you down? Your network or your device? (Steiner et al.)
Web Caching on Smartphones: Ideal vs. Reality (Qian et al.)

47 of 49

Known Limitations - PLT

Other metrics:

SpeedIndex
Above-the-fold time

Better at capturing the user’s perspective
Difficult to measure consistently

48 of 49

Device Specs

Each web page was originally fetched over UC Berkeley’s LAN, which provides approximately 250 Mbps down and 230 Mbps up. Our mobile device is a Galaxy Tab 4 with a 1.2 GHz quad-core processor and 1.5 GB of on-board RAM, running Android 4.4 (KitKat). Desktop measurements were taken in a virtual machine with a 3.2 GHz quad-core processor and 6 GB of RAM.

49 of 49

Known Limitations - WPR

The PLT measurements taken by WPR are not necessarily consistent with PLTs observed on live web pages, nor are they necessarily consistent across multiple runs of WPR.

First, although WPR attempts to mitigate non-determinism in JavaScript execution (by injecting a script into each web page that interposes on non-deterministic calls such as getTime), JavaScript may nonetheless exhibit non-determinism across different loads.
Second, the mechanism WPR uses to emulate the original RTTs observed during record mode (sleeping a fixed number of milliseconds) may not perfectly match the behavior of the original page load. We try to mitigate these artifacts by loading each web page four times and taking the minimum PLT.
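
The mitigation described above (load each page four times, keep the minimum) could be sketched as follows. The `load_page` callable is a hypothetical stand-in for whatever drives the browser and returns a PLT in milliseconds; it is not a WPR or Telemetry API, and the sample values are invented.

```python
def min_plt(load_page, url, runs=4):
    """Load `url` `runs` times and return the smallest observed PLT (ms)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return min(load_page(url) for _ in range(runs))


# Fake loader that replays canned samples, for illustration only.
samples = iter([1310.0, 1275.0, 1402.0, 1290.0])


def fake_load(url):
    return next(samples)


best = min_plt(fake_load, "http://example.com/")
print(best)  # 1275.0
```

Taking the minimum rather than the mean favors the run least disturbed by replay artifacts, at the cost of ignoring how often slow runs occur.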