1 of 49

Caching Doesn’t Improve Mobile Web Performance*

Jamshed Vesuna Colin Scott

Michael Buettner Michael Piatek

Arvind Krishnamurthy* Scott Shenker†‡

UC Berkeley Google *University of Washington ICSI

Special thanks to our shepherd Dan Tsafrir

*Much

2 of 49

Flywheel NSDI’15 Results

Increasing the cache hit ratio of the Flywheel proxy from 22% to 32% resulted in only a 1-2% reduction in median mobile page load time

4 of 49

Goal:

Understand the effects of caching on mobile web performance

5 of 49

Outline

  • Motivation
  • Background
  • Model (Estimating Page Load Time)
  • Methodology for empirical results
  • Corroborating model with empirical results
  • Conclusion

6 of 49

Background - Loading a Web Page

7 of 49

Background - Critical Path

Critical Path: the longest chain of dependent browser tasks

Fetch Delay = Network Delay

Render Delay = Computational Delay

8 of 49

Background - Page Load Time (PLT)

10 of 49

Performance Model - Estimating PLT

C - computational delays

N - network delays

K - fraction of objects on the critical path that are cacheable

X - cache hit ratio (out of all objects)

f(X) - overlap of C and N on the critical path

EPLT[X] = C + N·(1 − K·X) − f(X)

11 of 49

Performance Model - Building an Intuition

  • Cold cache (X = 0):
    • Original Page Load Time = C + N
  • Perfect cache for a “perfectly cacheable page”
    • X = 1, K = 1
    • Upper bound on the improvement in page load time:
      • EPLT[1] = C

EPLT[X] = C + N·(1 − K·X) (overlap term f(X) ignored)
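
The model can be sketched as a small function to check the two limiting cases above. This is a minimal illustration, not the authors' code: the name `estimate_plt` is hypothetical, the delays are arbitrary illustrative values, and because the slides do not give a form for the overlap term, it is passed in as a plain number that defaults to zero.

```python
def estimate_plt(c, n, k, x, overlap=0.0):
    """EPLT[X] = C + N * (1 - K * X) - f(X).

    c: computational delay on the critical path
    n: network delay on the critical path
    k: fraction of critical-path objects that are cacheable (0..1)
    x: cache hit ratio (0..1)
    overlap: value of f(X); its form is not given in the slides, so it
             is supplied as a number and defaults to 0 (assumption).
    """
    if not (0.0 <= k <= 1.0 and 0.0 <= x <= 1.0):
        raise ValueError("k and x must lie in [0, 1]")
    return c + n * (1.0 - k * x) - overlap


# Illustrative delays in arbitrary time units (not measured values).
C, N = 2.0, 3.0

cold = estimate_plt(C, N, k=1.0, x=0.0)     # no cache hits: C + N
perfect = estimate_plt(C, N, k=1.0, x=1.0)  # all network delay removed: C

print(cold, perfect)
```

With K = 1 and X = 1 the network term vanishes entirely, so the computational delay C is the floor that no amount of caching can lower.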

12 of 49

Performance Model - Fitting K

In practice, K ~ 0.2 = ⅕*

With a perfect cache (X = 1): EPLT[max] ≤ C + ⅘N

*Demystifying Page Load Performance with WProf. NSDI ’13

13 of 49

Prediction: Upper Bound on Caching Benefits

C:N ~ ⅔ for mobile devices

PLTo = EPLT[0] = C + N = 5/2 C

EPLT[max] ≤ C + ⅘N = 11/5 C

Reduction in PLT: (PLTo − EPLT[max]) / PLTo

≤ 3/25 (12% with a perfect cache!)
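
The arithmetic behind the 12% figure can be reproduced with exact fractions. This is a sketch under the slide's own assumptions (K = ⅕, C:N = ⅔, perfect cache, overlap term ignored); the variable names are illustrative, and C is normalized to 1 for convenience.

```python
from fractions import Fraction

K = Fraction(1, 5)        # fraction of critical-path objects that are cacheable
C = Fraction(1)           # computational delay, normalized to 1
N = C / Fraction(2, 3)    # C:N = 2/3, so N = 3/2 C

plt_original = C + N             # EPLT[0] = 5/2 C
plt_perfect = C + N * (1 - K)    # X = 1, overlap ignored: 11/5 C
reduction = (plt_original - plt_perfect) / plt_original

print(plt_original, plt_perfect, reduction)  # 5/2 11/5 3/25
```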

14 of 49

Prediction: Desktop Benefits from Caching

C:N ~ ⅙ for fast desktop devices

PLTo = EPLT[0] = C + N = 7 C

EPLT[max] ≤ 21/5 C

Reduction in PLT: (PLTo − EPLT[max]) / PLTo

≤ 2/5 (40% with a perfect cache!)

15 of 49

Explanation: C is Small for Desktop

C:N ~ ⅕ for 2GHz CPU*

*Demystifying Page Load Performance with WProf. NSDI ’13

19 of 49

Explanation: C is Larger for Mobile

C:N ~ ⅔ for 1GHz CPU

26 of 49

Measurement Methodology

  • Record the original page
  • Then replay it with either:
    • A “perfect cache”
    • A “partial cache”
  • Repeat

28 of 49

Workload Characteristics

31 of 49

Increasing Cache Hits - Flywheel Result

Increased cache hit ratio from 20% to 30%

→ 1-2% reduction in page load time

32 of 49

Desktop vs Mobile, Perfect Cache

Reduction Defined As:

(Original PLT - PLT with a perfect cache) / (Original PLT)
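
The reduction metric, and the median across pages reported on the following slides, can be sketched as follows. The function names and the sample load times are hypothetical and serve only to illustrate the definition; they are not measurements from the study.

```python
from statistics import median


def plt_reduction(original_plt, cached_plt):
    """(Original PLT - PLT with a perfect cache) / (Original PLT)."""
    if original_plt <= 0:
        raise ValueError("original_plt must be positive")
    return (original_plt - cached_plt) / original_plt


def median_reduction(pairs):
    """Median per-page reduction over (original, cached) PLT pairs."""
    return median([plt_reduction(orig, cached) for orig, cached in pairs])


# Hypothetical per-page load times in milliseconds.
pages = [(1000, 900), (2000, 1500), (4000, 3800)]
print(median_reduction(pages))
```

Taking the median of per-page reductions, rather than the reduction of the median PLT, keeps one very slow page from dominating the summary.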

33 of 49

Desktop vs Mobile, Perfect Cache

Median reduction in PLT for 3.2 GHz desktop is 34%

34 of 49

Desktop vs Mobile, Perfect Cache

Median reduction in PLT for mobile is 13%

35 of 49

Isolating the Bottleneck Resource

With a constrained CPU, desktop results are similar to mobile

36 of 49

Isolating the Bottleneck Resource

With constrained RAM, desktop results remain similar to desktop

37 of 49

Isolating the Bottleneck Resource

CPU is the key difference, not RAM

38 of 49

Slower CPUs Show Reduced Improvements

As CPU is throttled, caching has a reduced impact on PLT

40 of 49

Caching Benefits are Limited by Slow CPUs

  • We know: slower CPUs increase computational delays (C)
  • For desktop, network delay (N) dominates computational delay (C)
  • For mobile*, N is comparable to C (N:C ~ 3:2)
  • Caching only reduces N

→ Mobile devices benefit less from web caching

*Assumption: “All else being equal” (including bandwidth)
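
This argument follows directly from the model: with a perfect cache and the overlap term ignored, the fractional reduction is K·N / (C + N) = K / (1 + C/N), which shrinks as C grows relative to N. The sketch below uses the fitted K = 0.2 from the earlier slide; the function name and the C:N values in the sweep are hypothetical, chosen only to show the trend.

```python
def perfect_cache_reduction(c_to_n, k=0.2):
    """Fractional PLT reduction at X = 1 with the overlap term ignored.

    (PLTo - EPLT[1]) / PLTo = K * N / (C + N) = K / (1 + C/N)
    """
    if c_to_n < 0:
        raise ValueError("c_to_n must be non-negative")
    return k / (1.0 + c_to_n)


# Hypothetical C:N ratios, from a fast CPU (small C) to a slow one (large C).
ratios = [0.25, 0.5, 1.0, 2.0]
reductions = [perfect_cache_reduction(r) for r in ratios]

for ratio, reduction in zip(ratios, reductions):
    print(f"C:N = {ratio:.2f} -> reduction {reduction:.1%}")
```

Because caching touches only the N term, every increase in C dilutes the same absolute saving over a larger total load time.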

41 of 49

Implications

  • Content providers:
    • Stop paying for CDNs* [for mobile users]
  • Analyze what’s on the critical path
    • Cache critical path items
    • Make use of SPDY or HTTP/2 prioritization levels

*If you only care about end user latency

42 of 49

Conclusion

  • Caching doesn’t decrease mobile PLT much
    • Items on the critical path are often not cacheable*
    • CPU is the key bottleneck resource on mobile
  • Key contribution: predictive performance model

*Demystifying Page Load Performance with WProf. NSDI ’13

43 of 49

Backup Slides

44 of 49

Data Validation

  • Compared status codes of replays
  • Checked PLT variance
  • Enforced PLT without caching ≥ PLT with caching
  • Removed web pages with high non-determinism (9%)

Sanity Checks: https://github.com/colin-scott/page_load_time/tree/master/telemetry/sanity_checks
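
Two of the checks listed above (PLT variance, and uncached PLT ≥ cached PLT) might be expressed as a per-page filter like the one below. This is a hypothetical sketch, not the code in the linked repository: the function name, the use of the coefficient of variation, the 0.25 threshold, and the comparison of minimum PLTs are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def keep_page(uncached_plts, cached_plts, max_cv=0.25):
    """Return True if a page's repeated PLT measurements look usable.

    uncached_plts / cached_plts: positive PLTs from repeated loads.
    max_cv: assumed cap on the coefficient of variation (stdev / mean).
    """
    # Variance check: reject pages whose repeated loads disagree too much.
    for runs in (uncached_plts, cached_plts):
        if pstdev(runs) / mean(runs) > max_cv:
            return False
    # Ordering check: a cached load should not be slower than an uncached one.
    return min(uncached_plts) >= min(cached_plts)


# Hypothetical measurements in milliseconds.
print(keep_page([1000, 1020, 990, 1010], [800, 810, 790, 805]))
print(keep_page([700, 710, 705, 702], [800, 810, 790, 805]))
```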

45 of 49

Bandwidth vs Latency

  • 85% of web pages are latency bound*
  • Most web pages are relatively small
  • For large web pages, our model doesn’t necessarily hold

*Flywheel data

46 of 49

Lots of Related Work

  • Many papers have looked at this
    • Demystifying Page Load Performance with WProf (Wang et al.)
    • An In-depth study of Mobile Browser Performance (Nejati et al.)
    • What slows you down? Your network or your device? (Steiner et al.)
    • Web Caching on Smartphones: Ideal vs. Reality (Qian et al.)

47 of 49

Known Limitations - PLT

  • Other metrics:
    • SpeedIndex
    • Above-the-fold time
  • These better capture the user’s perspective
  • But they are difficult to measure consistently

48 of 49

Device Specs

  • Network: each web page was originally fetched over UC Berkeley’s LAN, at approximately 250 Mbps down and 230 Mbps up.
  • Mobile: a Galaxy Tab 4 with a 1.2 GHz quad-core processor and 1.5 GB of onboard RAM, running Android 4.4 (KitKat).
  • Desktop: measurements were performed in a virtual machine with a 3.2 GHz quad-core processor and 6 GB of RAM.

49 of 49

Known Limitations - WPR

  • The PLT measurements taken by Web Page Replay (WPR) are not necessarily consistent with PLTs observed on live web pages, nor are they necessarily consistent across multiple runs of WPR.
    • First, although WPR attempts to mitigate non-determinism in JavaScript execution (by injecting a script into each web page that interposes on non-deterministic calls such as getTime), JavaScript may nonetheless exhibit non-determinism across different loads.
    • Second, the mechanism WPR uses to emulate the original RTTs observed during record mode (sleeping a fixed number of milliseconds) may not perfectly match the behavior of the original page load.
  • We try to mitigate these artifacts by loading each web page four times and taking the minimum PLT.
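
The mitigation of loading each page four times and keeping the minimum PLT can be sketched as below. The helper name `measure_plt` and the `load_once` callable are hypothetical stand-ins for whatever actually drives the browser; the sample timings are made up for the example.

```python
def measure_plt(load_once, runs=4):
    """Load a page `runs` times and return the minimum observed PLT.

    load_once: zero-argument callable that performs one page load and
               returns its PLT (a stand-in for the real harness).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return min(load_once() for _ in range(runs))


# Hypothetical PLTs in milliseconds from four replays of one page.
samples = iter([1310.0, 1245.0, 1502.0, 1270.0])
best = measure_plt(lambda: next(samples))
print(best)
```

The minimum is the run least affected by transient noise, which is why it is preferred here over the mean.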
