1/43

WebRTC

HTTP/2 is here, let’s optimize!

or, why (some) yesterday's best-practices are today's HTTP/2 anti-patterns.

+Ilya Grigorik

@igrigorik

RFC 7540

2/43

3/43

4/43

“9% of all Firefox (M36) HTTP transactions are happening over HTTP/2. There are actually more HTTP/2 connections made than SPDY ones. This is well exercised technology.”

Feb 18, 2015 - Patrick McManus, Mozilla

New TLS + NPN/ALPN connections in Chrome:

  • ~27% negotiate HTTP/1
  • ~28% negotiate SPDY/3.1
  • ~45% negotiate HTTP/2

May 26, 2015 - Chrome telemetry

5/43

HTTP/2 is here.

HTTP/2 is well tested.

HTTP/2 is replacing SPDY.

See RFC7540 (HTTP/2) and RFC7541 (HPACK). HTTP/2 has already surpassed SPDY in adoption, and Chrome will deprecate SPDY (and NPN) in early 2016.

6/43

HTTP/2 in ~5 slides...

the what, why, and how behind the new protocol.

7/43

  • Improve end-user perceived latency
  • Address the "head of line blocking"
  • Not require multiple connections
  • Retain the semantics of HTTP/1.1

"HTTP/2 is a protocol designed for low-latency transport of content �over the World Wide Web"

8/43

Latency vs Bandwidth impact on Page Load Time

To speed up the Internet at large, we should look for more ways to bring down RTT. What if we could reduce cross-atlantic RTTs from 150 ms to 100 ms? This would have a larger effect on the speed of the internet than increasing a user’s bandwidth from 3.9 Mbps to 10 Mbps or even 1 Gbps.” - Mike Belshe

Single digit % perf improvement after�5 Mbps

Linear improvement in page load time!

9/43

HTTP/2 in one slide…

  • One TCP connection
  • Request → Stream
    • Streams are multiplexed
    • Streams are prioritized
  • Binary framing layer
    • Prioritization
    • Flow control
    • Server push
  • Header compression (HPACK)

10/43

HTTP/2 binary framing 101

  • HTTP messages are decomposed into one or more frames
    • HEADERS for meta-data
    • DATA for payload
    • RST_STREAM to cancel
    • ...

  • Each frame has a common header
    • 9-byte, length prefixed
    • Easy and efficient to parse

11/43

Basic data flow in HTTP 2.0...

Streams are multiplexed because frames can be interleaved

  • All frames (e.g. HEADERS, DATA, etc) are sent over single TCP connection
  • Frame delivery is prioritized based on stream dependencies and weights
  • DATA frames are subject to per-stream and connection flow control

12/43

HPACK header compression

  • Literal values are (optionally) encoded with a static Huffman code
  • Previously sent values are (optionally) indexed
    • e.g. “2” in above example expands to “method: GET”

13/43

For a deep(er) dive on HTTP/2 protocol, grab the free book at the O’Reilly booth, or…

Read it online (free):

hpbn.co/http2

14/43

Optimizing (web) application delivery

let’s take a quick tour of our delivery pipeline...

15/43

Application

HTTP

TCP

Link layer

(Ethernet, WiFi, LTE…)

UDP

RRC and radio delays, energy consumption, ...

All things DNS (and QUIC :))

Handshakes, goodput, packet loss, ...

Resource fetch, execution and processing, ….

Parallelism, prioritization, protocol overhead, ...

If/when lower layers fail, we’re forced to “optimize” at the application layer...

16/43

Application

HTTP

TCP

Link layer

UDP

Reduce DNS lookups

Unresolved names block requests

✓ HTTP/1.x

✓ HTTP/2

Application

HTTP

TCP

Link layer

UDP

Reuse TCP connections

Connection are expensive� - handshake latency, resource overhead, ...

✓ HTTP/1.x

✓ HTTP/2

Single connection!

17/43

Application

HTTP

TCP

Link layer

UDP

Use a Content Delivery Network

Page rendering is latency-bound (most of the time)

- lower roundtrip times are critical to optimize asset delivery

✓ HTTP/1.x

✓ HTTP/2

Application

HTTP

TCP

Link layer

UDP

Minimize number of HTTP redirects

Each redirect restarts the fetch process� - cross-origin redirects are worst case: DNS, TCP, new HTTP request

✓ HTTP/1.x

✓ HTTP/2

18/43

Application

HTTP

TCP

Link layer

UDP

Eliminate unnecessary request bytes

Unnecessary metadata (e.g. headers) add up quickly

- 100+ requests, with a few KB each of headers… hundreds of KB’s!

✓ HTTP/1.x

✓ HTTP/2

Application

HTTP

TCP

Link layer

UDP

Compress assets during transfer

Bytes are slow and expensive to transfer...� - GZIP offers 40-80% savings on most assets - easy win.

✓ HTTP/1.x

✓ HTTP/2

HPACK helps...

19/43

Application

HTTP

TCP

Link layer

UDP

Cache resources on the client

Redundant data transfers are… redundant!

- Cache-Control and ETag’s on each resource is a must.

✓ HTTP/1.x

✓ HTTP/2

Application

HTTP

TCP

Link layer

UDP

Eliminate unnecessary resources

Fetch what you need, bytes are expensive� - Aggressive prefetching is expensive both on client and server.

✓ HTTP/1.x

✓ HTTP/2

20/43

Evergreen performance best-practices

  • Reduce DNS lookups
  • Reuse TCP connections
  • Use a Content Delivery Network
  • Minimize number of HTTP redirects
  • Eliminate unnecessary request bytes
  • Compress assets during transfer
  • Cache resources on the client
  • Eliminate unnecessary resources

21/43

Application

HTTP/1.x

TCP

Link layer

UDP

High protocol overhead

  • Header meta-data is uncompressed
  • ~800 bytes of meta-data per request, plus cookies

Head-of-line blocking

  • Request queuing and delayed request dispatch on the client
  • Strict response ordering on the server

Limited parallelism

  • Parallelism is limited by number of connections
  • In practice, ~6 connections per origin

22/43

Parallelism is limited by number of connections

~6 parallel downloads per origin

  • Each connection incurs full TCP handshake
  • Each connection incurs TLS handshake overhead (best case, resumed)
  • Each connection occupies server/proxy resources (memory, CPU, etc)
  • Each connection competes with others (broken congestion control)

23/43

No problem… domain shard all the things!

Duplicate (spurious) data packets lead to poor connection “goodput”

What’s the optimal number of shards?

Trick question, the answer depends on device + network + network weather + page architecture.

Most sites abuse sharding, and hurt themselves… causing congestion, retransmissions, etc.

Congestion control is a bigger problem than you might think. Especially in emerging markets where many users are bandwidth+latency limited.

24/43

“One great metric around that which I enjoy is the fraction of connections created that carry just a single HTTP transaction (and thus make that transaction bear all the overhead). For HTTP/1 74% of our active connections carry just a single transaction - persistent connections just aren't as helpful as we all want. But in HTTP/2 that number plummets to 25%. That's a huge win for overhead reduction.”

Patrick McManus, Mozilla.

25/43

Report card: domain sharding

Introduced as a workaround for lack of multiplexing in HTTP/1. UA’s open ~6 connections per origin (6 parallel downloads), and sharding allows us to raise this number to… any number.

  • Enables (higher) parallelism for HTTP/1.x
  • No “best” value for number of shards: wasted retransmissions & congestion
  • Each connection has a resource cost: memory, CPU, etc
  • Each connection competes with others for bandwidth: poor TCP performance
  • Complicates our code and applications
  • Breaks resource prioritization and flow control for HTTP/2
  • Reduces HPACK compression benefits for HTTP/2

HTTP/1.x: domain sharding is abused, consider limiting use to two shards.

HTTP/2: unnecessary. Negative impact on performance. Eliminate.

26/43

Removing domain sharding for HTTP/2

HTTP/2 can coalesce connections on your behalf, if... �

  • TLS certificate is valid for both hosts
  • Hosts resolve to the same IP
  • HTTP/1.1 → opens new connection to each origin
  • HTTP/2 → reuses the same connection for altName origins

Using this technique you can get the best of both worlds with minimal work.

$> openssl s_client -connect google.com:443 |

openssl x509 -noout -text |

grep DNS

DNS:*.google.com, DNS:*.android.com, DNS:*.appengine.google.com, ...

27/43

“Reduce requests” → concatenate all the things!

Head of line blocking is expensive, instead of fetching N small assets, let’s fetch fewer but bigger assets.. with same content!

  • Parallelism workaround for HTTP/1.x
  • Improved compression due to shared context

  • Delayed processing and execution
    • e.g. must wait for entire (CSS, JS) response to arrive
  • Expensive cache invalidations
    • single byte update forces full fetch
    • redundant data transfers on update

28/43

Yes, we’ve always had caching… But, we were never able to optimize for “churn” because small requests were too expensive. This is no longer the case.

  • How expensive are your cache invalidations?
  • Can you isolate high-velocity code from “stable” code?
  • Can you assign different expiry times to these resources?

* churn: ratio of bytes (in cache vs new) we have to fetch when pushing an update.

29/43

Report card: concatenated assets

Introduced as a workaround for head-of-line blocking in HTTP/1. Concatenation allows us to fetch “multiple files” within one request.

  • Improves HTTP/1.x performance
  • May deliver better resource compression
  • Complicates our code and applications - e.g. extra build steps
  • Breaks granular caching, updates, and revalidation
  • Delays resource processing and execution
  • Forces expensive cache invalidations

HTTP/2: avoid*. Ship small granular resources and optimize caching policies.

HTTP/1.x: use carefully and consider and optimize for invalidation costs (churn).

* Significant wins in compression are the only case where it might be useful.

30/43

“Reduce Requests” → inline all the things!

Head of line blocking is expensive, instead of fetching granular resources, just embed them inside others… to avoid requests.

  • Parallelism workaround for HTTP/1.x
  • Eliminates full request roundtrip

  • Inlined resource can’t be cached independently
    • must duplicate bytes if resource is reused
    • invalidated if single byte in parent changes

  • Breaks resource multiplexing and prioritization in HTTP/2
    • Inlined bytes are shipped with parent’s priority
    • Inlined bytes can’t be declined by the client

31/43

Inlining: because you’ll also need…

Server: “You asked for /product/123, but you’ll need app.js, product-photo-1.jpg, as well… I promise to deliver these to you. That is, unless you decline or cancel.”

GET /product/123

/product/123

/app.js

/product-photo-1.jpg

Inlining is server push. Except, server push has nicer properties:

  • Delivers granular resources, which can be cached individually
  • Delivers granular resources, which can be mux’ed and prioritized correctly
  • Allows the client to opt-in / opt-out and control how and where it is used
    • Server push is subject to flow control - e.g. “you can only push 5KB”
    • Server push is optional and can be disabled by the client

32/43

Per-stream flow control + server push

  • Client: “I want first 20KB of photo.jpg”
  • Server: “Ok, 20KB… pausing stream until you tell me to send more.”
  • Client: “Send me the rest now.”

GET /product-photo.jpg

/product-photo.jpg

/product-photo.jpg

WINDOW_UPDATE

  • Flow control allows the client to pause stream delivery, and resume it later
  • Flow control is a “credit-based” scheme
    • Sending DATA frames decrements the window
    • WINDOW_UPDATE frames update the window

I want image geometry and preview, and I’ll fetch the rest later...

33/43

Report card: resource inlining

  • Removes full request roundtrip - i.e. “you’ll need this”
  • Parallelism workaround for HTTP/1.x
  • Complicates our workflows, code and applications
  • Breaks granular caching, updates, and revalidation
  • Forces frequent and expensive invalidations

Introduced as a workaround for head-of-line blocking in HTTP/1. Inlining is a latency optimization that eliminates full request roundtrip, and reduces “number of requests” in HTTP/1.

HTTP/2: replace with server push. The most naive strategy is still strictly better.

HTTP/1.x: use carefully and pay close attention to caching implications.

* Significant wins in compression are the only case where it might be useful.

34/43

HTTP/2 server push instead of inlining...

Implementation and status�

  • Restricted to same-origin resources
  • Opportunity for servers to become much smarter:
    • Don’t push on every request
    • Push based on observed traffic patterns

  • UA’s iterating on implementation, e.g…
    • Auto RST_STREAM in-cache resources (works)
    • Pushing cache revalidations / invalidations? (TBD)
    • JavaScript API for processing server push? (TBD)

35/43

Jetty’s “smart push” is a great strategy...

  • Server observes incoming traffic
    • Build a dependency model based on Referer header
      • e.g. index.html → {style.css, app.js}

  • Server initiates push for learned dependencies
    • client → GET index.html
    • server → push style.css, app.js, index.html

Lots of room for experimentation + innovation!

36/43

You... need to think about prioritization

with HTTP/2 the browser is relying on the server to deliver data in an optimal way -- this is critical.

37/43

Prioritization is key to optimized rendering...

With HTTP/1.1 browsers “prioritizes” resources by holding a priority queue on the client and taking educated guesses for how to make the best use of available TCP connections… which delays request dispatch.

GET index.html

GET style.css

GET hero.jpg

GET other.jpg

High

Priority

Low

Priority

time

Delay dispatch, because we don’t want to “waste” bandwidth on images, we might have more critical resources, etc...

38/43

Prioritization is key to optimized rendering...

With HTTP/2 browsers prioritize requests based on type/context, and immediately dispatch the request as soon as the resource is discovered. The priority is communicated to the server as weights + dependencies.

GET index.html

GET style.css

GET hero.jpg

GET other.jpg

High

Priority

Low

Priority

time

Requests are initiated as soon as each resource is discovered

Responsibility is on the server to deliver the bytes in the right order!

39/43

Stream prioritization in HTTP/2...

  • Each stream can have a weight
    • [1-256] integer value�
  • Each stream can have a dependency
    • parent is another stream ID

  • E.g. style.css (“A”) should get 2/3rd’s of available resources as compared to logo.jpg (“B”)
  • E.g. product-photo-1.jpg (“D”) should be delivered before product-photo-2.jpg (“C”)

* Prioritization is an advisory hint to the server, it does not provide strict delivery semantics.

40/43

HTTP/2 prioritization in Firefox...

  • “Leaders” should get 2/3rd of resources, “Unblocked” bulk of the rest
  • “Leaders” should be prioritized ahead of “Followers” - e.g. HTML/CSS/JS ahead of images.
  • “Background” tasks should be done… in the background!

41/43

With HTTP/2 the browser relies on the server to deliver the response data in an optimal way.

It’s not just the number of bytes, or requests per second, but the order in which bytes are delivered. Test your HTTP/2 server very carefully.

  • Does it respect stream and connection flow control? (MUST)
  • Does it support dependencies and weights? (MUST)
  • Does it enable use of server push?

42/43

  • Eliminate unnecessary request bytes
  • Eliminate unnecessary resources
  • Apply domain sharding
  • Concatenate resources
  • Inline resources

HTTP/1.x

HTTP/2

Reduce DNS lookups

Reuse TCP connections

Use a Content Delivery Network

Minimize number of HTTP redirects

Eliminate unnecessary request bytes

Compress assets during transfer

Cache resources on the client

Eliminate unnecessary resources

Apply domain sharding

Revisit (max 2)

Remove

Concatenate resources

Carefully (caching)

Remove (compression)

Inline resources

Carefully (caching)

Remove, use server push

Pick your HTTP/2 server carefully!

… we need better test suites and benchmarks.

43/43

+Ilya Grigorik

@igrigorik

Learn more...

hpbn.co/http2

Slides...

bit.ly/http2-opt