CS 168, Fall 2024 @ UC Berkeley
Slides credit: Sylvia Ratnasamy, Rob Shakir, Peyrin Kao, Iuniana Oprescu
HTTP and CDNs
Lecture X
HTTP Specification
Lecture X, CS 168, Fall 2024
HTTP
Content Delivery Networks
Newer HTTP Versions
Brief History of HTTP
Development initiated by Tim Berners-Lee at CERN in 1989.
Driven by a need to share information between scientists.
You can still view the first website ever made.
HTTP: Basics
HTTP is a client-server protocol.
HTTP runs over TCP.
HTTP is a request-response protocol.
Client (random port)
Server (port 80)
TCP bytestream
HTTP Requests
The request syntax is in human-readable plaintext (can be typed by a human).
Version: What HTTP version we're using.
URL: The resource we want to interact with.
Method: What we want to do with that resource.
GET /projects/project1.html HTTP/1.1\r\n
Method: GET. URL: /projects/project1.html. Version: HTTP/1.1. The line ends with a newline (\r\n).
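The request line can be built and taken apart with plain string operations. The sketch below is only an illustration: the function names build_request_line and parse_request_line are hypothetical, and it assumes the three fields are separated by single spaces.

```python
def build_request_line(method: str, url: str, version: str = "HTTP/1.1") -> str:
    """Join method, URL, and version with single spaces; end with CRLF."""
    return f"{method} {url} {version}\r\n"


def parse_request_line(line: str) -> tuple:
    """Split a request line back into (method, url, version)."""
    method, url, version = line.rstrip("\r\n").split(" ")
    return method, url, version


line = build_request_line("GET", "/projects/project1.html")
print(repr(line))
print(parse_request_line(line))
```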
HTTP Responses
Version: What HTTP version we're using.
Status code: A number, telling us what happened with the request.
Description: A description of the status code.
Content: The resource the user requested!
HTTP/1.1 200 OK <html>Project 1 Spec...</html>
Version: HTTP/1.1. Status code: 200. Description: OK. Content: <html>Project 1 Spec...</html>
HTTP Responses – Status Codes
Status codes are used by the server to propagate information about the result of the request to the client.
Codes are classified into various categories, according to numeric value.
HTTP Responses – Status Codes
200s: Successful responses.
The HTTP status dog for 203 Non-Authoritative Information.
HTTP Responses – Status Codes
300s: Redirection messages.
Status codes let the client determine future behavior.
HTTP Responses – Status Codes
400s: Client error responses.
500s: Server error responses.
Status codes let the client determine future behavior.
HTTP Responses – Status Codes
Sometimes, which status code we should use is ambiguous.
Example: Request the Google homepage with HTTP/0.9.
Usually, the category of error is the most important.
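Since the category is determined by the first digit, a client can classify any status code with integer division. A minimal sketch, assuming only the four categories listed in these slides (the CATEGORIES table and categorize function are hypothetical names; the reason phrases come from Python's standard http.HTTPStatus):

```python
from http import HTTPStatus

# Category names as used in the slides; other ranges fall back to "other".
CATEGORIES = {
    2: "successful response",
    3: "redirection message",
    4: "client error response",
    5: "server error response",
}


def categorize(code: int) -> str:
    """Map a status code to its category using the hundreds digit."""
    return CATEGORIES.get(code // 100, "other")


for code in (200, 203, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", categorize(code))
```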
HTTP Headers
Requests and responses can contain additional metadata in the form of headers.
Some headers are optional information.
Some headers are critical information.
HTTP Header Classes – Request
Headers can be classified into three types: request, response, and representation.
Request headers pass information about the client to the server.
"Referer" was misspelled in the original spec. Oops.
HTTP Header Classes – Response and Representation
Response headers are in the response, but not directly related to the content.
Representation headers are used in both requests and responses to describe how the content is represented.
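Headers are sent as "Name: value" lines, and the header section ends at the first blank line. The sketch below parses such a block into a dictionary; parse_headers is a hypothetical helper, and it lowercases names because header names are case-insensitive. It ignores repeated headers, which a real parser would need to handle.

```python
def parse_headers(block: str) -> dict:
    """Parse CRLF-separated 'Name: value' lines until the first blank line."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:
            break  # a blank line ends the header section
        # Split at the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Date: Sat, 16 Mar 2024 18:33:08 GMT\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "\r\n"
)
headers = parse_headers(raw)
print(headers["content-type"])
```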
HTTP Examples
HTTP in Terminal
$ telnet google.com 80
Trying 2607:f8b0:4005:802::200e...
Connected to google.com.
Escape character is '^]'.
GET / HTTP/1.1
User-Agent: robjs
HTTP is a text-based protocol, so we can connect to the Google server, type requests, and read responses, all in the terminal.
Using port 80 for HTTP.
Request: Get homepage, using HTTP version 1.1.
Adding a header to tell the server what type of client I'm using.
HTTP in Terminal
HTTP/1.1 200 OK
Date: Sat, 16 Mar 2024 18:33:08 GMT
Content-Type: text/html; charset=ISO-8859-1
<!doctype html><html lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description">...
Headers tell us the response date, file type (HTML), and character encoding (here, ISO-8859-1).
Response starts with status code: 200 OK.
The page we requested. (Would look nicer in a browser.)
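The same exchange can be reproduced in code by writing the request text to a TCP socket and reading back whatever the server sends. To keep the sketch runnable without internet access, it talks to a small local server instead of google.com; HelloHandler, fetch_raw, and the response body are all hypothetical stand-ins, not part of the lecture.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in server that answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_raw(host: str, port: int, request: bytes) -> bytes:
    """Send raw request bytes over TCP and read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(request)
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The same text we would type into telnet, with a blank line ending the headers.
request = b"GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: robjs\r\n\r\n"
response = fetch_raw("127.0.0.1", server.server_port, request)
server.shutdown()
server.server_close()

status_line = response.split(b"\r\n", 1)[0].decode()
print(status_line)
```

The response begins with the status line, followed by headers, a blank line, and then the content, just as in the terminal session.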
HTTP Examples
Note: We need a URL in all requests (even POST requests).
Note: The HTTP request can contain data.
Note: This response doesn't have any content.
Request:
POST /send-money HTTP/1.1
User-Agent: robjs

target=alice&amount=10

Response:
HTTP/1.1 201 Created
Location: success.html
HTTP Examples
The request has data (PUT a file on the server).
The response does not have data.
The Content-Location header says that the file we uploaded is stored at newfile.html.
Request:
PUT /newfile.html HTTP/1.1
User-Agent: robjs

<p>Some File</p>

Response:
HTTP/1.1 201 Created
Content-Location: newfile.html
Speeding Up HTTP
Multiple HTTP Requests
Loading a single website can require multiple HTTP requests.
Naive approach: Separate TCP connection for each request.
Client (random port)
Server (port 80)
GET /googlelogo.png HTTP/1.1
GET /googleicon.png HTTP/1.1
Multiple HTTP Requests – Pipelining
Smarter approach: Allow multiple requests to be pipelined over the same TCP connection.
Client (random port)
Server (port 80)
GET /googleicon.png HTTP/1.1
GET /googlelogo.png HTTP/1.1
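Pipelining can be sketched by writing both requests to one TCP connection before reading any response. The example below runs against a hypothetical local server (ImageHandler is a stand-in that returns placeholder text rather than real images) and assumes the server supports persistent HTTP/1.1 connections.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ImageHandler(BaseHTTPRequestHandler):
    """Stand-in server; HTTP/1.1 keeps the connection open between requests."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"contents of {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), ImageHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Two requests sent back to back; the second asks the server to close afterwards.
pipelined = (
    b"GET /googleicon.png HTTP/1.1\r\nHost: localhost\r\n\r\n"
    b"GET /googlelogo.png HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
)
with socket.create_connection(("127.0.0.1", server.server_port)) as sock:
    sock.sendall(pipelined)
    data = b""
    while chunk := sock.recv(4096):
        data += chunk
server.shutdown()
server.server_close()

print(data.count(b"HTTP/1.1 200 OK"), "responses on one connection")
```

Both responses arrive on the same connection, in the order the requests were sent, so the client pays for only one TCP handshake.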
HTTP Cache Types
Optimization: Cache data to avoid sending duplicate requests for the same content.
Naive approach: Every request goes to the origin server.
There are 3 types of caches: Private, Proxy, Managed.
Diagram: Clients A, B, and C and the origin server, with first and second requests marked. The 3 clients each request the same resource twice.
HTTP Cache Types
Private caches are tied to a specific end client (e.g. in a user's browser).
Diagram: Clients A, B, and C and the origin server, with first and second requests marked. The 3 clients each request the same resource twice.
HTTP Cache Types
Proxy caches are in the network (not end host).
Proxy caches are operated by a third party (not the client or server).
Problem: Clients need to be redirected to the proxy cache somehow.
Diagram: Clients A, B, and C, a proxy cache, and the origin server, with first and second requests marked. The 3 clients each request the same resource twice.
HTTP Cache Types
Managed caches are in the network (just like proxy caches).
Managed caches are operated by the server (but cache server ≠ origin server).
There can be multiple proxy/managed caches in the network.
Diagram: Clients A, B, and C, two managed caches, and the origin server, with first and second requests marked. The 3 clients each request the same resource twice.
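The effect of each cache type can be tallied with a toy simulation of the scenario in the diagrams: 3 clients each request the same resource twice. This is only a sketch under simplifying assumptions: the resource never expires, "shared" stands for a single proxy or managed cache in front of all clients, and origin_requests is a hypothetical function name.

```python
def origin_requests(clients, requests_per_client, cache_type):
    """Count how many requests reach the origin server.

    cache_type is "none", "private" (one cache per client), or
    "shared" (one in-network cache used by every client).
    """
    resource = "/resource"
    private_caches = {client: set() for client in clients}
    shared_cache = set()
    origin_hits = 0
    for _ in range(requests_per_client):
        for client in clients:
            if cache_type == "private" and resource in private_caches[client]:
                continue  # served from this client's own cache
            if cache_type == "shared" and resource in shared_cache:
                continue  # served from the in-network cache
            origin_hits += 1  # cache miss: the origin server answers
            private_caches[client].add(resource)
            shared_cache.add(resource)
    return origin_hits


clients = ["A", "B", "C"]
for cache_type in ("none", "private", "shared"):
    print(cache_type, origin_requests(clients, 2, cache_type))
# none 6
# private 3
# shared 1
```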
Implementing Caching – Static vs. Dynamic Content
HTTP resources can be static or dynamic.
The server needs to tell everybody whether data can be cached, and if so, for how long.
Servers can't enforce that clients and caches actually obey the header.
HTTP/1.0 200 OK
Date: Sat, 16 Mar 2024 19:40:24 GMT
Expires: Sun, 17 Mar 2024 19:40:24 GMT
The data in this response can be cached for 24 hours!
Implementing Caching – Cache-Control Header
The Cache-Control header lets the server give more details on how to cache the data.
HTTP/1.1 200 OK
Date: Sat, 16 Mar 2024 19:40:24 GMT
Expires: Sun, 17 Mar 2024 19:40:24 GMT
Cache-Control: private, max-age=31536000
Servers could include both Expires (1.0) and Cache-Control (1.1) headers for compatibility.
An HTTP/1.1 client might ignore the HTTP/1.0 header (and vice versa).
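A cache can turn these headers into a lifetime in seconds. The sketch below prefers max-age from Cache-Control when it is present and otherwise falls back to Expires minus Date; freshness_seconds is a hypothetical helper, and it ignores other directives (such as private) that a real cache must also honor.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_seconds(headers: dict) -> Optional[int]:
    """Return how long a response may be cached, or None if unspecified."""
    # Cache-Control: max-age takes priority when present.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            return int(value)
    # Otherwise fall back to the older Expires header.
    if "Date" in headers and "Expires" in headers:
        date = parsedate_to_datetime(headers["Date"])
        expires = parsedate_to_datetime(headers["Expires"])
        return int((expires - date).total_seconds())
    return None


http10_response = {
    "Date": "Sat, 16 Mar 2024 19:40:24 GMT",
    "Expires": "Sun, 17 Mar 2024 19:40:24 GMT",
}
http11_response = {
    "Date": "Sat, 16 Mar 2024 19:40:24 GMT",
    "Cache-Control": "private, max-age=31536000",
}
print(freshness_seconds(http10_response))  # 86400 seconds = 24 hours
print(freshness_seconds(http11_response))  # 31536000 seconds = 365 days
```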
Benefits of Caching
Caching benefits everybody.
Conveniently, the larger objects are static.
Content Delivery Networks
Using Caches for Content Delivery
How can application providers use caching to improve load time for users?
Content Delivery Networks
Content Delivery Networks (CDNs): Deployments of servers that can serve content (HTTP resources).
Servers can be placed "close" to end users.
Benefits of CDNs:
Deploying CDNs
Without CDNs, every request reaches the origin server.
Diagram: clients reach the origin server in the datacenter across peering links and the WAN.
Deploying CDNs
Application provider could deploy CDNs in its own network.
Diagram: clients, peering links, the WAN, and the datacenter with the origin server, plus two CDN servers.
Deploying CDNs
Caching can be pushed "deeper" into the network.
Diagram: clients, peering links, the WAN, and the datacenter with the origin server, plus two CDN servers.
Deploying CDNs
Deployment depth is limited by efficiency.
Diagram: clients, peering links, the WAN, and the datacenter with the origin server, plus two CDN servers.
Large Global CDNs
Large application providers host their own CDNs.
CDN providers can host your service on their infrastructure for a fee.
Deployments either in their own networks, or directly into ISP networks.
CDNs in ISP Networks
ISP companies often have their own content to serve.
Often a need for both third-party caches and the ISP's own infrastructure.
Caching Server Deployments
CDN servers are highly optimized for content delivery and storage.
Flash appliance focus areas
Storage appliance focus areas
Example of Netflix server specs (you don't need to understand these).
CDN Commercial Model
CDNs are mutually beneficial!
Cooperative commercial model:
In some cases, commercial negotiations are required.
Becomes more difficult as there are more caching providers.
Commercial Challenges – Fragmentation
Cache deployment makes sense if there are small numbers of large content providers.
There's a long tail of smaller content providers.
Idea: Can we have shared caching infrastructure?
Shared infrastructure is challenging!
Figure: two plots with axes labeled Bandwidth and Time.
Directing Clients to Caches
Directing Clients to Caches
Recall our CDN model:
If there are many CDN servers, which one should the server direct the user to?
Three approaches: anycast, DNS-based load balancing, and application-level mapping.
Directing Clients to Caches – Anycast
Recall anycast: Advertise the same IP prefix from multiple locations.
Problem: Routing can change in the middle of a long-lived connection.
Diagram: client C, routers R1, R2, and R3, and servers S1 and S2. Both servers announce "I am 1.0.0.0/24." Link costs shown: 1 and 10.
Directing Clients to Caches – Anycast
Recall anycast: Advertise the same IP prefix from multiple locations.
Problem: Routing can change in the middle of a long-lived connection.
Link goes down.
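Diagram: the same topology (client C, routers R1, R2, R3, servers S1 and S2) after the failure. Only the cost-10 link remains, and the connection's fate is marked "???".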
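The failure case can be sketched as a lowest-cost choice between the two servers. The assignment of cost 1 to the path toward S1 and cost 10 to the path toward S2 is an assumption made for illustration (the diagram labels do not survive in this text), and best_server is a hypothetical helper.

```python
def best_server(path_costs: dict) -> str:
    """Routing delivers packets to the server with the lowest path cost."""
    return min(path_costs, key=path_costs.get)


# Assumed costs from client C to each server advertising 1.0.0.0/24.
path_costs = {"S1": 1, "S2": 10}

before = best_server(path_costs)
del path_costs["S1"]  # the link toward S1 goes down
after = best_server(path_costs)

print(before, "->", after)  # S1 -> S2
```

The client keeps sending to the same IP address, but its packets now reach S2, which holds no state for the TCP connection that was opened with S1.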
Directing Clients to Caches – DNS-Based Load Balancing
Recall DNS-based load balancing: Map the same domain to different IP addresses, depending on where the query came from.
Problem: Granularity.
Diagram: clients C1, C2, C3, and C4, a recursive resolver, and a name server, with locations San Francisco, Sydney, London, and Tokyo.
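One way to see the granularity problem: the name server picks an answer based on where the query came from, and the query comes from the recursive resolver, not from the end client. The sketch below assumes one cache per city, places clients C1 to C4 in those cities, and has them all share a resolver in San Francisco. These placements and the addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
# Hypothetical cache addresses, one per city.
CACHE_BY_REGION = {
    "San Francisco": "192.0.2.1",
    "Sydney": "192.0.2.2",
    "London": "192.0.2.3",
    "Tokyo": "192.0.2.4",
}


def name_server_answer(resolver_region: str) -> str:
    """The name server only sees the resolver, so it maps by resolver region."""
    return CACHE_BY_REGION[resolver_region]


# Assumed client locations; all four use the same recursive resolver.
client_regions = {"C1": "San Francisco", "C2": "Sydney", "C3": "London", "C4": "Tokyo"}
resolver_region = "San Francisco"

answers = {client: name_server_answer(resolver_region) for client in client_regions}
print(answers)
```

Every client receives the San Francisco cache, even though three of them have a closer one.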
Directing Clients to Caches – Application-Level Mapping
Application directs the user to the cache.
Benefits:
Drawbacks:
Newer HTTP Versions
HTTPS – Secure HTTP
Lots of applications run over HTTP.
As HTTP got popular, security became a concern.
HTTPS is a secure version of the protocol.
HTTP/2.0
HTTP/2.0 was introduced in 2015. (First new revision since 1997!)
Aims to decrease latency and improve page load speed.
Widely adopted across client software (e.g. browsers) and CDNs.
HTTP/3.0
HTTP/3.0 was introduced in 2022. (Not long after the previous update!)
Semantics are the same as HTTP/2.0, but it runs over QUIC instead of TCP.
Summary: HTTP and CDNs