1 of 42

Discussion 10


CS 168, Summer 2025 @ UC Berkeley

Slides credit: Sylvia Ratnasamy, Rob Shakir, Peyrin Kao, Iuniana Oprescu

Datacenters 💾

2 of 42

Logistics

  • Project 3A: Transport
    • Deadline: Tuesday, July 29th (you can still file an extension!)
  • Project 3B: Transport
    • Deadline: Tuesday, August 5th
  • Final exam is on Wednesday, August 13th, 3-6PM
    • The final accommodations form is open. If you need any exam accommodations (DSP, online, left-handed desk, etc.), please fill out this form by Wednesday, August 9th.
    • For the midterm, many accommodation requests came in after the deadline. We were lenient then, but we may not be for the final, so please make sure to fill this out on time!

3 of 42

Data Centers

  • Fat Tree
  • Clos Topology

4 of 42

Datacenters

  • How does one design datacenter networks?
  • Consider new assumptions (i.e., different from the assumptions we made about the Internet).
    • Single administrative control over the network topology, traffic, and (to some degree) end hosts
    • Much more homogeneous
    • Strong emphasis on performance
    • Few backwards-compatibility requirements, so clean-slate solutions are welcome!

  • We will focus on datacenter topology for the rest of the discussion

5 of 42

Bisection Bandwidth

  • We want a network with high bisection bandwidth:
    • Partition the network into two equal halves, and find the minimum set of links that must be cut to separate them.
    • The bisection bandwidth is the sum of the bandwidths of those cut links.

6 of 42

Bisection Bandwidth

  • Full bisection bandwidth: Nodes in one partition can communicate simultaneously with nodes in the other partition at full rate.
    • Given N nodes, each with access link capacity R, full bisection bandwidth = (N/2) × R

  • Oversubscription: informally, how far we are from full bisection bandwidth.
    • Formally: the ratio of worst-case achievable bandwidth to full bisection bandwidth (computed in the sketch below).
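
A minimal sketch of these two definitions in Python (not from the slides; the host count and link rate below are made-up numbers):

```python
# Bisection bandwidth definitions from this slide, as code.

def full_bisection_bandwidth(num_hosts: int, link_rate_gbps: float) -> float:
    """Full bisection bandwidth = (N / 2) * R: every host in one half can
    talk to a host in the other half at its full access-link rate."""
    return (num_hosts / 2) * link_rate_gbps

def oversubscription_ratio(worst_case_bw_gbps: float,
                           num_hosts: int,
                           link_rate_gbps: float) -> float:
    """Ratio of worst-case achievable bisection bandwidth to full bisection
    bandwidth. A ratio of 1.0 means no oversubscription."""
    return worst_case_bw_gbps / full_bisection_bandwidth(num_hosts, link_rate_gbps)

# Hypothetical example: 1,024 hosts, each with a 10 Gbps access link.
print(full_bisection_bandwidth(1024, 10))      # 5120.0 Gbps
print(oversubscription_ratio(1280, 1024, 10))  # 0.25, i.e., 4:1 oversubscribed
```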

7 of 42

Bisection Bandwidth

8 of 42

Big switch abstraction

  • We want an abstraction of a big switch
  • Naive solutions:
    • Server-to-server full-mesh
    • A physical big switch?
  • Either impractical or too costly ($$$): a full mesh over 10k hosts needs on the order of 10k × 10k ≈ 100M links
  • Can we do better?

9 of 42

Design 1: Fat tree (scale-up)

  • Borrowed straight from the High Performance Computing (i.e., supercomputers) community
  • Use of big, non-commodity switches
  • Problem? Scales badly in terms of cost
    • …only a few switch vendors can make big switches
  • Also scales badly in terms of fault tolerance (the big switches become single points of failure)

[Figure: a fat tree built from small switches at the leaves and progressively bigger switches toward the root.]

10 of 42

Design 2: Clos (scale-out)

  • Replace each node in the fat tree with a group of cheap commodity switches
    • Cheaper
    • High redundancy (high bisection width)
  • Allows an oversubscription ratio of 1 (full bisection bandwidth)

[Figure: 3-tier Clos (fat-tree) topology with k = 4: four pods (Pod 1–Pod 4), each containing an edge layer and an aggregation layer, connected by a core layer.]

11 of 42

A Closer look at Clos

  • The concept of redundancy and the Clos topology itself have many variations!
  • We focus on the 3-tiered Clos topology
  • Homogeneous switches (k ports each) and homogeneous links
  • Each non-core switch has k/2 ports pointing north (up) and k/2 pointing south (down)

[Figure: the same k = 4 Clos topology, with the pods and the edge, aggregation, and core layers labeled.]

12 of 42

Fat Tree Clos Network – Bisection Bandwidth

Achieves full bisection bandwidth.

  • Every host in the left half can simultaneously communicate at full rate with a host in the right half, over link-disjoint paths.

[Figure: the k = 4 Clos topology, bisected into a left half and a right half of hosts.]

13 of 42

Fat Tree Clos Network

A k-ary fat tree has k pods.

  • Each pod has k switches.
    • k/2 switches in the upper aggregation layer.
    • k/2 switches in the lower edge layer.
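
Putting the counts together, a hedged sketch (assuming the standard k-ary fat-tree construction, in which each edge switch also dedicates k/2 ports to hosts and (k/2)² core switches join the pods) of how the topology scales with the port count k:

```python
# Size of a k-ary fat-tree Clos network as a function of switch port count k.

def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k^3 / 4 (k/2 hosts per edge switch)
    }

print(fat_tree_sizes(4))   # the k = 4 topology in the figure: 16 hosts
print(fat_tree_sizes(48))  # commodity 48-port switches: 27,648 hosts
```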

[Figure: the k = 4 fat-tree Clos topology with Pods 1–4 and the edge, aggregation, and core layers labeled.]

14 of 42

Caveats about Clos

  • An oversubscription ratio of 1 is only achieved with optimal load balancing
  • Non-trivial to incrementally build and/or expand the network
    • e.g., port count k is fixed

15 of 42

ECMP

16 of 42

ECMP

  • ECMP: Equal-Cost Multi-Path
    • Goal: use the multiple equal-cost paths that the topology provides
    • Idea: load-balance packets across the different forwarding paths
  • ECMP uses a hash function over the 5-tuple to decide how to load-balance packets (see the sketch below):
    • f(src_ip, dst_ip, protocol, src_port, dst_port)
    • “Per-flow” load balancing
      • Packets belonging to the same “flow” are always routed along the same one of the multiple available paths
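
A minimal sketch of per-flow ECMP hashing in Python (illustrative only; real switches use their own hardware hash, not SHA-256):

```python
# Pick one of several equal-cost next hops by hashing the flow's 5-tuple.
import hashlib

def ecmp_next_hop(src_ip, dst_ip, protocol, src_port, dst_port, next_hops):
    five_tuple = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(five_tuple).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

next_hops = ["Agg1", "Agg2"]  # two equal-cost uplinks out of an edge switch

# Packets of the same flow always hash to the same next hop...
print(ecmp_next_hop("10.0.0.1", "10.0.1.2", "TCP", 5555, 80, next_hops))
print(ecmp_next_hop("10.0.0.1", "10.0.1.2", "TCP", 5555, 80, next_hops))
# ...while a different flow (new source port) may take the other uplink.
print(ecmp_next_hop("10.0.0.1", "10.0.1.2", "TCP", 6666, 80, next_hops))
```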

17 of 42

Overlay/Underlay

18 of 42

(Over/Under)lay

  • Hosts spin up (boot) virtual machines frequently
  • Addressing could become a mess!
    • And it won’t scale!
  • Thus, we build overlay and underlay networks

19 of 42

How? Encapsulation

  • Encapsulation: put another header on the packet

  • Decapsulation: remove extra headers that were added for encapsulation

[Figure: packet formats. Original design: Payload | TCP header | IP header. The new design: Payload | TCP header | IP (overlay) header | IP (underlay) header.]
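
A minimal sketch of encapsulation and decapsulation as plain Python operations (the dict layout below is a made-up stand-in for real packet headers):

```python
# Encapsulation: wrap the overlay packet inside a new underlay header.
# Decapsulation: strip the underlay header to recover the overlay packet.

def encapsulate(overlay_packet: dict, underlay_dst: str) -> dict:
    return {"underlay_dst": underlay_dst, "inner": overlay_packet}

def decapsulate(wire_packet: dict) -> dict:
    return wire_packet["inner"]

overlay_packet = {"overlay_dst": "192.0.5.7", "payload": "hello VM6"}
wire_packet = encapsulate(overlay_packet, underlay_dst="2.2.2.2")

# Underlay routers only ever look at wire_packet["underlay_dst"] (2.2.2.2);
# the overlay header and payload stay hidden inside.
assert decapsulate(wire_packet) == overlay_packet
```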

20 of 42

Encapsulation and Decapsulation

Let's see how to use the new layer to connect the overlay and underlay networks.

[Figure: overlay/underlay topology. Server 1 (1.1.1.1) runs VMs V1–V3 behind a virtual switch; Server 2 (2.2.2.2) runs VMs V4–V6 behind another virtual switch; routers R1–R4 form the underlay between the servers. The VMs have overlay (virtual) addresses such as 192.0.2.1, 192.168.1.2, 10.16.1.2, 10.7.7.7, 10.8.8.8, and 192.0.5.7.]

21 of 42

Encapsulation and Decapsulation

Our goal: VM1 wants to talk to VM6.

[Figure: the same topology; VM1 on Server 1 (1.1.1.1) has a payload to deliver to VM6 (overlay address 192.0.5.7) on Server 2 (2.2.2.2).]

22 of 42

Encapsulation and Decapsulation (Step 1/5)

VM1 adds an overlay header with the destination's virtual address.

Then, VM1 passes the packet to the virtual switch.

[Figure: VM1's packet now carries the overlay header "To: 192.0.5.7" in front of the payload and has been handed to the virtual switch on Server 1.]

23 of 42

Encapsulation and Decapsulation (Step 2/5)

The virtual switch reads the virtual address and looks up the matching physical address. Then, it adds (encapsulates) a new header with the physical address.

Then, the virtual switch forwards the packet to routers in the datacenter.
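
A minimal sketch of what the virtual switch does in this step, using the same dict representation as above; the mapping table is hypothetical, since the slides treat how it gets populated as "magic" for now:

```python
# Hypothetical virtual-to-physical address table on Server 1's virtual switch.
VIRTUAL_TO_PHYSICAL = {
    "192.0.5.7": "2.2.2.2",  # VM6's overlay address -> Server 2's underlay address
    # ... one entry per VM in the overlay
}

def virtual_switch_send(overlay_packet: dict) -> dict:
    overlay_dst = overlay_packet["overlay_dst"]
    physical_dst = VIRTUAL_TO_PHYSICAL[overlay_dst]   # the lookup
    # Encapsulate: add the underlay header in front of the overlay packet.
    return {"underlay_dst": physical_dst, "inner": overlay_packet}

wire_packet = virtual_switch_send({"overlay_dst": "192.0.5.7", "payload": "hi"})
# wire_packet is now forwarded into the datacenter using underlay_dst 2.2.2.2.
```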

[Figure: the virtual switch on Server 1 encapsulates the packet, putting the underlay header "To: 2.2.2.2" in front of "To: 192.0.5.7 | Payload", and forwards it toward the underlay routers.]

(We haven't discussed how the lookup works yet. For now, it's magic.)

24 of 42

Encapsulation and Decapsulation (Step 3/5)

The routers in the datacenter forward the packet according to its physical (underlay) address. No need to think about virtual addresses!

[Figure: routers R1–R4 forward the encapsulated packet using only the underlay destination 2.2.2.2; the overlay header "To: 192.0.5.7" stays hidden inside.]

25 of 42

Encapsulation and Decapsulation (Step 4/5)

Eventually, R4 receives the packet and reads its physical (underlay) destination address, 2.2.2.2.

R4 is connected to physical server 2.2.2.2, so it forwards the packet to the server.

[Figure: R4 hands the encapsulated packet ("To: 2.2.2.2" | "To: 192.0.5.7" | Payload) to Server 2 (2.2.2.2).]

26 of 42

Encapsulation and Decapsulation (Step 5/5)

The virtual switch at 2.2.2.2 sees a packet destined for itself.

The virtual switch removes (decapsulates) the underlay header, revealing the virtual address of the destination.

Then, the virtual switch sends the packet to the VM with virtual address 192.0.5.7.

[Figure: the virtual switch on Server 2 decapsulates the packet, removing the underlay header "To: 2.2.2.2", and delivers "To: 192.0.5.7 | Payload" to VM6.]

27 of 42

Encapsulation and Decapsulation

Success – our packet reached VM6!

[Figure: the payload has arrived at VM6 (192.0.5.7) on Server 2.]

28 of 42

Encapsulation and Decapsulation

Why did this work?

  • The overlay network (VM1 and VM6) only thought about virtual addresses.
  • The underlay network (R1–R4) only thought about physical addresses.
  • The virtual switches acted as a bridge between the two layers.
  • Using encapsulation allows us to hide the “inner” overlay packet from the underlay

[Figure: the same overlay/underlay topology as in the previous slides.]

29 of 42

Example of Encapsulation

(Source: Lecture 20, slide 55)

30 of 42

Worksheet

  • True/False
  • STP
  • Clos-based Topology & ECMP
  • Encapsulation

31 of 42

Question 1: True/False

32 of 42

Worksheet

  • True/False
  • STP
  • Clos-based Topology & ECMP
  • Encapsulation

33 of 42

Question 2: STP

34 of 42

Question 2: STP

35 of 42

Question 2: STP

36 of 42

Question 2: STP

37 of 42

Worksheet

  • True/False
  • STP
  • Clos-based Topology & ECMP
  • Encapsulation

38 of 42

Question 3: Clos-based Topology & ECMP

39 of 42

Worksheet

  • True/False
  • STP
  • Clos-based Topology & ECMP
  • Encapsulation

40 of 42

Question 4: Encapsulation

41 of 42

Question 4: Encapsulation

42 of 42

Questions?

Feedback Form: https://tinyurl.com/cs168-su25-disc-feedback