1 of 8

Load Balancing in Fast Datacenter Networks - Two Approaches

Hassan Sajwani

2 of 8

Network Load Balancing and the Data Center

  • Modern data centers have grown to serve heavy traffic, often many millions of requests per second, across thousands of servers
  • Load balancers must distribute incoming network traffic in a way that maximizes speed and capacity utilization
  • Load balancers sit between clients and servers and route each request to the server most capable of fulfilling it (see the sketch below)
  • The importance of this infrastructure has led to significant research into improving current designs

https://cloud.google.com/compute/docs/load-balancing/images/http_load_balancing_cross_region.png
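A small sketch can illustrate the core decision a load balancer makes for each request. The least-connections policy, backend names, and counts below are illustrative assumptions for this deck, not the design of any particular system cited later:

```python
# Minimal sketch of a load-balancer routing decision (illustrative only):
# route each new request to the backend with the fewest in-flight connections.
# Backend names and connection counts are assumptions made up for this example.

active_connections = {
    "backend-1": 12,
    "backend-2": 4,
    "backend-3": 9,
}

def pick_backend(conns):
    """Return the backend currently holding the fewest in-flight connections."""
    return min(conns, key=conns.get)

chosen = pick_backend(active_connections)
active_connections[chosen] += 1   # account for the newly routed request
print("routing request to", chosen)
```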

3 of 8

Traditional Load Balancing

  • Two load balancing architecture models have dominated in recent years
    • “Layer 4” - commonly hardware based; may also use information from layers below 4, since the names are only loosely related to the OSI network model
    • “Layer 7” - software/application layer based
  • Traditional implementations of load balancers
    • Mainly involved dedicated hardware devices
    • Often proprietary and provided by a vendor
    • Required less computation overhead than sophisticated layer 7 approaches
    • Were popular when commodity servers did not have the power they do now and interactions between clients and servers were less complex

4 of 8

Load Balancing Today

  • Hardware based solutions had several disadvantages
    • Low serviceability and deployability
    • Employed costly specialized hardware
    • Constrained redundancy
    • Less flexible
  • Thus, modern approaches tend to take a layer 7 based approach
    • Can run on commodity hardware
    • Have access to application layer information
  • ECMP (Equal Cost Multi-Path)
    • A very popular approach to load balancing in data centers today (Google’s Maglev, for example, relies on routers using ECMP to spread traffic across its machines)
    • Hashes packet header fields to choose among several equal-cost “best” paths through the network, keeping each flow pinned to one path (sketched below)
    • Easily implemented

https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing#/media/File:802d1aqECMP.gif
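A minimal sketch of the ECMP idea, assuming a 5-tuple hash over header fields; real switches use their own hardware hash functions, so the hash choice and field names here are illustrative assumptions:

```python
# Sketch of ECMP path selection: hash a packet's header fields to one of
# several equal-cost next hops. The 5-tuple key and SHA-256 hash are
# assumptions for illustration, not what switch hardware actually uses.

import hashlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Map a packet's 5-tuple to one of num_paths equal-cost paths."""
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# All packets of a flow share a 5-tuple, so the flow stays pinned to one path,
# while different flows spread roughly uniformly across the equal-cost paths.
print(ecmp_path("10.0.0.1", "10.0.1.9", 43512, 80, 6, num_paths=4))
```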

5 of 8

FlowBender

  • Software based solution - no hardware changes required and very minimal changes overall
  • Addresses ECMP’s static flow-to-path assignment, which offers no flexibility when network paths become oversubscribed
  • Dynamically reacts to network conditions only in the face of congestion or link failure (notified of congestion via ECN and of failures via TCP timeouts)
  • Reroutes by changing a header field that feeds the ECMP hash (such as TTL), so switches compute a new path for the flow (see the sketch below)
  • Relies on commodity ECMP whenever it is not rerouting
  • Avoids excessive packet reordering - a known problem in multipath routing - by rerouting only when necessary
  • Avoids the sub-flow granularity routing other proposals have required, which leads to excessive packet reordering
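A minimal host-side sketch of FlowBender’s rerouting idea, assuming an ECN-mark threshold checked once per RTT and a TTL offset as the extra hash input; the threshold value, the random perturbation, and the class layout are assumptions for illustration, not the paper’s implementation:

```python
# Sketch of FlowBender-style rerouting: track the fraction of ECN-marked ACKs
# per RTT, and when it crosses a threshold (or a TCP timeout fires), change a
# header field that feeds the ECMP hash (here, a TTL offset) so switches pick
# a new path for the flow. Threshold and perturbation range are assumed values.

import random

ECN_THRESHOLD = 0.05   # assumed fraction of marked ACKs that signals congestion
BASE_TTL = 64

class FlowState:
    def __init__(self):
        self.ttl_offset = 0      # extra hash input seen by ECMP-capable switches
        self.acks = 0
        self.ecn_marked = 0

    def on_ack(self, ecn_marked):
        """Record one ACK and whether it carried an ECN congestion mark."""
        self.acks += 1
        self.ecn_marked += int(ecn_marked)

    def end_of_rtt(self, timeout=False):
        """Once per RTT: reroute only if congested or a timeout occurred."""
        congested = self.acks > 0 and (self.ecn_marked / self.acks) > ECN_THRESHOLD
        if congested or timeout:
            self.ttl_offset = random.randint(1, 8)   # new hash input -> new path
        self.acks = self.ecn_marked = 0

    def ttl_for_next_packet(self):
        return BASE_TTL + self.ttl_offset
```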

6 of 8

Presto

  • Addresses ECMP’s weakness with long flows, where limited packet header entropy can lead to hash collisions and, by extension, congestion
  • Also a software based implementation
  • Load balances at the granularity of “flowcells”: the vSwitch divides flows into uniformly sized 64 KB subflows (sketched below)
    • The chosen size ensures elephant flows will be broken up while the majority of mice flows will not
  • Breaking up flows onto different paths requires handling reordering
  • The designers modified the Generic Receive Offload (GRO) flush algorithm in the hypervisor OS to push segments up less aggressively, taking a wait-and-see approach
    • Like the related LRO, GRO aggregates packets into a buffer and pushes them up all at once, avoiding many interrupts, increasing throughput, and lowering CPU overhead
  • The modified GRO puts packets that arrive out of order back in sequence, and pushes segments up to TCP when packets are genuinely lost so that TCP can recover them
  • Implemented multipathing end to end, with centralized path setup, to allow for greater control and scheduling
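A minimal sketch of Presto-style flowcell segmentation at the vSwitch, with an assumed round-robin choice over a precomputed path set; the path labels are made up for illustration, and the paper itself spreads flowcells over controller-installed multipath routes:

```python
# Sketch of flowcell segmentation: carve a flow into 64 KB flowcells and send
# each flowcell over a different path. Round-robin path choice and the path
# labels are assumptions for illustration.

FLOWCELL_BYTES = 64 * 1024

def flowcells(flow_bytes, paths):
    """Yield (offset, length, path) for each flowcell of a flow of flow_bytes bytes."""
    offset, i = 0, 0
    while offset < flow_bytes:
        length = min(FLOWCELL_BYTES, flow_bytes - offset)
        yield offset, length, paths[i % len(paths)]
        offset += length
        i += 1

# A 200 KB "elephant" flow is split across several paths ...
for cell in flowcells(200 * 1024, ["path-A", "path-B", "path-C"]):
    print(cell)

# ... while a 10 KB "mouse" flow fits in a single flowcell on one path.
print(list(flowcells(10 * 1024, ["path-A", "path-B", "path-C"])))
```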

7 of 8

Critique/Reflection

  • Presto required a relatively more complex set of modifications, but seemed to track toward optimal performance in evaluations
  • Presto’s evaluation results seemed to be too good to be true
  • Presto’s evaluation of the CPU cost for the GRO modifications seemed optimistic and was not evaluated under a configuration that involved reordering
  • Presto’s evaluation did not compare against many of the other, more recently proposed systems
  • FlowBender was a clever design that made the most of available resources, with performance that tracked more costly proposals and beat ECMP by large margins
  • It did not have to rely on breaking up flows, which meant less reordering overhead
  • FlowBender’s evaluation did not present much rationale for the choice of comparison systems (DeTail and RPS)

8 of 8

Sources

Eisenbud, D., et al. (2016). Maglev: A Fast and Reliable Software Network Load Balancer. 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16) (pp. 523-535). Santa Clara: USENIX Association.

He, K., et al. (2015). Presto: Edge-based Load Balancing for Fast Datacenter Networks. SIGCOMM '15: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (pp. 465-478). London, United Kingdom: ACM.

Kabbani, A., et al. (2014). FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks. CoNEXT '14: Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies (pp. 149-160). Sydney, Australia: ACM.

Raiciu, C., et al. (2011). Improving Datacenter Performance and Robustness with Multipath TCP. SIGCOMM '11: Proceedings of the ACM SIGCOMM 2011 Conference (pp. 266-277). Toronto, Ontario, Canada: ACM.

What is Layer 4 Load Balancing? (n.d.). Retrieved from NGINX: https://www.nginx.com/resources/glossary/layer-4-load-balancing/

What is Load Balancing? (n.d.). Retrieved from NGINX: https://www.nginx.com/resources/glossary/load-balancing/