1 of 15

Networking & Security for Containers with BPF & XDP

Docker Distributed Systems Summit

Thomas Graf

2 of 15

The Network becomes the Application bus

We have to deal with networks that ...

    • contain millions of endpoints
    • are noisy (n Mpps)
    • are insecure with multiple tenants
    • operate unreliably
    • are constantly evolving WRT protocols

3 of 15

Cilium Architecture

4 of 15

What is BPF?

5 of 15

BPF Code Generation at Container Startup

  • Generate networking code at container startup
    • Tailored to each individual container
    • Leads to minimal code required ⇒ faster ⇒ smaller attack surface (unikernel-like)
  • Majority of configuration (IP, MAC, ports, ...) becomes constant, so the compiler can optimize heavily
  • Regeneration at runtime without breaking connections
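
A minimal sketch of the idea above: render each container's configuration into a C header of constants, which a compiler could then fold into the generated datapath code. The macro names, the function name, and the sample addresses are hypothetical and chosen only for illustration; the slides do not show the actual generator.

```python
import ipaddress
from typing import Iterable


def render_container_header(ipv6: str, mac: str, allowed_ports: Iterable[int]) -> str:
    """Render a C header holding one container's configuration as constants.

    All macro names are hypothetical; this only illustrates "configuration
    becomes constant", not the real code generator.
    """
    addr = ipaddress.IPv6Address(ipv6)  # raises ValueError on a bad address
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must have 6 octets")

    ports = sorted(set(allowed_ports))
    ip_init = ", ".join(f"0x{b:02x}" for b in addr.packed)
    mac_init = ", ".join(f"0x{b:02x}" for b in mac_bytes)
    port_init = ", ".join(str(p) for p in ports)

    return "\n".join([
        "/* Generated per container at startup (hypothetical layout). */",
        f"#define CONTAINER_IP {{ {ip_init} }}",
        f"#define CONTAINER_MAC {{ {mac_init} }}",
        f"#define ALLOWED_PORTS {{ {port_init} }}",
        f"#define NUM_PORTS {len(ports)}",
        "",
    ])


# Example values are made up for the sketch.
header = render_container_header("2001:db8::c0a8:210b", "02:42:ac:11:00:02", [443, 80, 80])
```

Regenerating at runtime would then amount to rendering a new header and recompiling; how the running program is swapped without breaking connections is not covered by this sketch.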

6 of 15

Make all tasks globally addressable on the Internet!

  • Global IPv6 addresses
    • No NAT!
    • Native IPv4/NAT46 + NAT for compat
  • Host scope address allocator
    • Lockless allocation
  • Task mobility
    • ILA
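
One possible reading of "host scope address allocator", sketched under assumptions: each host owns its own IPv6 prefix, so it can hand out addresses to local tasks without coordinating with any other host. The class name and the documentation prefixes (2001:db8::/32) are hypothetical, and the slides do not say how the lockless allocation is actually implemented.

```python
import ipaddress
import itertools


class HostScopeAllocator:
    """Hands out addresses from a prefix owned by a single host.

    Because no other host allocates from this prefix, no cluster-wide
    lock or consensus round is needed (an assumption of this sketch).
    """

    def __init__(self, host_prefix: str):
        self.network = ipaddress.IPv6Network(host_prefix)
        self._offsets = itertools.count(1)  # skip the all-zero host part

    def allocate(self) -> ipaddress.IPv6Address:
        offset = next(self._offsets)
        if offset >= self.network.num_addresses:
            raise RuntimeError("host prefix exhausted")
        return self.network.network_address + offset


# Two hosts with disjoint prefixes allocate independently.
host_a = HostScopeAllocator("2001:db8:0:1::/112")
host_b = HostScopeAllocator("2001:db8:0:2::/112")
first_a = host_a.allocate()
second_a = host_a.allocate()
first_b = host_b.allocate()
```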

7 of 15

Scaling Policy Specification

  • How to specify policy for millions of endpoints?
  • Decouple policy specification from addressing
    • IP+port ACLs are unsuitable for containers
    • Policy specification based on container labels
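
To illustrate decoupling policy from addressing, here is a small sketch in which rules refer only to labels and never to IPs or ports. The `Rule` type, the `allowed` helper, and the LB → FE → BE rule set are assumptions made for the example, not the project's policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    from_labels: frozenset  # labels the source must carry
    to_labels: frozenset    # labels the destination must carry


def allowed(rules, src_labels, dst_labels) -> bool:
    """Return True if any rule matches the label sets of both endpoints."""
    src, dst = frozenset(src_labels), frozenset(dst_labels)
    return any(r.from_labels <= src and r.to_labels <= dst for r in rules)


# Hypothetical policy: load balancers may reach frontends,
# frontends may reach backends. No address appears anywhere.
rules = [
    Rule(from_labels=frozenset({"LB"}), to_labels=frozenset({"FE"})),
    Rule(from_labels=frozenset({"FE"}), to_labels=frozenset({"BE"})),
]
```

The rule set stays the same size whether there are three containers or millions, because new containers only need to carry the right labels.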

[Diagram: Frontend and Backend groups containing LB, FE, and BE containers]
8 of 15

Scaling Policy Specification

[Diagram: Frontend and Backend groups containing LB, FE, and BE containers, each additionally labeled Prod or QA, with "requires" relations between labels]
9 of 15

Scaling Policy Enforcement

  • Distributed fixed cost policy enforcement
    • Per-CPU BPF-map hashtable

[Diagram: FE, BE, and LB containers, each labeled Prod or QA]

[Table: Cluster-Wide Label ID Table, listing IDs 10 through 16]

This ID is carried in the network packet and used to reconstruct the label context at the receiving host.

Policy enforcement cost is reduced to a single hashtable lookup regardless of complexity.
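
The two statements above can be sketched as follows, with a Python dict and set standing in for the cluster-wide table and the per-CPU BPF hashtable. The class and function names are hypothetical; the starting ID of 10 merely echoes the numbers on the slide, and how IDs are really allocated and distributed is not shown in the deck.

```python
class LabelIdTable:
    """Maps each distinct label set to a small numeric ID and back."""

    def __init__(self, first_id: int = 10):
        self._id_by_labels = {}
        self._labels_by_id = {}
        self._next_id = first_id

    def id_for(self, labels) -> int:
        key = frozenset(labels)
        if key not in self._id_by_labels:
            self._id_by_labels[key] = self._next_id
            self._labels_by_id[self._next_id] = key
            self._next_id += 1
        return self._id_by_labels[key]

    def labels_for(self, label_id: int) -> frozenset:
        """Reconstruct the label context from an ID carried in a packet."""
        return self._labels_by_id[label_id]


def allowed_source_ids(table, known_label_sets, required_labels) -> set:
    """Resolve a label requirement into source IDs ahead of time."""
    required = frozenset(required_labels)
    return {table.id_for(ls) for ls in known_label_sets if required <= frozenset(ls)}


def enforce(allowed_ids: set, packet_src_id: int) -> bool:
    """Per-packet work: one hash lookup, independent of policy complexity."""
    return packet_src_id in allowed_ids


table = LabelIdTable()
label_sets = [{"FE", "Prod"}, {"FE", "QA"}, {"BE", "Prod"}]
ids = [table.id_for(ls) for ls in label_sets]

# A Prod backend that accepts traffic only from Prod frontends.
be_prod_allowed = allowed_source_ids(table, label_sets, {"FE", "Prod"})
```

All label matching happens when the policy is resolved; the receiving side only checks whether the ID from the packet is in its precomputed set.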

10 of 15

Safety & Extensibility in the Kernel

  • Safety guaranteed by Verifier
    • A protocol parser bug will not allow someone to remotely kill your entire datacenter.
  • Decouple datapath functionality from kernel version
    • Support new protocols
    • Add arbitrary statistics
  • All at runtime for already running containers

11 of 15

Scaling the Delivery of Cat Videos

  • Distributed L3/L4 LB w/ Direct-Server-Return
  • Like IPVS but completely programmable
  • LB for N-S, E-W, intra-node
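
A rough sketch of L3/L4 load balancing with Direct Server Return, assuming backend selection by a hash over the flow's 5-tuple (the slides do not state the selection method). Packets are modeled as plain dicts, and every name here is hypothetical. The point of DSR is that the client's source address is preserved, so the backend can answer the client directly instead of sending the large response back through the load balancer.

```python
import hashlib


def pick_backend(flow, backends):
    """Map a 5-tuple to a backend; the same flow always gets the same one."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def forward_dsr(packet, backends):
    """Steer a request to a backend without touching the client's source.

    Only the delivery target changes; "src" stays the client, which is
    what lets the backend reply directly (Direct Server Return).
    """
    flow = (packet["src"], packet["sport"], packet["dst"], packet["dport"], packet["proto"])
    return {**packet, "deliver_to": pick_backend(flow, backends)}


backends = ["be-1", "be-2", "be-3"]  # made-up backend names
request = {"src": "2001:db8::1", "sport": 40000,
           "dst": "2001:db8::80", "dport": 80, "proto": "tcp"}
steered = forward_dsr(request, backends)
```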

[Diagram: ECMP in front of LB, FE, and BE instances, with traffic labeled "Small HTTP GET" and "Ultra HD Cat Pictures/Videos"]

12 of 15

Performance

Intel Xeon 3.5 GHz Sandy Bridge, 24 cores, 1 TCP flow per core, netperf -t TCP_SENDFILE, 10,000 policies

13 of 15

Demo

14 of 15

Q&A

Start hacking on BPF for containers: https://github.com/cilium/cilium

Slack: cilium.slack.com
Twitter: @tgraf__

15 of 15

Building Blocks

  • L3 forwarding (IPv6 & IPv4)
  • Host connectivity
  • Encapsulation (VXLAN/Geneve/GRE)
  • ICMPv6 & ICMP generation
  • NDisc & ARP responder
  • Access Control
  • Port mapping
  • Connection tracking
  • L3/L4 Load balancer w/ DSR
  • Statistics
  • Events (perf ring buffer)
  • Debugging framework
  • NAT46