1 of 10

Background

  • Kubernetes Network Custom Resource Definition De-facto Standard

https://docs.google.com/document/d/1Ny03h6IDVy_e_vmElOqR7UdTPAG_RNydhVE1Kx54kFQ/edit

  • Device Plugin Kubernetes feature
  • Device Plugin API (DPAPI) RPCs:
    • ListAndWatch()
      • The plugin returns the UIDs of available resources, along with the health status of each, to the kubelet
    • Allocate()
      • Input: UIDs; output: mount paths, environment variables, device files
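
To make the two RPCs concrete, here is a minimal in-memory sketch of their shape. It is not the real gRPC API: the class and field names (`ToyDevicePlugin`, `AllocateResponse`, `ALLOCATED_DEVICE_UID`) are hypothetical, and the real ListAndWatch() is a long-lived stream rather than the one-shot snapshot modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    uid: str
    healthy: bool = True


@dataclass
class AllocateResponse:
    # The three kinds of output named on the slide.
    mounts: dict = field(default_factory=dict)
    envs: dict = field(default_factory=dict)
    device_files: list = field(default_factory=list)


class ToyDevicePlugin:
    """Hypothetical stand-in for a device plugin serving one resource name."""

    def __init__(self, resource_name, devices):
        self.resource_name = resource_name
        self.devices = {d.uid: d for d in devices}

    def list_and_watch(self):
        # Snapshot of (UID, health) pairs; a real plugin streams updates.
        return [
            (d.uid, "Healthy" if d.healthy else "Unhealthy")
            for d in self.devices.values()
        ]

    def allocate(self, uid):
        # Refuse unknown or unhealthy devices, otherwise describe what the
        # container needs in order to use the device.
        dev = self.devices.get(uid)
        if dev is None or not dev.healthy:
            raise ValueError(f"device {uid!r} is not allocatable")
        return AllocateResponse(envs={"ALLOCATED_DEVICE_UID": uid})
```

A caller playing the kubelet's role would first read `list_and_watch()` to learn which UIDs exist and then pass one of them to `allocate()`.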

2 of 10

Admin creates control and data networks

Network object should be annotated with the “kubernetes.v1.cni.cncf.io/resourceName” annotation.

apiVersion: "kubernetes.cni.cncf.io/v1"
kind: Network
metadata:
  name: n1-ctr-net
  annotations:
    kubernetes.v1.cni.cncf.io/resourceName: abc-plugin.io/ctr-net
spec:
  plugin: abc-plugin
---
apiVersion: "kubernetes.cni.cncf.io/v1"
kind: Network
metadata:
  name: n1-data-net
  annotations:
    kubernetes.v1.cni.cncf.io/resourceName: abc-plugin.io/data-net
spec:
  plugin: abc-plugin

3 of 10

Network Creation Flow

1. Admin creates network

2. CRD network object gets persisted at API server

3. Plugin observes network creation

4. Using details in the network object and its own local state, the plugin advertises network availability through the ListAndWatch() DPAPI call on node1 and node2, e.g. abc-plugin.io/ctr-net

5. Kubelet updates the node status for node1 and node2, e.g. abc-plugin.io/ctr-net: “1”

[Diagram: API Server, Default Scheduler, Admission controllers, Networks, Node, Kubelet, Device Manager, CNI, cni-dp, Pods; arrows numbered 1 to 5 correspond to the steps above.]
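
Steps 3 and 4 of the flow can be sketched as a function that turns observed Network objects into the resource names a plugin would advertise. This is an illustrative sketch only: `advertised_resources` and the `local_capacity` argument are hypothetical names, and the default count of 1 simply mirrors the “1” reported in step 5.

```python
RESOURCE_ANNOTATION = "kubernetes.v1.cni.cncf.io/resourceName"


def advertised_resources(network_objects, plugin_name, local_capacity=None):
    """Map the Network objects owned by plugin_name to {resourceName: count}.

    local_capacity is a hypothetical stand-in for the plugin's node-local
    state; a resource missing from it is advertised with a count of 1.
    """
    local_capacity = local_capacity or {}
    resources = {}
    for net in network_objects:
        # Ignore networks that belong to a different plugin.
        if net.get("spec", {}).get("plugin") != plugin_name:
            continue
        annotations = net.get("metadata", {}).get("annotations", {})
        resource = annotations.get(RESOURCE_ANNOTATION)
        if resource is None:
            # Without the annotation there is nothing to advertise.
            continue
        resources[resource] = local_capacity.get(resource, 1)
    return resources
```

The returned mapping is what the plugin would then report to the kubelet on each node where the network is available.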

4 of 10

How to request network attachment

kind: Pod
metadata:
  name: N1-Pod
  namespace: N1-namespace
  annotations:
    kubernetes.v1.cni.cncf.io/networks: ctr-net, data-net

5 of 10

NetworkResource admission controller plugin

  • New admission controller, NetworkResource, is added.
    • If the “kubernetes.v1.cni.cncf.io/networks” annotation is found on the pod, the requested network objects are retrieved. If an object is not found, the request fails with an error.
    • If a network object carries the “kubernetes.v1.cni.cncf.io/resourceName” annotation, its value is added to the `Resources.Limits` section of the pod spec in the request, with a default quantity of 1.
    • A pod annotation, “kubernetes.v1.cni.cncf.io/contextUID”, is added.
  • For our abc-plugin example, the controller will add:

limits:
  abc-plugin.io/data-net: "1"
  abc-plugin.io/ctr-net: "1"
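
The mutation described above can be sketched on plain dictionaries. This is a hedged illustration, not a real admission webhook: `mutate_pod` and `networks_by_name` are hypothetical names, the lookup is keyed by the network names exactly as they appear in the pod annotation, and adding the limits to the first container only is an assumption the slides do not spell out.

```python
import copy
import uuid

NETWORKS_ANNOTATION = "kubernetes.v1.cni.cncf.io/networks"
RESOURCE_ANNOTATION = "kubernetes.v1.cni.cncf.io/resourceName"
CONTEXT_ANNOTATION = "kubernetes.v1.cni.cncf.io/contextUID"


def mutate_pod(pod, networks_by_name, new_uid=lambda: str(uuid.uuid4())):
    """Return a mutated copy of pod; the input object is left untouched."""
    pod = copy.deepcopy(pod)
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    requested = annotations.get(NETWORKS_ANNOTATION)
    if not requested:
        return pod  # no extra networks requested, nothing to do

    containers = pod.get("spec", {}).get("containers", [])
    for name in (part.strip() for part in requested.split(",")):
        if not name:
            continue
        net = networks_by_name.get(name)
        if net is None:
            raise LookupError(f"network {name!r} not found")
        resource = net.get("metadata", {}).get("annotations", {}).get(
            RESOURCE_ANNOTATION
        )
        if resource is None or not containers:
            continue
        # Assumption: the limit goes on the first container only.
        limits = containers[0].setdefault("resources", {}).setdefault("limits", {})
        limits.setdefault(resource, "1")

    annotations[CONTEXT_ANNOTATION] = new_uid()
    return pod
```

Passing a fixed `new_uid` callable makes the result deterministic, which is convenient when exercising the function in isolation.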

6 of 10

Pod Object After Admission Controller

kind: Pod
metadata:
  name: N1-Pod
  namespace: N1-namespace
  annotations:
    kubernetes.v1.cni.cncf.io/networks: ctr-net, data-net
    kubernetes.v1.cni.cncf.io/contextUID: 1234-56-7890-234234-456456
spec:
  containers:
  - name: myapp-container
    image: busybox
    resources:
      requests:
        abc-plugin.io/data-net: "1"
        abc-plugin.io/ctr-net: "1"
      limits:
        abc-plugin.io/data-net: "1"
        abc-plugin.io/ctr-net: "1"

7 of 10

Pod Creation Flow

1. User triggers pod creation

2. The NetworkResource admission controller mutates the resource request and annotates the pod with a contextUID

3. Pod object entry gets created at API server

4. The scheduler picks node1 or node2 for the pod, based on the Resources.Limits

5. On the node, the kubelet starts pod admission and, within the Admit phase, the kubelet (Device Manager) invokes the DPAPI “Allocate”, i.e. an RPC call to cni-dp:

Allocate(annotations, dev-id)

An AllocateResponse, which includes mount paths and environment variables, is sent back to the kubelet

6. The plugin reads the contextUID from the pod annotations and stores the contextUID-to-dev-id mapping locally

[Diagram: API Server, Default Scheduler, Admission controllers, Networks, Node, Kubelet, Device Manager, CNI, Pods, meta-plugin (ex: multus), cni-static-binary, cni-dp daemonset, Unix socket; arrows numbered 1 to 10 correspond to the steps of this flow.]

8 of 10

7. CNI invokes ADD to configure networking for the pod. ADD/DEL are handled by a meta-plugin. An example of a meta-plugin is Multus, where the role of the meta-plugin is:

  • To provide default network connectivity
  • To look for the network CRD objects that are requested to be attached using the annotation: kubernetes.v1.cni.cncf.io/networks
  • To work out from each network CRD object how to invoke the CNI plugin executable (its name and configuration)
  • For our example network object, the CNI executable name will be abc-plugin

8. The meta-plugin finally invokes abc-plugin, a stateless static executable, for ADD/DEL

9. The binary executable passes the contextUID (annotation kubernetes.v1.cni.cncf.io/contextUID) to the cni-dp plugin daemonset. This could be Unix domain socket based communication.

10. cni-dp, which also implements DPAPI, maintains the contextUID-to-dev-id mappings (step 6). cni-dp provides the executable with the interface name required to connect the pod to the requested network.
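
The meta-plugin's part of steps 7 and 8 can be sketched as a function that plans which CNI executables to invoke for a pod. This is not how Multus is actually implemented: `plan_delegates`, `networks_by_name`, and the `default-cni` executable name are hypothetical, and the lookup is keyed by the network names as they appear in the pod annotation.

```python
NETWORKS_ANNOTATION = "kubernetes.v1.cni.cncf.io/networks"


def plan_delegates(pod_annotations, networks_by_name, default_plugin="default-cni"):
    """Return an ordered list of (executable name, network name) for ADD.

    The default network always comes first; each requested network is then
    resolved to the plugin named in its Network object's spec.
    """
    plan = [(default_plugin, "default")]
    requested = pod_annotations.get(NETWORKS_ANNOTATION, "")
    for name in (part.strip() for part in requested.split(",")):
        if not name:
            continue
        net = networks_by_name.get(name)
        if net is None:
            raise LookupError(f"network {name!r} not found")
        plan.append((net["spec"]["plugin"], name))
    return plan
```

For the example pod, the plan would contain the default plugin followed by two invocations of abc-plugin, one per requested network.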

NOTE: The meta-plugin, the CNI plugin daemon and the device plugin can be a single multi-threaded process, or they can be separate processes. In the latter case, a communication mechanism is needed between the CNI plugin daemon and the device plugin.

I would prefer to run the CNI plugin daemon and the device plugin (DPAPI server) in a single process to reduce the number of moving pieces. But that's not really a goal for this discussion.

9 of 10

Pod Deletion Flow

  • At pod deletion, the binary executable is invoked with DEL in the same way as it was invoked for ADD.
  • The binary executable passes the contextUID to the cni-dp daemon; the daemon then uses the contextUID-to-dev-id mapping to find the device to de-allocate and completes the deallocation.
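
The bookkeeping cni-dp needs across Allocate, ADD and DEL can be sketched as a small in-memory store. This is an assumption-laden illustration: `ContextStore` and its method names are hypothetical, the dev-id to interface-name table stands in for whatever local state the plugin keeps, and a real daemon would also need persistence and locking.

```python
class ContextStore:
    """Hypothetical contextUID-to-dev-id bookkeeping inside cni-dp."""

    def __init__(self, interfaces):
        # dev-id -> interface name, known to the plugin from local state.
        self._interfaces = dict(interfaces)
        # contextUID -> list of dev-ids allocated for that pod.
        self._by_context = {}

    def on_allocate(self, context_uid, dev_id):
        # Called when Allocate arrives: remember which device the pod got.
        if dev_id not in self._interfaces:
            raise KeyError(f"unknown device {dev_id!r}")
        self._by_context.setdefault(context_uid, []).append(dev_id)

    def on_add(self, context_uid):
        # Called by the CNI executable on ADD: return the interface names
        # it needs to attach the pod to its networks.
        return [self._interfaces[d] for d in self._by_context[context_uid]]

    def on_del(self, context_uid):
        # Called on DEL: forget the mapping and report the freed devices.
        # An unknown contextUID yields an empty list, so a repeated DEL
        # does not fail.
        return self._by_context.pop(context_uid, [])
```

Keeping a list per contextUID reflects that one pod may hold a device for each requested network, as in the ctr-net and data-net example.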

10 of 10