1 of 9

In-Place Update of Pod Resources

Current state explanation by dashpole@

Shared Publicly

2 of 9

What we know for sure:

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: my-container
    resources:
      requests:
      limits:
```

Container Resources are Mutable

3 of 9

What we know for sure:

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: my-container
    resources:
      requests:
      limits:
status:
  containerStatuses:
  - name: my-container
    resources:
      requests:
      limits:
```

Add “actual” resources to the container status: the values reported by the container runtime for the resources it has allocated.

[Diagram: Container Runtime → CRI → Kubelet (syncPod → Status Manager) → API Server]
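The status path on this slide (runtime reports actual resources, syncPod copies them into the container status, the Status Manager publishes them) can be sketched in Go. This is a minimal sketch under assumptions: `runtimeService`, `statusManager`, `syncPodStatus` and `fakeRuntime` are hypothetical stand-ins for the CRI, the kubelet's Status Manager and syncPod, not the real kubelet or CRI APIs, and resources are simplified to CPU millicores and memory bytes.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource list.
type Resources struct {
	MilliCPU int64
	Memory   int64
}

// runtimeService is a hypothetical stand-in for the CRI: it reports the
// resources the container runtime actually applied to a container.
type runtimeService interface {
	ActualResources(containerName string) (Resources, error)
}

type ContainerStatus struct {
	Name      string
	Resources Resources // "actual" resources, as reported by the runtime
}

// statusManager is a hypothetical stand-in for the kubelet's Status Manager,
// which would push the latest status to the API server.
type statusManager struct {
	latest map[string][]ContainerStatus // pod UID -> container statuses
}

func (s *statusManager) SetPodStatus(uid string, cs []ContainerStatus) {
	s.latest[uid] = cs
}

// syncPodStatus asks the runtime what it actually allocated (not what the
// spec requests) and hands that to the status manager.
func syncPodStatus(rt runtimeService, sm *statusManager, uid string, names []string) error {
	statuses := make([]ContainerStatus, 0, len(names))
	for _, n := range names {
		r, err := rt.ActualResources(n)
		if err != nil {
			return err
		}
		statuses = append(statuses, ContainerStatus{Name: n, Resources: r})
	}
	sm.SetPodStatus(uid, statuses)
	return nil
}

// fakeRuntime is an in-memory runtimeService used only for this example.
type fakeRuntime map[string]Resources

func (f fakeRuntime) ActualResources(name string) (Resources, error) {
	r, ok := f[name]
	if !ok {
		return Resources{}, fmt.Errorf("container %q not found", name)
	}
	return r, nil
}

func main() {
	rt := fakeRuntime{"my-container": {MilliCPU: 500, Memory: 256 << 20}}
	sm := &statusManager{latest: map[string][]ContainerStatus{}}
	if err := syncPodStatus(rt, sm, "pod-a", []string{"my-container"}); err != nil {
		panic(err)
	}
	fmt.Println(sm.latest["pod-a"][0].Resources.MilliCPU) // 500
}
```

The key design point is that the status is sourced from the runtime, so it reflects what was applied even when the spec has already changed.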

4 of 9

Problem: Pod Requests <= Node Allocatable?

  • The kubelet and scheduler currently ensure: sum(pod_requests) <= node_allocatable
  • Since pod requests are now mutable, a user can raise a running pod's requests beyond what the node has free, so we can’t guarantee this anymore.

Instead, we can now try to enforce (but can’t guarantee):

sum(pod_allocated_resources) <= node_allocatable

...where pod_allocated_resources are the resource requests reported in the pod status.

This means we must apply updates to container resources only when the new resource requests fit on the node...
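The fit check implied here can be sketched in Go: a resize fits if, after replacing the pod's allocated requests with the new ones, sum(pod_allocated_resources) <= node_allocatable still holds. This is a minimal sketch assuming a hypothetical `Resources` type (CPU millicores, memory bytes) and a hypothetical `resizeFits` helper; real Kubernetes code represents resources with `v1.ResourceList` and more resource types.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource list:
// CPU in millicores and memory in bytes.
type Resources struct {
	MilliCPU int64
	Memory   int64
}

func (r Resources) add(o Resources) Resources {
	return Resources{MilliCPU: r.MilliCPU + o.MilliCPU, Memory: r.Memory + o.Memory}
}

func (r Resources) fitsIn(limit Resources) bool {
	return r.MilliCPU <= limit.MilliCPU && r.Memory <= limit.Memory
}

// resizeFits reports whether giving pod uid newRequests keeps
// sum(pod_allocated_resources) <= node_allocatable. allocated maps each
// pod UID to the requests recorded in its pod status.
func resizeFits(allocated map[string]Resources, uid string, newRequests, allocatable Resources) bool {
	var total Resources
	for id, r := range allocated {
		if id == uid {
			continue // this pod's old allocation is replaced by newRequests
		}
		total = total.add(r)
	}
	return total.add(newRequests).fitsIn(allocatable)
}

func main() {
	allocatable := Resources{MilliCPU: 4000, Memory: 8 << 30}
	allocated := map[string]Resources{
		"pod-a": {MilliCPU: 1000, Memory: 2 << 30},
		"pod-b": {MilliCPU: 2000, Memory: 4 << 30},
	}
	// pod-a grows to 2 CPUs: 2000m + 2000m = 4000m, which fits.
	fmt.Println(resizeFits(allocated, "pod-a", Resources{MilliCPU: 2000, Memory: 2 << 30}, allocatable))
	// pod-a grows to 3 CPUs: 5000m > 4000m, which does not fit.
	fmt.Println(resizeFits(allocated, "pod-a", Resources{MilliCPU: 3000, Memory: 2 << 30}, allocatable))
}
```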

5 of 9

Problem: When can we update container resources?

Today, when a pod is admitted, the kubelet does a one-time check called “admission” that either allows the pod to run as-is or fails it.

For resource updates, we don’t want to fail the pod, even if the new resource requests don’t fit on the node. So we want to “admit” the change, but not act on it yet.

The decision to act on a particular set of container resources must be:

  • Serialized with pod admissions
    • to prevent race conditions between resize decisions and pod admission
  • Persisted immediately, and recovered on restart
    • to prevent a pod's resources from changing across kubelet restarts
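One way to meet both requirements, sketched in Go under assumptions: a single mutex serializes new-pod admission and resize decisions, and each decision is handed to a persist callback before it takes effect. `allocationManager`, `AdmitPod`, `AdmitResize` and `persist` are hypothetical names, not kubelet APIs.

```go
package main

import (
	"fmt"
	"sync"
)

// Resources is a hypothetical, simplified resource list.
type Resources struct {
	MilliCPU int64
	Memory   int64
}

// allocationManager is hypothetical: one mutex serializes pod admission and
// resize decisions, and each decision is persisted before it takes effect.
type allocationManager struct {
	mu          sync.Mutex
	allocatable Resources
	allocated   map[string]Resources // pod UID -> resources the kubelet acts on
	persist     func(map[string]Resources) error
}

// fits checks sum(allocated) <= allocatable with uid's entry replaced by req.
// Callers must hold m.mu.
func (m *allocationManager) fits(uid string, req Resources) bool {
	cpu, mem := req.MilliCPU, req.Memory
	for id, r := range m.allocated {
		if id != uid {
			cpu += r.MilliCPU
			mem += r.Memory
		}
	}
	return cpu <= m.allocatable.MilliCPU && mem <= m.allocatable.Memory
}

// record persists the new allocation first and only then updates memory,
// so a failed write never leaves an unpersisted decision in effect.
func (m *allocationManager) record(uid string, req Resources) error {
	next := make(map[string]Resources, len(m.allocated)+1)
	for id, r := range m.allocated {
		next[id] = r
	}
	next[uid] = req
	if err := m.persist(next); err != nil {
		return err
	}
	m.allocated = next
	return nil
}

// AdmitPod fails a new pod that does not fit, as admission does today.
func (m *allocationManager) AdmitPod(uid string, req Resources) (bool, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.fits(uid, req) {
		return false, nil
	}
	if err := m.record(uid, req); err != nil {
		return false, err
	}
	return true, nil
}

// AdmitResize never fails the pod: if the new requests don't fit, it keeps
// and returns the previous allocation, leaving the change pending.
func (m *allocationManager) AdmitResize(uid string, req Resources) (Resources, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.fits(uid, req) {
		return m.allocated[uid], nil
	}
	if err := m.record(uid, req); err != nil {
		return m.allocated[uid], err
	}
	return req, nil
}

func main() {
	saves := 0
	m := &allocationManager{
		allocatable: Resources{MilliCPU: 4000, Memory: 8 << 30},
		allocated:   map[string]Resources{},
		persist:     func(map[string]Resources) error { saves++; return nil },
	}
	ok, _ := m.AdmitPod("pod-a", Resources{MilliCPU: 1000, Memory: 1 << 30})
	fmt.Println(ok, saves) // true 1
	got, _ := m.AdmitResize("pod-a", Resources{MilliCPU: 8000, Memory: 1 << 30})
	fmt.Println(got.MilliCPU, saves) // 1000 1: too big, old allocation kept
}
```

A resize that does not fit returns the previous allocation instead of failing the pod, which matches the “admit but don't act” behavior described above. Where `persist` writes (a local file or the API server) is exactly the choice between the two options that follow.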

6 of 9

Option 1: Store “admitted” pod resources on disk

When we decide to run a pod with a particular set of resources, write those resources immediately to disk in a new checkpoint file.

When the kubelet starts, look for this checkpoint file. If it exists, load it, and somehow use the checkpointed values for pod admission instead of the resources currently in the pod spec.

7 of 9

Option 1: Store “admitted” pod resources on disk

Pros:

  • No additional API changes
  • No additional calls to the API Server

Cons:

  • Admitted resources are only available locally, on the node
  • Local checkpointing is not always reliable and is difficult to debug

8 of 9

Option 2: Store “admitted” pod resources in spec

When we decide to run a pod with a particular set of resources, write the set of resources back into the pod spec using a pod subresource.

When the kubelet admits a pod whose “admitted” resources are already set (for example, when it re-admits existing pods after a restart), it admits based on the “admitted” resources rather than the resources in the pod spec.

[Diagram: Pod.Spec.Container[i].Resources → Admission → Pod.Spec.Container[i].“Admitted” → Reconciliation → Pod.Status.ContainerStatus[i].Resources]
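The admission rule of Option 2 can be sketched in Go: admission counts a container's “admitted” resources when they are set and falls back to its spec resources otherwise. `Container.Admitted`, `admissionRequests` and `podAdmissionRequests` are hypothetical names (the slide's “Admitted” field is a proposal, not an existing API field), and the reconciliation step that applies the admitted values to the runtime is not shown.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource list.
type Resources struct {
	MilliCPU int64
	Memory   int64
}

// Container mirrors the slide's Pod.Spec.Container[i] with a hypothetical
// Admitted field; nil means the kubelet has not admitted any resources yet.
type Container struct {
	Name      string
	Resources Resources  // desired requests, mutable by users
	Admitted  *Resources // written back by the kubelet via a pod subresource
}

// admissionRequests returns what admission should count for a container:
// the "admitted" resources when set, otherwise the spec resources.
func admissionRequests(c Container) Resources {
	if c.Admitted != nil {
		return *c.Admitted
	}
	return c.Resources
}

// podAdmissionRequests sums admissionRequests over a pod's containers.
func podAdmissionRequests(cs []Container) Resources {
	var total Resources
	for _, c := range cs {
		r := admissionRequests(c)
		total.MilliCPU += r.MilliCPU
		total.Memory += r.Memory
	}
	return total
}

func main() {
	pod := []Container{
		// Resize requested to 2 CPUs, but only 1 CPU admitted so far.
		{
			Name:      "app",
			Resources: Resources{MilliCPU: 2000, Memory: 1 << 30},
			Admitted:  &Resources{MilliCPU: 1000, Memory: 1 << 30},
		},
		// Never admitted: fall back to the spec.
		{Name: "sidecar", Resources: Resources{MilliCPU: 250, Memory: 64 << 20}},
	}
	fmt.Println(podAdmissionRequests(pod).MilliCPU) // 1250
}
```

Because admission sums admitted values rather than spec values, a pending resize that does not fit cannot push the node over node_allocatable.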

9 of 9

Option 2: Store “admitted” pod resources in spec

Pros:

  • We can guarantee that sum(pod_admitted_resources) <= node_allocatable
  • External observers can see both the resources the kubelet has granted (admitted) to a pod and the resources actually applied to its containers.

Cons:

  • Requires an additional API change
  • Requires an additional write to the API Server whenever a set of resources is “admitted”.