2 of 9

What we know for sure:

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: my-container
    resources:
      requests:
      limits:

Container Resources are Mutable

3 of 9

What we know for sure:

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: my-container
    resources:
      requests:
      limits:
status:
  containerStatuses:
  - resources:
      requests:
      limits:

Add “actual” resources to container status

Reported from the actual resources allocated by the container runtime

[Diagram labels: Container Runtime, CRI, Kubelet (syncPod, Status Manager), API Server]

4 of 9

Problem: Pod Requests <= Node Allocatable?

The kubelet and scheduler currently ensure:

sum(pod_requests) <= node_allocatable

Since pod requests are now mutable, we can no longer guarantee this.

Instead, we can now try to enforce (but can’t guarantee):

sum(pod_allocated_resources) <= node_allocatable

...where pod_allocated_resources is the set of resource requests reported in the pod status.

This means we must only perform updates to container resources when the new resource requests fit on the node...
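The fit check can be sketched as follows. This is a minimal illustration, not kubelet code: the name `resize_fits`, the dict layout, and the plain integer units (for example CPU millicores) are assumptions made for the example.

```python
from typing import Dict

Resources = Dict[str, int]  # resource name -> amount; integer units are assumed


def resize_fits(allocated: Dict[str, Resources],
                pod: str,
                new_requests: Resources,
                node_allocatable: Resources) -> bool:
    """Return True if giving `pod` the `new_requests` keeps
    sum(pod_allocated_resources) <= node_allocatable for every node resource."""
    for name, capacity in node_allocatable.items():
        # Sum what every other pod already holds, then add the proposed request.
        others = sum(res.get(name, 0) for p, res in allocated.items() if p != pod)
        if others + new_requests.get(name, 0) > capacity:
            return False
    return True


allocated = {"a": {"cpu": 1000}, "b": {"cpu": 500}}
node = {"cpu": 2000}
print(resize_fits(allocated, "b", {"cpu": 1000}, node))  # True
print(resize_fits(allocated, "b", {"cpu": 1500}, node))  # False
```

The pod being resized is excluded from the running sum, so its current allocation is not counted twice.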

5 of 9

Problem: When can we update container resources?

Today, when a pod reaches the kubelet, we do a one-time check called “admission” that either allows the pod to run as-is or fails the pod.

For resource updates, we don’t want to fail the pod, even if the new resource requests don’t fit on the node. So we want to “admit” the change, but not act on it.

The decision to act on a particular set of container resources must be:

Serialized with pod admissions, to prevent race conditions with pod admission
Persisted immediately and recovered on restart, to prevent changes in pods across restarts
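One way to picture these two requirements is a single lock shared by admissions and resizes, with the decision persisted before it takes effect. The class `AllocationManager`, its method names, and the single integer resource are hypothetical and only for illustration.

```python
import threading
from typing import Callable, Dict


class AllocationManager:
    """Hypothetical helper: one lock guards both pod admission and resizes."""

    def __init__(self, node_allocatable: int,
                 persist: Callable[[Dict[str, int]], None]):
        self._lock = threading.Lock()
        self._node_allocatable = node_allocatable
        self._persist = persist  # assumed durable write (disk or API server)
        self.allocated: Dict[str, int] = {}

    def _try_allocate(self, pod: str, request: int) -> bool:
        with self._lock:  # serializes resizes with admissions
            others = sum(v for p, v in self.allocated.items() if p != pod)
            if others + request > self._node_allocatable:
                return False
            proposed = dict(self.allocated)
            proposed[pod] = request
            self._persist(proposed)  # persist first, then act on the decision
            self.allocated = proposed
            return True

    def admit(self, pod: str, request: int) -> bool:
        """New pod: a False result means the pod fails admission."""
        return self._try_allocate(pod, request)

    def resize(self, pod: str, request: int) -> bool:
        """Running pod: a False result means the resize is deferred and the
        pod keeps its current allocation; the pod is not failed."""
        return self._try_allocate(pod, request)
```

The only difference between `admit` and `resize` here is how the caller treats a False result, which mirrors the "admit the change, but not act on it" behavior for resizes.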

6 of 9

Option 1: Store “admitted” pod resources on disk

When we decide to run a pod with a particular set of resources, write those resources immediately to disk in a new checkpoint file.

When the kubelet starts, look for this checkpoint file. If it exists, load it, and somehow use those values for pod admission instead of the actual pod resources.
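A checkpoint of this kind might look like the sketch below. The JSON layout and the function names `save_checkpoint` and `load_checkpoint` are assumptions for illustration, not the kubelet's checkpoint format.

```python
import json
import os
import tempfile
from typing import Dict, Optional

Admitted = Dict[str, Dict[str, int]]  # pod name -> resource name -> amount (assumed layout)


def save_checkpoint(path: str, admitted: Admitted) -> None:
    """Write the admitted resources to disk before acting on them.
    A temp file plus rename avoids leaving a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(admitted, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # replaces any previous checkpoint in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_checkpoint(path: str) -> Optional[Admitted]:
    """On startup: return the saved resources, or None if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

A None result from `load_checkpoint` would mean the kubelet falls back to the resources in the pod itself.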

7 of 9

Option 1: Store “admitted” pod resources on disk

Pros:

No additional API changes
No additional calls to the API Server

Cons:

Admitted resources are only available locally
Local checkpointing is not always reliable, and is difficult to debug.

8 of 9

Option 2: Store “admitted” pod resources in spec

When we decide to run a pod with a particular set of resources, write the set of resources back into the pod spec using a pod subresource.

When the kubelet admits a new pod that already has “admitted” resources set, it admits based on the “admitted” resources rather than the pod resources.

[Diagram labels: Pod.Spec.Container[I].Resources, Pod.Spec.Container[I].“Admitted”, Pod.Status.ContainerStatus[I].Resources; steps: Admission, Reconciliation]
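The relationship between the three fields can be sketched as below. The `Container` class and its field names are hypothetical stand-ins; in particular `admitted` mirrors the placeholder name “Admitted” used on this slide, not a real API field.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Resources = Dict[str, int]


@dataclass
class Container:
    resources: Resources                  # Pod.Spec.Container[I].Resources (desired)
    admitted: Optional[Resources] = None  # Pod.Spec.Container[I]."Admitted" (granted by the kubelet)
    actual: Optional[Resources] = None    # Pod.Status.ContainerStatus[I].Resources (reported by the runtime)


def resources_for_admission(c: Container) -> Resources:
    """Admit on the "admitted" resources when they are already set,
    otherwise on the resources in the spec."""
    return c.admitted if c.admitted is not None else c.resources


def needs_reconciliation(c: Container) -> bool:
    """True while the runtime has not yet applied what the kubelet admitted."""
    return c.admitted is not None and c.actual != c.admitted
```

For example, a container with `resources={"cpu": 1000}` and `admitted={"cpu": 500}` is admitted at 500 after a kubelet restart, even though the spec asks for 1000.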

9 of 9

Option 2: Store “admitted” pod resources in spec

Pros:

We can guarantee that sum(pod_admitted_resources) <= node_allocatable
External observers can determine the resources the kubelet has granted a pod in addition to the resources applied to containers.

Cons:

Requires an additional API change
Requires an additional write to the API Server when a set of resources is “admitted”
