1 of 14

Confidential Containers

Split host/tenant APIs

2 of 14

Starting point: Assume we have PullImage

[Diagram: kubelet → CRI-O/containerd → kata-shim-v2 → VMM (Cloud-Hypervisor/QEMU), running on the Linux Kernel with KVM over a Confidential Computing Platform (TDX, SEV, etc.). The Confidential VM holds measured firmware, initrd, kernel, and the kata-agent; both host and guest have their own container image storage, fed from a Container Image Registry. A Relying Party runs the Key Broker and the Attestation Service. Flow: 1 PullImage → 2 PullImage → 3 Download → 4 GetKey → 5 GetAttestation → 6 SendAttestation → 7 SendKey → 8 Decrypt, Unpack, Mount.]

Color code: green = tenant-owned, yellow = host-owned.
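Read as straight-line code, steps 4 through 8 are a challenge-response exchange with the Key Broker. A minimal sketch in Go; every interface and name below is a hypothetical stand-in for a component in the diagram, not the real kata-agent code:

package main

import (
	"errors"
	"fmt"
)

// Hypothetical interfaces standing in for the diagram's components.
type KeyBroker interface {
	// GetKey asks for the image decryption key; the broker answers
	// with an attestation challenge instead of the key (steps 4-5).
	GetKey(imageDigest string) (challenge []byte, err error)
	// SendAttestation returns the key once the evidence has been
	// verified by the Attestation Service (steps 6-7).
	SendAttestation(evidence []byte) (key []byte, err error)
}

type Platform interface {
	// GetAttestation produces evidence bound to the challenge,
	// e.g. a TDX quote or an SEV attestation report.
	GetAttestation(challenge []byte) ([]byte, error)
}

// pullImage mirrors the slide's flow from the guest's point of view.
func pullImage(broker KeyBroker, plat Platform, digest string) error {
	challenge, err := broker.GetKey(digest) // 4 - GetKey
	if err != nil {
		return err
	}
	evidence, err := plat.GetAttestation(challenge) // 5 - GetAttestation
	if err != nil {
		return err
	}
	key, err := broker.SendAttestation(evidence) // 6/7 - SendAttestation, SendKey
	if err != nil {
		return err
	}
	if len(key) == 0 {
		return errors.New("broker refused to release the key")
	}
	// 8 - Decrypt, unpack, and mount the image inside the guest.
	fmt.Println("decrypting image", digest)
	return nil
}

func main() { _ = pullImage } // wiring omitted; interfaces are sketches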

3 of 14

Problem: Host vs. Tenant realms

[Diagram: the same stack, now partitioned into trusted and untrusted realms: the host stack (kubelet → CRI-O/containerd → kata-shim-v2 → VMM) is untrusted, while the Confidential VM and the Relying Party (Key Broker, Attestation Service) are trusted. The host now pulls and stores the image itself. Flow: 1 PullImage → 2 PullImage → 3 PullImage → 4 PullImage → 5 Download → 6 Image digest+metadata (to the guest) → 7 GetKey → 8 GetAttestation → 9 SendAttestation → 10 SendKey → 11 Decrypt, Unpack, Mount.]

Whole volume encryption needed.
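Step 6 is what lets the guest trust layers downloaded on its behalf by the untrusted host: integrity comes from comparing against the attested digest, confidentiality from the volume encryption. A minimal sketch, assuming a SHA-256 digest (the function name is invented):

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// verifyLayer checks that data handed over by the untrusted host
// matches the digest the tenant expects (step 6 in the diagram).
func verifyLayer(expectedDigest string, data []byte) error {
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != expectedDigest {
		return errors.New("layer does not match attested digest")
	}
	return nil
}

func main() {
	data := []byte("layer bytes from the host")
	sum := sha256.Sum256(data)
	fmt.Println(verifyLayer(hex.EncodeToString(sum[:]), data)) // <nil>
	fmt.Println(verifyLayer("deadbeef", data))                 // error
}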

Q: Why are there green arrows in the orange domain?

4 of 14

Objectives

  • Offer full-featured access to confidential containers…
    • How do you read logs?
    • How do you “exec” something?
    • How do you get metrics?
  • … while preserving confidentiality
    • The host should not see that data (at least not in cleartext)
    • Access rights may differ: different users, different secrets
  • … in an incremental way
    • Reuse as much of the existing infrastructure as possible
    • Preserve the existing semantics and tools
    • Take advantage of known efforts, e.g. Sandbox API, Hypershift, kcp, etc…

5 of 14

Step 1: Blocking APIs

Reject APIs that present a security risk

[Diagram: host stack (kubelet → CRI-O/containerd → kata-shim-v3, etcd), host user with host credentials, connected over vsock to the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU).]

The agent checks whether an API call is valid, based on an in-image, attested configuration file.
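A minimal sketch of that check, written in Go for brevity (the real kata-agent is in Rust, and the policy format shown here is an assumption): the agent loads an allowlist baked into the measured image, so the host cannot change it without also changing the attestation result.

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Policy mirrors a hypothetical attested configuration file baked
// into the guest image: only the listed agent APIs are allowed.
type Policy struct {
	AllowedAPIs []string `json:"allowed_apis"`
}

func loadPolicy(path string) (*Policy, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var p Policy
	if err := json.Unmarshal(data, &p); err != nil {
		return nil, err
	}
	return &p, nil
}

// Allowed reports whether an incoming agent API call may proceed.
func (p *Policy) Allowed(api string) bool {
	for _, a := range p.AllowedAPIs {
		if a == api {
			return true
		}
	}
	return false
}

func main() {
	// The policy file is part of the measured image, so tampering
	// with it changes the attestation measurement.
	p, err := loadPolicy("/etc/agent-policy.json") // hypothetical path
	if err != nil {
		fmt.Fprintln(os.Stderr, "no policy, refusing all host APIs:", err)
		os.Exit(1)
	}
	for _, call := range []string{"CreateContainer", "ExecProcess", "ReadStdout"} {
		fmt.Printf("%s allowed from host: %v\n", call, p.Allowed(call))
	}
}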

But how do you implement kubectl logs or kubectl exec?

6 of 14

Step 2: Expose tenant APIs

Hey, we have a secure channel already!

[Diagram: host stack (kubelet → CRI-O/containerd → kata-shim-v3, etcd), host user with host credentials, vsock to the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU); the kata-agent also reaches the Relying Party (Key Broker, Attestation Service) and "some other API source" in the trusted realm.]

But how do you talk to the agent over this new channel?

The agent had to establish a secure connection before anything else, so it could also accept API calls from somewhere in the trusted realm (over the same channel, or a similar one).
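One way to read this: once attestation has run and the Key Broker has released a key pair and certificates to the guest, the agent can serve tenant API calls over mutually authenticated TLS. A hedged Go sketch; the paths, port, and the provisioning step are all assumptions:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Assumed: attestation already ran, and the Key Broker released
	// a server certificate/key plus the tenant CA to the guest.
	cert, err := tls.LoadX509KeyPair("/run/agent/tls.crt", "/run/agent/tls.key")
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("/run/agent/tenant-ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	srv := &http.Server{
		Addr: ":6443", // illustrative port
		TLSConfig: &tls.Config{
			Certificates: []tls.Certificate{cert},
			ClientCAs:    pool,
			// Only tenant users with tenant credentials get through;
			// the host cannot impersonate them.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Write([]byte("tenant API endpoint\n"))
		}),
	}
	log.Fatal(srv.ListenAndServeTLS("", ""))
}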

7 of 14

Step 3: Prototype with a tenant-only stack

Different set of secrets (different etcd); it might run on a different host

[Diagram: host stack (kubelet → CRI-O/containerd → kata-shim-v3, host user with host credentials, etcd) with vsock to the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU); a separate tenant-only stack (kubelet → CRI-O/containerd → kata-shim-v3, tenant user with tenant credentials, its own etcd) reaches the guest through an API forwarding service, alongside the Relying Party (Key Broker, Attestation Service).]

Do we really need forwarding? Do we still need vsocks?
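The API forwarding service can be a dumb byte-level relay, since the tenant-to-agent traffic is already protected end to end; the forwarder never needs to see plaintext. A minimal Go sketch, with placeholder addresses and ports:

package main

import (
	"io"
	"log"
	"net"
)

// forward relays an already-encrypted tenant API connection to the
// guest; it cannot read the payload, which is the whole point.
func forward(tenant net.Conn, guestAddr string) {
	defer tenant.Close()
	guest, err := net.Dial("tcp", guestAddr)
	if err != nil {
		log.Print(err)
		return
	}
	defer guest.Close()
	go io.Copy(guest, tenant)
	io.Copy(tenant, guest)
}

func main() {
	ln, err := net.Listen("tcp", ":9000") // tenant-facing port (placeholder)
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go forward(conn, "guest.internal:6443") // placeholder guest address
	}
}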

8 of 14

Step 3b: Variant using encrypted API over vsock

Talking at some other level of the stack

[Diagram: host stack as before (kubelet → CRI-O/containerd → kata-shim-v3, host user with host credentials, etcd), vsock to the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU); the tenant stack (kubelet → CRI-O/containerd, tenant user with tenant credentials, its own etcd) and the Relying Party (Key Broker, Attestation Service) talk to the kata-agent through an API encrypted end to end but carried over the host's vsock.]

Run on another host? Custom crypto?
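No custom crypto should be needed: standard TLS runs over any reliable byte stream, vsock included. A Go sketch of the tenant side; dialVsock is a placeholder for a real AF_VSOCK dialer (e.g. a library such as github.com/mdlayher/vsock), and all names are illustrative:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net"
	"os"
)

// dialVsock is a placeholder: a real implementation would open an
// AF_VSOCK socket to the guest's context ID and port.
func dialVsock(cid, port uint32) (net.Conn, error) {
	return net.Dial("tcp", "127.0.0.1:6443") // stand-in for the demo
}

func main() {
	raw, err := dialVsock(3, 6443) // guest CID/port are placeholders
	if err != nil {
		log.Fatal(err)
	}
	caPEM, err := os.ReadFile("tenant-ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)
	// TLS rides on top of the vsock stream: the host forwards bytes
	// it cannot read, and the tenant verifies the attested guest cert.
	conn := tls.Client(raw, &tls.Config{
		RootCAs:    pool,
		ServerName: "kata-agent", // must match the attested certificate
	})
	if err := conn.Handshake(); err != nil {
		log.Fatal(err)
	}
	log.Println("encrypted API channel to the agent established")
	conn.Close()
}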

9 of 14

Step 4: Locked-down / immutable pods

Take advantage of Sandbox API effort

[Diagram: two parallel stacks: host (kubelet → CRI-O/containerd → kata-shim-v3, host user with host credentials, etcd; no vsock) and tenant (same stack, tenant user with tenant credentials, its own etcd), the tenant side reaching the Confidential VM (initrd, kernel, kata-agent) over a secure (networked) RPC channel.]

Create the Pod with a complete Pod description (not piecewise).

Practically all APIs except pod lifetime (create, kill) go through the confidential channel.
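One way to picture "complete Pod description, not piecewise": the host-facing create call carries only sizing information plus an opaque, sealed pod spec, while the cleartext spec travels over the confidential channel. The types below are hypothetical, sketched in Go:

package main

import "fmt"

// HostCreateRequest is all the untrusted host ever sees: enough to
// allocate resources, nothing about what runs inside.
type HostCreateRequest struct {
	SandboxID    string
	VCPUs        int
	MemoryMiB    int
	EncryptedPod []byte // full pod spec, sealed to the guest
}

// TenantCreateRequest goes over the confidential channel and holds
// the actual workload definition.
type TenantCreateRequest struct {
	SandboxID string
	PodYAML   string // images, commands, env, mounts...
}

func main() {
	host := HostCreateRequest{SandboxID: "sb-1", VCPUs: 2, MemoryMiB: 2048,
		EncryptedPod: []byte("...ciphertext...")}
	tenant := TenantCreateRequest{SandboxID: "sb-1",
		PodYAML: "containers:\n- image: registry.example/app:v1\n"}
	fmt.Printf("host sees: %+v\n", host)
	fmt.Printf("tenant sends: %+v\n", tenant)
}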

10 of 14

Step 5: Dual-secrets user-space command split

The first (only?) major non-Kata change required (but can be manual)

[Diagram: two parallel stacks with no vsock: host (kubelet → CRI-O/containerd → kata-shim-v3, host user with host credentials, etcd) and tenant (same stack, tenant user with tenant credentials, its own etcd), both in front of the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU).]

Cluster user: kubectl $blah $args

  • Host path: kubectl create…, kubectl delete…
  • Tenant path: kubectl exec…, kubectl logs…
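The "manual" version of this split is essentially two kubeconfigs. A small Go wrapper as a sketch; the file paths and the verb-to-realm routing are assumptions that follow the slide:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// kubeconfigFor routes lifecycle verbs to the host control plane and
// data-touching verbs to the tenant one, per the dual-secrets split.
func kubeconfigFor(verb string) string {
	switch verb {
	case "create", "delete", "apply":
		return os.ExpandEnv("$HOME/.kube/host-config") // host credentials
	case "exec", "logs", "attach", "cp":
		return os.ExpandEnv("$HOME/.kube/tenant-config") // tenant credentials
	default:
		return os.ExpandEnv("$HOME/.kube/tenant-config")
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: kubectl-split <verb> [args...]")
		os.Exit(2)
	}
	args := append([]string{"--kubeconfig", kubeconfigFor(os.Args[1])}, os.Args[1:]...)
	cmd := exec.Command("kubectl", args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}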

11 of 14

Step 6 (long term): Simplify the control plane

Leverage existing efforts: Hypershift, kcp

[Diagram: host stack (kubelet → CRI-O/containerd → kata-shim-v3, host user with host credentials, etcd; no vsock) in front of the Confidential VM (initrd, kernel, kata-agent) on the VMM (Cloud-Hypervisor/QEMU); on the tenant side, only kata-shim-v3 remains, driven by a simplified control plane, "magic-smoke (TBD)", with tenant user and tenant credentials.]

Cluster user: kubectl $blah $args

  • Host path: kubectl create…, kubectl delete…
  • Tenant path: kubectl exec…, kubectl logs…

12 of 14

HyperShift and kcp

  • HyperShift: Run multiple clusters with one control plane
    • Run the control plane on some separate nodes
    • That would be something that belongs to the tenant

https://github.com/openshift/hypershift

  • kcp: Minimalist Kubernetes API server
    • No pods? No nodes? Hey, that’s exactly what I need!
    • Well… not quite, but it’s a good starting point

https://github.com/kcp-dev/kcp

13 of 14

Long term view: Host / Tenant Split APIs

Roughly the same APIs as today, but along two paths

Tenant path (tenant user with tenant credentials; no vsock; straight into the Confidential VM: initrd, kernel, kata-agent):

  • Attest boot image
  • Provide / attest workloads
  • Start container (+exec, debug)
  • Provide secrets (e.g. disk keys)
  • Access workload stdio
  • Access workload logs
  • Get workload metrics

Host path (host user with host credentials; to the VMM (Cloud-Hypervisor/QEMU) on the Linux Kernel with KVM):

  • Create and destroy pods
  • Allocate physical resources
  • Access hypervisor logs
  • Get global metrics (cgroup)

Problem: who splits the specs? E.g., the image name.
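The two columns map naturally onto two interface definitions. A hypothetical Go sketch, with method names invented to match the slide (not an existing API):

package main

// HostAPI is what a host user with host credentials may call:
// resource plumbing only, nothing that exposes workload data.
type HostAPI interface {
	CreatePod(sandboxID string, vcpus, memMiB int) error
	DestroyPod(sandboxID string) error
	AccessHypervisorLogs(sandboxID string) ([]byte, error)
	GetGlobalMetrics(sandboxID string) (map[string]float64, error) // cgroup-level
}

// TenantAPI is what a tenant user with tenant credentials may call,
// over the attested confidential channel.
type TenantAPI interface {
	AttestBootImage(evidence []byte) error
	ProvideWorkload(podSpec []byte) error // also attests it
	StartContainer(id string) error
	ExecProcess(id string, cmd []string) error
	ProvideSecret(name string, value []byte) error // e.g. disk keys
	AccessStdio(id string) ([]byte, error)
	AccessLogs(id string) ([]byte, error)
	GetWorkloadMetrics(id string) (map[string]float64, error)
}

func main() {} // interfaces only; implementations live on each side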

14 of 14

Open issues

  • Ordering and splitting into small enough chunks
  • Are there APIs that cannot be split that way?
    • Example: Pod creation is what sets up stdio today; that needs to move to the sandbox
  • Can we use annotations until the Sandbox API is ready?
  • Is the Sandbox API correctly defined for that use case?
  • Who splits the specs / yaml files?
    • Shouldn’t attestation be able to reject an image (even if it was accepted before)?
    • Mount points: who owns which part of which path?
    • I don’t do that, but others pass secrets through environment variables…
    • …if not straight in the YAML file itself