Quark Hibernate Container
Yulin Sun, Shaobao Feng
Serverless Container Requirement/Challenge
Hibernate Container
State | Memory Consumption | CPU Usage | Startup Latency | Deployment Density |
Warm up | Sandbox + User Application | Unknown | Zero | Low |
Running | Sandbox + User Application | Yes | N/A | N/A |
Hibernate | Sandbox only (~ 5 MB) | Zero | Low | High |
Init | Zero | Zero | High | N/A |
Demo/Performance
Memory Usage (RSS) | NodeJs | Nginx |
Warm-up | 77.779 MB | 33.872 MB |
Hibernate | 13.664 MB | 14.396 MB |
Wakeup from Hibernate | 37.328 MB | 15.512 MB |
Private Mem | 4.9 MB | 5.3 MB |
| NodeJs | Nginx |
qkernel.bin | 2456 | 2340 |
quark | 412 | 412 |
| 3652 | 4048 |
| 324 | 268 |
| 52 | 52 |
| 288 | 252 |
| 24 | 24 |
| 68 | 68 |
| 24 | 24 |
libgcc_s.so.1 | 12 | 12 |
| 0 | 64 |
| 136 | 136 |
| 976 | 1028 |
| 184 | 172 |
| 140 | 140 |
| 32 | 32 |
total | 8780 | 9072 |
Startup latency | Request | NodeJs | Nginx |
Warm Up Container | 1st Req | 9 ~ 13 ms | 2 ~ 3 ms |
| other Req | 1 ~ 5 ms | 0.4 ~ 4 ms |
Hibernate Container (No PageCache) | 1st Req | 12 ~ 65 ms | 10 ~ 13 ms |
| other Req | 1 ~ 5 ms | 0.4 ~ 4 ms |
Hibernate Container (with PageCache) | 1st Req | 4 ~ 20 ms | 1 ~ 2 ms |
| other Req | 1 ~ 5 ms | 0.4 ~ 4 ms |
WorkSet | | 2256 Pages | 154 Pages |
N x Private Mem + 1 x Shared Mem
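The formula above can be turned into a quick density estimate. The following is a rough sketch, not a measurement: the function names are hypothetical, the private figure of 4.9 MB is the Node.js "Private Mem" value from the table, and the shared figure of 8.6 MB assumes that the per-file totals in the table are in KB, which the slides do not state.

```python
def hibernate_footprint_mb(n, private_mb, shared_mb):
    """Memory used by n hibernated containers: N x Private Mem + 1 x Shared Mem."""
    return n * private_mb + shared_mb


def max_hibernated(budget_mb, private_mb, shared_mb):
    """Largest n whose footprint fits in budget_mb (0 if not even one fits)."""
    if budget_mb < shared_mb + private_mb:
        return 0
    return int((budget_mb - shared_mb) // private_mb)


if __name__ == "__main__":
    # Assumed inputs: 4.9 MB private per container, 8.6 MB shared once per node.
    print(hibernate_footprint_mb(100, 4.9, 8.6))
    print(max_hibernated(1024, 4.9, 8.6))
```

Because the shared part is paid only once per node, the per-container cost approaches the private memory figure as N grows.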
Node.js deployment density limit
Machine Spec
Application
Running Container
Maximum Hibernate Container
Next step work
Goal: not a replacement for warm startup, but an additional option
M x Warm startup + N x Hibernate
A New Serverless Application Deploy Mode
kubectl create -f app.yaml
A Knative AutoScale Example
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"
        autoscaling.knative.dev/min-scale: "0"
        autoscaling.knative.dev/max-scale: "100"
        autoscaling.knative.dev/max-hibernate: "18"
        autoscaling.knative.dev/min-hibernate: "3"
    spec:
      containers:
        - image: gcr.io/knative-samples/autoscale-go:0.1
Manual hibernation of pods
The Challenge of Kubernetes
Init
Warm up
Running
Hibernate
Cold Start
User Request
Request Finish
SIGSTOP
SIGCONT
User Request
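The lifecycle labels above can be read as a small state machine. The sketch below is one plausible reading of the flattened diagram, not the authors' definition: the states and event names come from the slide, but which state each arrow starts and ends at is an assumption, and `TRANSITIONS`, `step`, and `run` are hypothetical names.

```python
# Assumed edges: (current state, event) -> next state.
TRANSITIONS = {
    ("Init", "Cold Start"): "Warm up",
    ("Warm up", "User Request"): "Running",
    ("Running", "Request Finish"): "Warm up",
    ("Warm up", "SIGSTOP"): "Hibernate",
    ("Hibernate", "SIGCONT"): "Warm up",
    ("Hibernate", "User Request"): "Running",
}


def step(state, event):
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def run(events, state="Init"):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    print(run(["Cold Start", "User Request", "Request Finish", "SIGSTOP"]))
```

Keeping the edges in a single table makes it easy to adjust them if the original diagram connects the states differently.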
Front End
Controller
Worker Node
Node Agent
Container
Container
…
Request
Control Msg
Control data
Msg
Bus
Guest Applications
Page Tables
Bitmap Allocator
QKernel
Swapping Mgr
Mem Reclaim Mgr
QVisor
Host Linux Kernel
Swap File
REAP File
Full
Swap Out
Page Fault
Swap-in
REAP
Batch Swap-In
Init
Warm
Running
Hibernate
① Cold Start
② User Request
③ Request Finish
④ SIGSTOP
⑤ SIGCONT
⑦ User Request
Wake-up
⑨ SIGSTOP
Hibernate Running
⑧ Request Finish
⑥ User Request
4MB Mem Block
Control
Page
4KB
Page#1
4KB Page#1023
Control Page
Next
Free Page Bitmap
Refcnt Array
AtomicU16#0
AtomicU16#1
AtomicU16#2
AtomicU16#1023
4KB
Page#2
…
…
L2 Bitmap Array
u64#0
u64#1
Bit 0 ~ 63
Bit 64 ~ 127
u64#15
…
Bit 960 ~ 1023
Bit#0
Bit#1
Bit#15
Bit#63
…
L1 Bitmap (u64)
Free Page Bitmap
…
Bitmap Page Allocator
Head
Tail
4MB Mem Block
Next
4MB Mem Block
Next
4MB Mem Block
Next
…
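The block layout above (a 4 MB block of 1024 4 KB pages, with page 0 as the control page holding a two-level free-page bitmap and a per-page reference count array) can be sketched as follows. This is a minimal single-threaded illustration, not Quark's implementation: the class and method names are hypothetical, plain Python integers stand in for the AtomicU16 and u64 fields, and the convention that a set bit means "free" is an assumption.

```python
PAGES_PER_BLOCK = 1024                        # 4 MB block / 4 KB pages
WORD_BITS = 64
L2_WORDS = PAGES_PER_BLOCK // WORD_BITS       # 16 u64 words in the L2 array


def lowest_set_bit(x):
    """Index of the least significant set bit of a positive integer."""
    return (x & -x).bit_length() - 1


class MemBlock:
    def __init__(self):
        # L2: one bit per page, set = free (assumed convention).
        self.l2 = [(1 << WORD_BITS) - 1] * L2_WORDS
        # L1: one bit per L2 word, set = that word still has a free page.
        self.l1 = (1 << L2_WORDS) - 1
        self.refcnt = [0] * PAGES_PER_BLOCK
        self._mark_used(0)                    # page 0 is the control page

    def _mark_used(self, page):
        word, bit = divmod(page, WORD_BITS)
        self.l2[word] &= ~(1 << bit)
        if self.l2[word] == 0:
            self.l1 &= ~(1 << word)

    def alloc(self):
        """Allocate the lowest free page, or return None if the block is full."""
        if self.l1 == 0:
            return None
        word = lowest_set_bit(self.l1)
        page = word * WORD_BITS + lowest_set_bit(self.l2[word])
        self._mark_used(page)
        self.refcnt[page] = 1
        return page

    def deref(self, page):
        """Drop one reference; free the page and return True when it hits zero."""
        if page == 0 or self.refcnt[page] == 0:
            raise ValueError("page is not allocated")
        self.refcnt[page] -= 1
        if self.refcnt[page] > 0:
            return False
        word, bit = divmod(page, WORD_BITS)
        self.l2[word] |= 1 << bit
        self.l1 |= 1 << word
        return True
```

With the L1 word summarising the 16 L2 words, finding a free page takes two lowest-set-bit lookups instead of a scan over all 1024 bits.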
Linux Process
Linux Container Applications
Linux Guest Kernel
QEMU / Firecracker …
Linux Container (cgroup, namespaces)
Linux Container Applications
QKernel
QVisor
Memory Management
Process Management
Network Stack
Virtual File System
Linux Host Kernel
KVM
Guest Applications
Guest Kernel
Virtual Machine Monitor (VMM)
Quark
Linux Virtual Machine
Host Kernel
System Call Virtualization Layer
Linux System Call
Quark System Call
TSoR Cluster
External
RDMA Conn
Node#2
Node#4
Node#3
Node#1
…
Cluster Node
RDMA Service
TSoR Gateway
Quark Pod#1
…
RDMA NIC
TCP Egress
TCP Ingress
TCP Ingress
TCP Egress
Orchestration Control Plane
Cluster Orchestration System
TSoR Client
Quark Pod#N
TSoR Client
…
Kubernetes Cluster
Node1
Pod#2
Pod#1
Node 2
Pod#3
External
…
RDMA Service
Pod#1
SHM Region
TSoR Client
System Call Virtual Layer
Cloud Native Applications
RDMA Srv Client Mgr
RDMA Connection Mgr
RNIC
SQ
CQ
Shared
Data Buffs
Pod#1
SHM Region
TSoR Client
System Call Virtual Layer
Cloud Native Applications
…
TSoR Control Plane Agent
Orchestration Control Plane
TSoR Gateway
TSoR Client
Shared
Data Buffs
Guest User Space
Guest Kernel Space
RDMA Channel
RDMA Service
RDMA Connection Mgr
RDMA Service
RDMA Connection Mgr
RDMA Connection
RDMA Channel (Control)
RDMA Channel (Data) #2
RDMA Channel (Data) #1
…
Write Ring Buffer
Write Ring Buffer
Read Ring Buffer
Read Ring Buffer
Node#1
Node#2
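The write and read ring buffers shown at each end of an RDMA channel can be illustrated with a simple byte ring. This is a generic single-producer, single-consumer sketch under stated assumptions, not TSoR's buffer layout: the class name, the use of monotonically increasing head and tail counters, and the partial-write behaviour when the buffer is full are all hypothetical.

```python
class RingBuffer:
    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.cap = capacity
        self.head = 0   # total bytes consumed so far
        self.tail = 0   # total bytes produced so far

    def available(self):
        """Bytes currently stored and ready to be read."""
        return self.tail - self.head

    def write(self, data):
        """Copy as much of data as fits; return the number of bytes accepted."""
        n = min(self.cap - self.available(), len(data))
        for i in range(n):
            self.buf[(self.tail + i) % self.cap] = data[i]
        self.tail += n
        return n

    def read(self, n):
        """Remove and return up to n bytes in FIFO order."""
        n = min(n, self.available())
        out = bytes(self.buf[(self.head + i) % self.cap] for i in range(n))
        self.head += n
        return out


if __name__ == "__main__":
    rb = RingBuffer(8)
    print(rb.write(b"hello world"))   # only 8 bytes fit
    print(rb.read(5))
```

A short write tells the caller how much was accepted, so the remainder can be retried once the reader has drained some bytes.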
TSoR Gateway
Gateway Control Plane Agent
TSoR Client
Ingress TCP Layer
Egress TCP Layer
External
TCP Ingress Traffic
TCP Egress Traffic
RDMA Service
Orchestration Control Plane
Quark Pod
TSoR Ingress Traffic
TSoR Egress Traffic
RDMA NIC
Cluster Node#1 (192.168.0.0/24)
RDMA Service
…
Pod#1
192.168.0.1
Client1
Pod#2
192.168.0.2
Client2
TSoR (Egress) Gateway
Client3
TSoR (Ingress) Gateway
Client4
Cluster Node#2
(192.168.1.0/24)
RDMA Conn#1
RDMA Conn#2
Dst | Interface |
192.168.0.1 | Client#1 (Local) |
192.168.0.2 | Client#2 (Local) |
192.168.1.0/24 | RDMA Conn#1 |
192.168.2.0/24 | RDMA Conn#2 |
192.168.3.0/24 | RDMA Conn#3 |
10.5.0.0/16 | Cluster IP handler |
* | Client#4 (Egress) |
TSoR Route Table
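A lookup against the route table above can be sketched with Python's standard `ipaddress` module. The entries are copied from the table; treating the single pod addresses as /32 routes, treating `*` as `0.0.0.0/0`, and resolving by longest prefix match are assumptions about how the table is consulted, and `ROUTES` and `lookup` are hypothetical names.

```python
import ipaddress

# (destination, interface) pairs from the TSoR Route Table.
ROUTES = [
    ("192.168.0.1/32", "Client#1 (Local)"),
    ("192.168.0.2/32", "Client#2 (Local)"),
    ("192.168.1.0/24", "RDMA Conn#1"),
    ("192.168.2.0/24", "RDMA Conn#2"),
    ("192.168.3.0/24", "RDMA Conn#3"),
    ("10.5.0.0/16", "Cluster IP handler"),
    ("0.0.0.0/0", "Client#4 (Egress)"),   # the "*" default entry
]

# Most specific prefixes first, so the first hit is the longest match.
_TABLE = sorted(
    ((ipaddress.ip_network(net), iface) for net, iface in ROUTES),
    key=lambda entry: entry[0].prefixlen,
    reverse=True,
)


def lookup(dst):
    """Return the interface for a destination IPv4 address string."""
    addr = ipaddress.ip_address(dst)
    for net, iface in _TABLE:
        if addr in net:
            return iface
    raise LookupError(f"no route for {dst}")


if __name__ == "__main__":
    for dst in ("192.168.0.2", "192.168.1.5", "10.5.6.8", "202.21.11.5"):
        print(dst, "->", lookup(dst))
```

Under these assumptions, traffic to a pod on another node goes to that node's RDMA connection, cluster IPs go to the cluster IP handler, and everything else leaves through the egress gateway client.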
RDMA Conn#3
Pod#3 192.168.1.5
Cluster Node#3
(192.168.2.0/24)
Pod#4 192.168.2.8
Cluster Node#4
(192.168.3.0/24)
Pod#5 192.168.3.6
Cluster IP | Pod IP |
Service svc1 10.5.6.8:546 | 192.168.1.5:80 |
| 192.168.2.8:80 |
| 192.168.3.6:80 |
TSoR Cluster IP Table
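Resolving a cluster IP to one of its backing pods can be sketched as a table lookup. The mapping below is taken from the table above; the round-robin selection policy is an assumption (the slides do not say how a pod is chosen), and `CLUSTER_IP_TABLE` and `ClusterIpResolver` are hypothetical names.

```python
import itertools

# Cluster IP endpoint -> pod endpoints, from the TSoR Cluster IP Table.
CLUSTER_IP_TABLE = {
    ("10.5.6.8", 546): [
        ("192.168.1.5", 80),
        ("192.168.2.8", 80),
        ("192.168.3.6", 80),
    ],
}


class ClusterIpResolver:
    def __init__(self, table):
        # One independent round-robin cursor per cluster IP (assumed policy).
        self._cursors = {vip: itertools.cycle(pods) for vip, pods in table.items()}

    def resolve(self, ip, port):
        """Return the next pod endpoint for a cluster IP endpoint."""
        try:
            return next(self._cursors[(ip, port)])
        except KeyError:
            raise LookupError(f"unknown cluster IP {ip}:{port}") from None


if __name__ == "__main__":
    resolver = ClusterIpResolver(CLUSTER_IP_TABLE)
    for _ in range(4):
        print(resolver.resolve("10.5.6.8", 546))
```

The resolved pod address could then be looked up in the route table to pick the RDMA connection that reaches the pod's node.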
External
External EP | Internal EP |
202.21.11.5:80 | 10.5.6.8:546 |
TSoR Ingress Gateway Table
External IP
202.21.11.5
…
TCP Socket
Container
write(int socket, void *buf, ssize_t N)
TCP Socket
Container
read(int socket, void *buf, ssize_t N)
Write Buffer
SysWrite
Read Buffer
SysRead
Submit Queue
Push Request
Pop Request
RDMA QP
SQ
CQ
Handle Request
Complete Queue
RDMA QP
Handle Request
SQ
CQ
RDMA Write Immediate
TSoR_cli
shared region
rdma_svc
rdma_svc
shared region
TSoR_cli
Node
Sandbox
Process
TCP Conn
Node
Sandbox
Process
TCP Conn
http://projectid.xxx/
API Gateway
API Gateway
…
Function Dispatcher
Function Dispatcher
…
Container#1
Container#2
Resource Manager
Node Agent
Event Queue
Ingress
Dispatcher
Region 1
Region 2
Pod A
Pod B
Pod B
Egress
Ingress
192.168.0.1:80
192.168.2.3:80
192.168.5.2:128
Pod B
Pod B
EIP 202.1.2.3:90
Endpoint B (Service B) Cluster IP: 10.5.8.5:80 |
192.168.0.1:80 |
192.168.2.3:80 |
192.168.5.2:128 |
Endpoint B (Service B) Cluster IP: 10.5.6.7:8080 |
192.168.0.1:8080 |
192.168.82.3:8080 |
192.168.0.1:8080
192.168.82.3:8080
Port | External EP |
128 | 202.1.2.3:90 |
| 68.5.4.3:507 |
129 | .. |
Region 3
Ingress
Pod B
Pod B
EIP 68.5.4.3:507
192.168.0.1:8080
192.168.82.3:8080
Egress Mapping
Port | Internal EP |
90 | 10.5.6.7:8080 |
| … |
Ingress Mapping
Service Definition
Service Definition
Global K8S Mgr
Adaptor C
S3/Redis
Egress
Ingress
DNS:
PodB.local → 10.5.6.7
DNS
PodB.local → 10.5.8.5
Object Store
VPC
Pod A