1 of 34

K8s autoscaling

2 of 34

Why we need to scale

3 of 34

4 of 34

What kinds of scaling we can do in K8s

  • Vertical Scale
    • Increases and decreases pod CPU and memory
  • Horizontal Scale
    • Adds and removes pods
    • Adds and removes cluster nodes

5 of 34

Resources for Scaling

Resource Type: CPU and memory are each a resource type. A resource type has a base unit.

CPU resource units

Limits and requests for CPU resources are measured in CPU units. 1 CPU unit is equivalent to 1 physical or virtual core and equals 1000 millicpu (m), so 500m is half a CPU.

Memory resource units

Limits and requests for memory are measured in bytes. You can express memory as a plain integer or as a fixed-point number using one of these quantity suffixes: E, P, T, G, M, k, or their power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, 129M and 123Mi denote roughly the same amount.

6 of 34

Resource Example
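
A minimal sketch of such a resources section in a pod spec (the pod name, image, and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                 # illustrative image
    resources:
      requests:
        cpu: 250m                # 0.25 CPU unit
        memory: 64Mi             # 64 mebibytes
      limits:
        cpu: "1"                 # 1 CPU unit = 1000m
        memory: 128Mi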

7 of 34

How to Manually Scale

8 of 34

Ways to scale pods

  • ReplicationController: Ensures that a specified number of pod replicas are running at any one time.
  • ReplicaSet: The next-generation ReplicationController that supports the new set-based label selector. It's mainly used by Deployment as a mechanism to orchestrate pod creation, deletion, and updates.
  • Deployment: A higher-level API object that updates its underlying ReplicaSets and their Pods.

Example: kubectl scale deploy/application-cpu --replicas 2
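
The same scale-out can also be expressed declaratively by setting spec.replicas in the Deployment manifest and re-applying it. A fragment, assuming the application-cpu Deployment from the example:

# deployment.yaml (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-cpu
spec:
  replicas: 2        # desired pod count

kubectl apply -f deployment.yaml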

9 of 34

Add new Kubernetes Worker Node Manually

Prerequisites to bring up a worker node

  • Install OS
  • Network Config
  • Disable Swap Memory
  • Install Container Runtime
  • Install Kubernetes Components: kubeadm, kubelet, kubectl (see the sketch below)
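
A minimal host-setup sketch for the last two steps on a Debian/Ubuntu node, assuming the Kubernetes apt repository is already configured:

sudo swapoff -a                                  # disable swap immediately
sudo sed -i '/ swap / s/^/#/' /etc/fstab         # keep swap off across reboots
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl  # Kubernetes components
sudo apt-mark hold kubelet kubeadm kubectl       # pin the versions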

10 of 34

Add new Kubernetes Worker Node Manually

  • Step 1: Get a join token:

kubeadm token create --print-join-command

  • Step 2: Get the discovery token CA cert hash:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

  • Step 3: Get the API server advertise address:

kubectl cluster-info

  • Step 4: Join the new worker node to the cluster:

kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
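
Putting the pieces together with illustrative values (the address, token, and hash below are hypothetical):

kubeadm join 10.0.0.10:6443 \
    --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:e7f8a1...   # full 64-hex-digit hash from Step 2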

11 of 34

Why we need autoscaling

12 of 34

Revisit Scale Example

Without an autoscaling solution in place, the traditional approach to mitigating such scalability failures involves:

  1. an alert (on degradation/failures)
  2. intervention by a human operator
  3. root cause analysis
  4. scaling out the number of replicas

13 of 34

Problems with the Human Touch

  • There is likely to be a delay between the alert and the intervention.

  • Scalability failure is not the only risk to system reliability, so the operator must first identify and understand the cause of the failure. Is it a bug? A network issue? A DB issue? This further delays any action.

  • It is unlikely that human operators are watching a single service they know everything about; more likely they are monitoring everything. So when a service suffers a scalability failure, the operator first needs to gather information about it, such as its peak capacity and current load, before calculating the number of replicas required to handle that load.

14 of 34

How can autoscaling help?

  • With an autoscaler in place, the peak capacity of a service tends to be codified in its scaling configuration rather than merely documented.
  • Unlike humans, the autoscaler does not need a coffee break.
  • The autoscaler concerns itself with one task and one task only: responding to scalability triggers.

15 of 34

What kinds of autoscaling we can do in K8s

  • Vertical Pod Autoscaler (VPA): Increases and decreases pod CPU and memory
  • Horizontal Pod Autoscaler (HPA): Adds and removes pods
  • Cluster Autoscaler (CA): Adds and removes cluster nodes

16 of 34

Pod Level Scale

17 of 34

VPA

18 of 34

VPA

kubectl describe vpa application-cpu

vpa.yaml and deployment.yaml (the manifests shown on the slide; a sketch of vpa.yaml follows)
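
A minimal sketch of what vpa.yaml might look like, assuming the VPA v1 API and the application-cpu Deployment used throughout:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: application-cpu
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  updatePolicy:
    updateMode: "Auto"    # VPA may evict pods to apply new requests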

19 of 34

VPA

20 of 34

VPA Limitations

  • VPA doesn't consider network and I/O.
  • VPA is not yet ready for JVM-based workloads, owing to limited visibility into the actual memory usage of the workload.
  • VPA is not aware of Kubernetes cluster infrastructure variables such as node size in terms of memory and CPU. For example, if VPA recommends 18 GB of memory but each node has only 16 GB, the pod stays Pending forever.
  • VPA won't work with HPA on the same CPU and memory metrics, because the two would race against each other.
  • VPA performance has not been tested in large clusters.

21 of 34

HPA

22 of 34

HPA

TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)
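
A quick worked example, assuming a target of 50% CPU utilization and three pods currently running at 90%, 90%, and 60%:

sum(CurrentPodsCPUUtilization) = 90 + 90 + 60 = 240
TargetNumOfPods = ceil(240 / 50) = ceil(4.8) = 5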

23 of 34

24 of 34

HPA Example
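
A minimal sketch of an HPA manifest targeting the application-cpu Deployment (the min/max replicas and the 50% CPU target are assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: application-cpu
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: application-cpu
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # keep average CPU at 50% of requests

The imperative equivalent is kubectl autoscale deploy/application-cpu --cpu-percent=50 --min=1 --max=10.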

25 of 34

HPA Limitations

  • HPA only works for stateless applications that support running multiple instances in parallel.
  • HPA (and VPA) don't consider IOPS, network, or storage in their calculations.

26 of 34

Metrics Server

What’s it for

Metrics Server collects resource metrics from kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.

What isn’t it for

  • Non-Kubernetes clusters
  • An accurate source of resource usage metrics
  • Horizontal autoscaling based on resources other than CPU/memory

27 of 34

Metrics Server

  • cAdvisor: Daemon for collecting, aggregating and exposing container metrics included in Kubelet.
  • kubelet: Node agent for managing container resources. Resource metrics are accessible using the /metrics/resource and /stats kubelet API endpoints.
  • Summary API: API provided by the kubelet for discovering and retrieving per-node summarized stats available through the /stats endpoint.
  • metrics-server: Cluster addon component that collects and aggregates resource metrics pulled from each kubelet, and serves the Metrics API for use by HPA, VPA, and the kubectl top command (see the quick check below). Metrics Server is a reference implementation of the Metrics API.
  • Metrics API: Kubernetes API supporting access to CPU and memory used for workload autoscaling. To make this work in your cluster, you need an API extension server that provides the Metrics API.
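
With metrics-server installed, the pipeline can be sanity-checked from the command line:

kubectl top nodes                                      # per-node CPU/memory from the Metrics API
kubectl top pods                                       # per-pod usage; the same data HPA consumes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes   # query the Metrics API directly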

28 of 34

Node Level Scale

29 of 34

Cluster Autoscaler

30 of 34

Cluster Autoscaler Behavior

c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
gke-scaling-demo-default-pool-b182e404-5l2v   Ready    <none>   6m    v1.20.8-gke.900

c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl scale deploy/application-cpu --replicas 2
deployment.apps/application-cpu scaled

c444e24e3915@cloudshell:~ (qwiklabs-gcp-03-a94f05d7b8a0)$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
application-cpu-7879778795-8t8bn   1/1     Running   0          2m29s
application-cpu-7879778795-rzxc7   0/1     Pending   0          5s

The second replica does not fit on the single node, so it stays Pending; this pending pod is the signal the Cluster Autoscaler reacts to.

31 of 34

Cluster Autoscaler Behavior

Scale Up: pod event and node event (screenshots)

32 of 34

Cluster Autoscaler Behavior

Scale Down: node events (screenshots)

  • ScaleDown: node removed by cluster autoscaler
  • NodeNotReady
  • Deleting node
  • RemovingNode: Removing Node from Controller

33 of 34

Cluster Autoscaler Limitations

  • Cluster Autoscaler is not supported in on-premises environments until an autoscaler is implemented for on-premises deployments.
  • Only supports clusters of fewer than 1,000 nodes.
  • Scaling up is not immediate: a pod will stay in the Pending state for a few minutes until a new worker node is added.
  • Cluster Autoscaler works from resource requests, not actual usage; a pod that requests 1 CPU but uses only 100m still counts as a full CPU in its calculations.

34 of 34

References