What does “production ready” really mean for a Kubernetes cluster?
Lucas Käldström - CNCF Ambassador
7th of May, 2019 - Umeå
Image credit: @ashleymcnamara
1
$ whoami
Lucas Käldström, High School Student, 19 years old
CNCF Ambassador, Certified Kubernetes Administrator and Kubernetes SIG Lead
KubeCon Speaker in Berlin, Austin, Copenhagen, Shanghai & Seattle
Kubernetes Approver and Subproject Owner, active in the community for ~3 years. Got kubeadm to GA.
Driving luxas labs, which currently does contracting for Weaveworks
A guy that has never attended a computing class
2
Agenda
3
Which layer are you talking about?
Master A
Master N
Node 1
Node N
Kubernetes cluster
Machines
Application A
Application B
App C
App D
App E
Applications
Focusing on this layer
6
Buzzwords all around...
7
“The cluster is production ready when it is in a good enough shape for the user to serve real-world traffic”
8
“Your offering is production ready when it slightly exceeds your customer’s expectations in a way that allows for business growth”
-- Carter Morgan, Google (@_askcarter)
9
It’s all about tradeoffs (!!)
10
Okay, so what does this mean in terms of technical work items?
11
Production-ready cluster?
12
Kubernetes’ high-level component architecture
Nodes
Master
Node 3
OS
Container
Runtime
Kubelet
Networking
Node 2
OS
Container
Runtime
Kubelet
Networking
Node 1
OS
Container
Runtime
Kubelet
Networking
API Server (REST API)
Controller Manager
(Controller Loops)
Scheduler
(Bind Pod to Node)
etcd (key-value DB, SSOT)
User
13
What about “high availability”?
More about this in section III.
14
Things to keep in mind
15
Setting up a dynamic TLS-secured cluster
Nodes
Control Plane
API Server
Controller Manager
Scheduler
CN=system:kube-controller-manager
CN=system:kube-scheduler
Kubelet: node-1
HTTPS (6443)
Kubelet client
O=system:masters
Self-signed HTTPS (10250)
CN=system:node:node-1
O=system:nodes
Kubelet: node-2 (to be joined)
Self-signed HTTPS (10250)
Bootstrap Token & trusted CA
CN=system:node:node-2
O=system:nodes
CSR Approver
CSR Signer
Legend:
Logs / Exec calls
Normal HTTPS
POST CSR
SAR Webhook
PATCH CSR
node-1 CSR
node-2 CSR
Bootstrap Token
CSR=Certificate Signing Request, SAR=Subject Access Review
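The join flow in the diagram can be sketched with kubeadm; the endpoint, token, and hash below are illustrative placeholders:

```shell
# On an existing control-plane node: mint a short-lived bootstrap token
# and print the full join command, including the CA certificate hash
# that lets the new node verify it is talking to the right cluster.
kubeadm token create --ttl 1h --print-join-command

# On the node to be joined: the kubelet uses the bootstrap token to
# POST a CSR to the API server; the CSR approver and signer then issue
# its CN=system:node:<name> client certificate.
kubeadm join 10.0.0.10:6443 \
    --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:<hash>
```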
21
More information about Kubernetes security
22
Proactively avoid disasters
23
kubeadm
Master 1
Master N
Node 1
Node N
kubeadm
kubeadm
kubeadm
kubeadm
Cloud Provider
Load Balancers
Monitoring
Logging
Cluster API Spec
Cluster API
Cluster API Implementation
Addons
Kubernetes API
Bootstrapping
Machines
Infrastructure
= The official tool to bootstrap a minimum viable, best-practice Kubernetes cluster
Layer 2
kubeadm
Layer 3
Addon Operators
Layer 1
Cluster API
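At the kubeadm layer, bootstrapping a minimum viable cluster is roughly a two-command sketch; the pod CIDR and placeholder values are illustrative:

```shell
# On the first machine: bring up a single-master control plane.
kubeadm init --pod-network-cidr 192.168.0.0/16

# On each additional machine: join it as a node, using the token and
# CA certificate hash printed by 'kubeadm init'.
kubeadm join <master-ip>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```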
24
How to achieve HA with kubeadm?
HA etcd cluster
External Load Balancer or DNS-based API server resolving
Master A (kubeadm init)
API Server
Controller Manager
Scheduler
Shared certificates
etcd
etcd
etcd
Master B (kubeadm init)
API Server
Controller Manager
Scheduler
Shared certificates
Master C (kubeadm init)
API Server
Controller Manager
Scheduler
Shared certificates
Nodes (kubeadm join)
Kubelet 1
Kubelet 2
Kubelet 3
Kubelet 4
Kubelet 5
Do-it-yourself
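A sketch of this do-it-yourself setup with kubeadm v1.14; the endpoint name is a placeholder, and the exact flags have changed between kubeadm releases (the “experimental” prefixes were dropped later):

```shell
# Master A: point kubeadm at the load balancer / DNS name in front of
# all API servers, and upload the shared certificates (encrypted, into
# a Secret) so the other masters can fetch them.
kubeadm init \
    --control-plane-endpoint "k8s-api.example.com:6443" \
    --experimental-upload-certs

# Masters B and C: join as additional control-plane nodes.
kubeadm join k8s-api.example.com:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --experimental-control-plane --certificate-key <key>

# Workers: a plain 'kubeadm join' against the same endpoint.
```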
25
Is this cluster setup highly available?
No
HA etcd cluster
Master A
API Server
Controller Manager
Scheduler
Shared certificates
etcd
etcd
etcd
Master B
API Server
Controller Manager
Scheduler
Shared certificates
Master C
API Server
Controller Manager
Scheduler
Shared certificates
Nodes
Kubelet 1
Kubelet 2
Kubelet 3
Kubelet 4
Kubelet 5
Master D
Loadbalancer
Single point of failure :(
26
Other things to keep in mind with a HA cluster
27
“Monitor it so you know when it fails before your customers do”
-- Justin Santa Barbara, Google (@justinsb)
28
Manage clusters like applications
29
Cluster API
30
“GitOps” for your cluster(s)
apiVersion: cluster.k8s.io/v1alpha1
kind: MachineDeployment
metadata:
  name: my-nodes
spec:
  replicas: 3
  selector:
    matchLabels:
      foo: bar
  template:
    metadata:
      labels:
        foo: bar
    spec:
      providerConfig:
        value:
          apiVersion: "baremetalconfig/v1alpha1"
          kind: "BareMetalProviderConfig"
          zone: "us-central1-f"
          machineType: "n1-standard-1"
          image: "ubuntu-1604-lts"
      versions:
        kubelet: 1.14.2
        containerRuntime:
          name: containerd
          version: 1.2.0
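Applying such a manifest is then declarative, like any other Kubernetes object; the file name below is hypothetical:

```shell
# The Cluster API controllers reconcile real machines toward the spec;
# keeping this file in git and applying it on change is "GitOps".
kubectl apply -f machinedeployment.yaml

# Growing the node pool is then a one-field change: bump
# .spec.replicas from 3 to 5 in git and re-apply.
```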
31
For enhanced insight and functionality
32
Cloud Native Trail Map
Trail Map: l.cncf.io
33
Choose your runtime & registry
Docker is the most common runtime, but you could consider containerd (Graduated) or CRI-O (Incubating) instead, for a smaller footprint and attack surface.
Also, an internal container image registry might be needed. Harbor can set up a scalable registry for you on Kubernetes.
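If the nodes run containerd rather than Docker, you point kubeadm (and thereby the kubelet) at its CRI socket; the path below is containerd’s default:

```shell
# Use containerd's CRI socket instead of the Docker shim.
kubeadm init --cri-socket /run/containerd/containerd.sock
```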
34
Monitoring the cluster
Now that the cluster is up and running, let’s start monitoring it. As a good starting point, you can use the prometheus-operator Helm Chart.
That gives you a Prometheus instance running in Kubernetes, good preset rules for monitoring (kube-state-metrics), and Grafana dashboards for visualization.
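A sketch using Helm v2, which the chart targeted at the time; the release and namespace names are arbitrary choices:

```shell
# Installs Prometheus, Alertmanager, kube-state-metrics, node-exporter
# and Grafana, with preset rules and dashboards.
helm install stable/prometheus-operator \
    --name monitoring --namespace monitoring

# Peek at the Grafana dashboards locally (the Service name follows the
# release name, so 'monitoring-grafana' is an assumption):
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
```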
35
Enable Fluent Bit for logging
In order to store container logs for a long period of time, you need a log forwarder that ships them from the container runtime to some kind of log aggregation service, such as Elasticsearch.
You can use the fluent-bit-kubernetes-logging project as a good starting point for this task. Bonus points for also aggregating the Audit Logs.
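A sketch of deploying it from that repository’s manifests (file names as in the repo at the time; the DaemonSet variant shown ships logs to Elasticsearch):

```shell
kubectl create namespace logging
kubectl apply -f fluent-bit-service-account.yaml \
              -f fluent-bit-role.yaml \
              -f fluent-bit-role-binding.yaml \
              -f fluent-bit-configmap.yaml \
              -f output/elasticsearch/fluent-bit-ds.yaml
```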
36
Enable cloud/environment extensions
These are what is traditionally called Cloud Providers for Kubernetes: they handle Node creation/deletion in the environment, provision load balancers for Type=LoadBalancer Services, and offer other optional features.
Anyone can create a so-called Cloud Provider integration for their environment. Example to the right.
37
Set up an Ingress controller
In order to expose your Services to the outside world, you need some kind of third-party Ingress Controller.
Ingress Controllers make your Ingress objects in Kubernetes work. You might want to expose the controller itself as a Type=LoadBalancer Service.
Ones worth looking at are Traefik, NGINX Ingress, and Contour.
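Once a controller is running, routing is declared with an Ingress object; a minimal sketch for the API version of this era, assuming an existing Service named my-service and a hypothetical hostname:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: app.example.com          # hypothetical hostname
    http:
      paths:
      - path: /
        backend:
          serviceName: my-service  # assumed existing Service
          servicePort: 80
```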
38
Persistent Storage is key
Lastly, you most likely need Persistent Storage for many of your applications. Kubernetes supports the Container Storage Interface (CSI) for providers to implement.
Rook implements various types of clustered storage in a Kubernetes-native way. Alternatively, you can use your cloud provider’s solution.
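Applications then request storage through a PersistentVolumeClaim; the storage class below assumes Rook’s Ceph block driver is installed and will differ per environment:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
  - ReadWriteOnce                     # mounted read-write by one node
  storageClassName: rook-ceph-block   # assumption: Rook Ceph installed
  resources:
    requests:
      storage: 10Gi
```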
39
Recap
40
Thank you!
41
Related resources (in no particular order)
42