Deploying resilient microservices on Google Cloud
Michael Mekuleyi, CTO, Sendme.ng
[Ogbomoso]
Who am I?
Table of Contents
What is resilience?
To understand resilience, you must understand scale and availability.
What is resilience?
The basis of resilience is adaptability.
Understanding docker and kubernetes
“It is not working on my laptop”
“It is working on my laptop”
“We will not give the customer your laptop”
“While Docker is a container runtime, Kubernetes is a platform for running and managing containers from many container runtimes.”
“Google Kubernetes Engine (GKE) is used to implement Kubernetes orchestration on Google Cloud.”
Understanding resilience
To manage resilience, you must understand mature traffic management: traffic control and traffic splitting with the Kubernetes API.
We will walk through a few thorough use cases and their solutions, in both traffic control and traffic splitting.
“Traffic control (sometimes called traffic routing or traffic shaping) refers to the act of controlling where traffic goes and how it gets there.”
Understanding resilience (traffic control)
Use case: I want to protect services from getting too many requests.
Solution: Rate limiting.
Rate limiting restricts the number of requests a user can make in a given time period. Requests can include something as simple as a GET request for the homepage of a website or a POST request on a login form. When under DDoS attack, for example, you can use rate limiting to limit the incoming request rate to a value typical for real users.
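As a sketch, rate limiting can be enforced at the cluster edge with the NGINX Ingress Controller's rate-limit annotations (the hostname and service name below are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    # Allow roughly 10 requests per second per client IP,
    # with a 5x burst allowance before excess requests are rejected.
    nginx.ingress.kubernetes.io/limit-rps: "10"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
```

Limiting per client IP keeps a single abusive source from starving real users while legitimate traffic passes through untouched.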
Understanding resilience (traffic control)
Use case: I want to avoid cascading failures
Solution: Circuit breaking
Circuit breakers prevent cascading failure by monitoring for service failures. When the number of failed requests to a service exceeds a preset threshold, the circuit breaker trips and starts returning an error response to clients as soon as the requests arrive, effectively throttling traffic away from the service.
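On GKE with an Istio-style service mesh, circuit breaking can be approximated with outlier detection, which ejects failing backends from the load-balancing pool once they cross an error threshold. A minimal sketch (the service name and thresholds are illustrative, not prescriptive):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api-circuit-breaker
spec:
  host: api-service              # illustrative Kubernetes Service name
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # trip after 5 consecutive 5xx responses
      interval: 30s              # how often backends are scanned
      baseEjectionTime: 60s      # how long a tripped backend stays ejected
      maxEjectionPercent: 50     # never eject more than half the pool
```

Capping the ejection percentage matters: it stops the circuit breaker itself from taking out so many backends that the survivors are overwhelmed.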
“Traffic splitting (sometimes called traffic testing) is a subcategory of traffic control and refers to the act of controlling the proportion of incoming traffic directed to different versions of a backend app running simultaneously in an environment (usually the current production version and an updated version).”
Understanding resilience (traffic splitting)
Use case: I’m ready to test a new version in production
Solution: Debug routing
Debug routing lets you deploy a new version publicly yet “hide” it from actual users by allowing only certain users to access it, based on Layer 7 attributes such as a session cookie, session ID, or group ID. For example, you can allow access only to users who have an admin session cookie: their requests are routed to the new version with the credit score feature, while everyone else continues on the stable version.
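With an Istio-style VirtualService, debug routing can be sketched as a cookie match that sends only admin sessions to the new version (the hostname, subset names, and cookie value are assumptions for illustration):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-debug-routing
spec:
  hosts:
    - api.example.com
  http:
    # Requests carrying an admin session cookie go to the new version (v2).
    - match:
        - headers:
            cookie:
              regex: ".*session=admin.*"
      route:
        - destination:
            host: api-service
            subset: v2
    # Everyone else stays on the stable version (v1).
    - route:
        - destination:
            host: api-service
            subset: v1
```

Because rules are evaluated in order, the cookie match must come first; the final catch-all route guarantees regular users never see the hidden version.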
Understanding resilience (traffic splitting)
Use case: I need to make sure my new version is stable.
Solution: Canary deployment
A typical canary deployment starts with a high share (say, 99%) of your users on the stable version and moves a tiny group (the other 1%) to the new version. If the new version fails, for example by crashing or returning errors to clients, you can immediately move the test group back to the stable version. If it succeeds, you can switch users from the stable version to the new one, either all at once or (as is more common) in a gradual, controlled migration.
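The 99/1 split above maps directly onto weighted routing; with an Istio-style VirtualService it might look like this (service and subset names are placeholders):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api-canary
spec:
  hosts:
    - api-service
  http:
    - route:
        - destination:
            host: api-service
            subset: stable   # current production version
          weight: 99
        - destination:
            host: api-service
            subset: canary   # new version under test
          weight: 1
```

A gradual migration is then just a matter of shifting the weights (99/1, then 90/10, 50/50, 0/100) while watching error rates at each step.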
Understanding resilience (traffic splitting)
Use case: I want to move my users to a new version without downtime.
Solution: Blue‑green deployment
Blue-green deployments greatly reduce, or even eliminate, downtime for upgrades. Simply keep the old version (blue) in production while deploying the new version (green) alongside it in the same production environment, then cut traffic over to green once it is verified.
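In plain Kubernetes, the blue/green cutover can be sketched as a Service whose selector is flipped between the two Deployments (the labels and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
    version: blue    # change to "green" to cut all traffic over at once
  ports:
    - port: 80
      targetPort: 8080
```

Because both versions stay deployed side by side, rolling back is just flipping the selector back to blue.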
Maintaining Resilience
The End
Connect with me on Twitter: monnarene