Automating for Failure
Olubusayo Amowe (Software Engineer)
Samson Olufuwa (DevOps Engineer)
A bit about us
We run a not-for-profit (www.fuerza.africa) that helps people from underrepresented groups start a DevSecOps career.
Key points
Failure is Normal
One of the key pillars of SRE success is accepting that failure is normal.
What is Disaster Recovery?
Disaster recovery involves a set of policies, tools, and procedures that enable the recovery or continuation of vital technology infrastructure and systems.
Disaster Recovery strategies
Backup and Data Recovery
Creating and storing copies of data that can be used to protect the organization against data loss.
Pilot Light
Replicating part of your IT infrastructure for a limited set of core services, so that your cloud environment can take over in the event of a disaster.
Warm / Hot Standby
A scaled-down version of a fully functional environment is always running.
Multi-Region
Your infrastructure runs in multiple regions.
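As a rough illustration of how two of these strategies might look in Terraform, the sketch below declares a second AWS provider for a recovery region (multi-region) and keeps the recovery instances at zero until a flag is flipped (in the spirit of pilot light). The regions, the variables dr_active and dr_ami_id, and the resource names are all hypothetical and are not taken from the demo.

```hcl
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Primary region (assumed for illustration).
provider "aws" {
  region = "eu-west-1"
}

# Recovery region, selected per resource through the "dr" alias (assumed).
provider "aws" {
  alias  = "dr"
  region = "eu-central-1"
}

# Hypothetical switch: false keeps the recovery compute turned off.
variable "dr_active" {
  description = "Set to true to start the application servers in the recovery region."
  type        = bool
  default     = false
}

# Hypothetical AMI ID that is valid in the recovery region.
variable "dr_ami_id" {
  description = "AMI to launch in the recovery region."
  type        = string
}

# Application servers in the recovery region: zero until failover.
resource "aws_instance" "app_dr" {
  provider      = aws.dr
  count         = var.dr_active ? 2 : 0
  ami           = var.dr_ami_id
  instance_type = "t3.micro"

  tags = {
    Name = "app-dr-${count.index}"
  }
}
```

Under these assumptions, failing over would amount to running terraform apply -var="dr_active=true" with a suitable dr_ami_id, which creates the two instances in the recovery region.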
Some terms related to Disaster Recovery (RTO & RPO)
RTO, or recovery time objective, is the maximum time your application can be offline. This usually depends on the SLAs you offer to your customers. An SLA is a promise made by you, as a service provider, to your consumers about the availability of your service and the ramifications of failing to deliver the agreed-upon level of service.
RPO, or recovery point objective, is the maximum period of data loss you can tolerate, measured as the time between the last recoverable copy of the data and the incident.
Typically, the smaller the RTO and RPO values, the faster the application must recover from an interruption and the less data it can afford to lose.
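To make the link between RPO and configuration concrete: if the target RPO were one hour, backups would need to run at least hourly. The sketch below expresses that with AWS Backup. The one-hour target, the vault and plan names, and the seven-day retention are assumptions for illustration only, and a real setup would also need an aws_backup_selection to assign resources to the plan.

```hcl
# Hypothetical RPO target: lose at most about one hour of data.
resource "aws_backup_vault" "dr" {
  name = "dr-backup-vault"
}

resource "aws_backup_plan" "hourly" {
  name = "rpo-1h-plan"

  rule {
    rule_name         = "hourly-backups"
    target_vault_name = aws_backup_vault.dr.name

    # Start a backup at the top of every hour (UTC).
    schedule = "cron(0 * * * ? *)"

    lifecycle {
      # Retention in days (assumed value).
      delete_after = 7
    }
  }
}
```

Because a backup job takes time to start and complete, the data loss actually observed can exceed the schedule interval, so the schedule is best read as a lower bound on the achievable RPO.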
Issues with using other tools for DR
How can Terraform help with disaster recovery?
Steps to enable DR with Terraform
Best Practices with Terraform DR
Considerations when using Terraform
DEMO
Terraform config (main.tf)
data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical
}
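The data source above only looks up the most recent Ubuntu 20.04 image; something still has to consume it. A minimal sketch of an instance that does so, placed in the same configuration as the data block, might look like this. The resource name, instance type, and tag are assumptions and are not part of the original demo.

```hcl
# Hypothetical instance that launches from the AMI resolved by
# data.aws_ami.ubuntu, declared alongside it in main.tf.
resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.micro" # assumed size

  tags = {
    Name = "dr-demo"
  }
}
```

Because the AMI is resolved at plan time rather than hard-coded, the same configuration can be applied in another region and pick up the matching image there, which is useful when rebuilding after a failure.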
Terraform config (versions.tf)
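A versions.tf file conventionally pins the Terraform and provider versions so that a recovery run uses the same tooling as the original deployment. A minimal sketch follows; the version constraints are assumed for illustration and are not the ones used in the demo.

```hcl
terraform {
  # Assumed minimum Terraform version.
  required_version = ">= 1.3.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed provider constraint
    }
  }
}
```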
AWS CodeBuild config (buildspec.yml)
RECAP
Thank you!