The Right 'AIR' Mix:
Fueling High-Performance Platforms
(Work in progress)
A - Availability
I - Intelligence
R - Resilience
Quick Intro: Building blocks of a Platform
(Type of Platform: DBaaS)
Quantifying Reliability of a Platform: Availability
Visibility interval
Level of Granularity:
Unified Fleet view
ToDo: Add more dashboards and monitoring snapshots
Perception of Availability
Availability (%) = (Successful units of work / Total units of work) * 100
Computation
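Per DC (PromQL; failed_queries and total_queries assumed to be counters, so the 5m range vectors are wrapped in increase() before summing):
(1 - sum(increase(failed_queries{context="DC1"}[5m])) / sum(increase(total_queries{context="DC1"}[5m]))) * 100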
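The per-DC query above can be mirrored in a short Python sketch, together with a fleet-wide roll-up for the unified fleet view. It assumes failed_queries and total_queries are monotonically increasing counters sampled over the 5-minute window; the helper names (CounterWindow, counter_increase, availability_pct, fleet_availability) and the sample values are hypothetical. Like PromQL's increase(), it treats a drop in a counter as a reset, but unlike PromQL it does no extrapolation to the window boundaries. Treating a window with no traffic as 100% available is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CounterWindow:
    """Counter samples for one DC over one window (e.g. 5m), oldest first."""
    failed: list[float]
    total: list[float]


def counter_increase(samples: list[float]) -> float:
    """Increase of a monotonic counter across samples, treating any drop as a reset."""
    inc = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole current value is new work.
        inc += cur - prev if cur >= prev else cur
    return inc


def availability_pct(failed: float, total: float) -> float:
    """Availability (%) = (1 - failed / total) * 100; a window with no traffic counts as 100%."""
    if total <= 0:
        return 100.0
    return (1 - failed / total) * 100


def per_dc_availability(windows: dict[str, CounterWindow]) -> dict[str, float]:
    return {
        dc: availability_pct(counter_increase(w.failed), counter_increase(w.total))
        for dc, w in windows.items()
    }


def fleet_availability(windows: dict[str, CounterWindow]) -> float:
    # Sum work across DCs before dividing, so busy DCs weigh more than idle ones.
    failed = sum(counter_increase(w.failed) for w in windows.values())
    total = sum(counter_increase(w.total) for w in windows.values())
    return availability_pct(failed, total)


if __name__ == "__main__":
    windows = {
        "DC1": CounterWindow(failed=[10, 12, 15], total=[1000, 1500, 2000]),
        "DC2": CounterWindow(failed=[0, 0, 4], total=[200, 300, 400]),
    }
    print(per_dc_availability(windows))
    print(round(fleet_availability(windows), 3))
```

Summing failed and total work across DCs before dividing (rather than averaging the per-DC percentages) keeps the fleet number consistent with the same formula applied to the whole fleet as one unit of measurement.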
ToDo:
Prerequisite for calculation:
Resiliency@scale
For a multi-tenant platform operating in a multi-cloud environment, resiliency strategies have to be decided case by case; no one-size-fits-all solution exists.
Complexity:
Local - complete/partial
Remote - complete/partial
Single region/ Multi-region
ToDo: Complete the scenarios identified here
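One way to make the scenario list exhaustive is to treat the complexity dimensions above as independent axes and enumerate every combination. This sketch assumes exactly three axes (local vs remote, complete vs partial, single-region vs multi-region deployment), which is one reading of the slide; the enum names Scope, Extent and Footprint are hypothetical.

```python
from enum import Enum
from itertools import product


class Scope(Enum):
    LOCAL = "local"
    REMOTE = "remote"


class Extent(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"


class Footprint(Enum):
    SINGLE_REGION = "single-region"
    MULTI_REGION = "multi-region"


def failure_scenarios() -> list[tuple[Scope, Extent, Footprint]]:
    """Every combination of the three axes: 2 x 2 x 2 = 8 scenarios."""
    return list(product(Scope, Extent, Footprint))


if __name__ == "__main__":
    for scope, extent, footprint in failure_scenarios():
        print(f"{scope.value}/{extent.value} failure, {footprint.value} deployment")
```

Each tuple can then be mapped to a restore strategy, and combinations that cannot occur in a given deployment can simply be filtered out.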
Resiliency@scale (contd.)
Learnings:
Replication Topology
ToDo: Add an intuitive graphical view instead of text
Restore strategy | Deciding factors |
Customer restores the data from offline jobs | |
Replication [Data migration through replication from another healthy DC] | |
Data restore using backup [Data restore from the last successful backup] | |
Backup restore followed by replication for delta data | |
DR and Restoration strategies
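The deciding-factors column is still open, so the following is only a hypothetical sketch of how the four strategies could be selected automatically. The factors (a healthy replica DC, a usable backup, a dataset large enough that full re-replication is slow, a tenant able to replay its offline jobs) and their priority order are assumptions, not taken from the slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RestoreStrategy(Enum):
    CUSTOMER_OFFLINE_JOBS = "Customer restores the data from offline jobs"
    REPLICATION = "Replication from another healthy DC"
    BACKUP_RESTORE = "Data restore from the last successful backup"
    BACKUP_THEN_REPLICATION = "Backup restore followed by replication for delta data"


@dataclass
class OutageContext:
    # All fields are hypothetical deciding factors, not taken from the slide.
    healthy_replica_available: bool
    backup_available: bool
    large_dataset: bool        # full re-replication would be too slow
    customer_can_replay: bool  # tenant can regenerate data from its offline jobs


def choose_restore_strategy(ctx: OutageContext) -> Optional[RestoreStrategy]:
    if ctx.backup_available and ctx.healthy_replica_available and ctx.large_dataset:
        # Bulk-load from the backup, then replicate only writes made since it was taken.
        return RestoreStrategy.BACKUP_THEN_REPLICATION
    if ctx.healthy_replica_available:
        return RestoreStrategy.REPLICATION
    if ctx.backup_available:
        # Writes made after the last successful backup are not recovered by this path.
        return RestoreStrategy.BACKUP_RESTORE
    if ctx.customer_can_replay:
        return RestoreStrategy.CUSTOMER_OFFLINE_JOBS
    return None  # no automated path; escalate to a manual decision
```

Encoding the choice as an ordered set of checks keeps the priority between strategies explicit, so it can be reviewed and adjusted once the real deciding factors are filled in.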
Infusing Intelligence@scale
ToDo: Add Insights generation components/arch diagram/flow
Automated recommendations generated per tenant onboarded to the platform
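A minimal rule-based sketch of per-tenant recommendation generation, assuming each tenant exposes a few health signals. The TenantMetrics fields, the 99.9% availability target, the 80% storage threshold and the rule texts are all hypothetical; the actual insights pipeline is still a ToDo on this slide.

```python
from dataclasses import dataclass


@dataclass
class TenantMetrics:
    # Hypothetical per-tenant signals; a real platform would pull these from its metrics store.
    tenant_id: str
    availability_pct: float
    storage_used_pct: float
    single_dc: bool


def recommend(m: TenantMetrics, availability_target: float = 99.9) -> list[str]:
    """Rule-based recommendations for one tenant; thresholds are illustrative assumptions."""
    recs = []
    if m.availability_pct < availability_target:
        recs.append(
            f"availability {m.availability_pct:.2f}% is below the "
            f"{availability_target}% target; review failed queries"
        )
    if m.storage_used_pct >= 80:
        recs.append("storage usage is at or above 80%; plan capacity expansion")
    if m.single_dc:
        recs.append("tenant runs in a single DC; consider adding a replica in another DC")
    return recs


def recommendations_per_tenant(fleet: list[TenantMetrics]) -> dict[str, list[str]]:
    # One recommendation list per onboarded tenant (empty when no rule fires).
    return {m.tenant_id: recommend(m) for m in fleet}
```

Simple threshold rules like these are easy to audit per tenant; they could later be replaced by learned models without changing the per-tenant output shape.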
Thank you