PLATFORM ENGINEERING AND THE FUTURE OF INTERNAL PLATFORM PRODUCTS
Hossein Salahi | Lead Platform Engineer
Max Körbächer | Founder & Cloud Native Advisor
WHERE DEVOPS FAILS?
Core Principles and Promises of DevOps
Collaboration
Automation
Continuous Improvement
Observability
Feedback Loops
Break Down Silos
Foster a communication culture
Cross-Functional Teams
Organization-wide engagement
Centrally collect and react to events, provide guardrails and intrusion detection
Infrastructure as Code
Continuous Integration
Continuous Delivery
Real-time Monitoring
Both application and infrastructure
Detect and address proactively
A communication channel between ops and dev
Analyze success and failure of deployments
Core Reasons DevOps Initiatives Fail
Lack of vision
Unrealistic Expectations
Building DevOps Team
Adaption Approaches
Maintaining Old Structures
Unclear strategic planning and business value
Desired results in terms of time and resources
Elimination of bottlenecks
Breaking down silos
But, yet another silo!
Failing to apply it organization-wide
A faster build, test, and release process
A secure infrastructure
Setting multi-cloud strategies without understanding the problem
A cultural shift
A mindset rather than tools, speed, or applications
Cultural pushback
Building a hybrid structure
Keeping ops and dev in their silos
https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership
WHY DEVOPS FAILS?
Why may development teams not adopt it?
PLATFORM ENGINEERING
Definition
A discipline of designing and building toolchains and workflows
An abstraction layer that simplifies provisioning and managing the underlying infrastructure
Standardized, automated, scalable environments
More efficient application delivery
Self-service capabilities for development teams
https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership
https://blog.joshgav.com/posts/kubecon-platforms-review
PLATFORM ENGINEERING RESPONSIBILITIES
Self-Service Capabilities
Observability
Collaboration and Feedback
Standardization
Security and Compliance
Infrastructure Automation
Scalability and Resilience
Workflow Orchestration
CRAFT YOUR OWN PLATFORMS
“BEFORE YOU CAN BUILD A PLATFORM, YOU NEED TO KNOW WHAT ITS PURPOSE WILL BE”
PLATFORM LAYERS
Consists of multiple components that interact with each other at different speeds
Developer Control Plane: the tools developers (software and platform) need for coding and versioning
Integration & Delivery Plane: build, test, and deploy software and components
Security & Compliance Plane: centrally collect and react to events, provide guardrails and intrusion detection
Observability & Operability Plane: tooling provided to operate the platform and, in some cases, also the application
Resource Plane: usually a cloud provider, IaaS, or hypervisor
Capability Plane: features the platform provides to its users, living “within” the platform
IDP REFERENCE
Developer Control Plane
IDE
Service Catalog / API Catalog Developer Portal
Version control
Application Source Code
Platform Source Code
Buildpacks
Terraform
GitHub
Workloads
Automations
Backstage
VSCode
GitHub Codespace
Gitpod
Integration and Delivery Plane
CI Pipeline
Registry
CD Pipeline
GitHub Actions
Amazon ECR
Argo CD
GitHub Actions
Harbor
Observability & Operations Plane
Observability
Prometheus
Grafana
Zipkin
Dynatrace
Security & Compliance Plane
Secrets & Identity Manager
HCP Vault
Keycloak
Prisma
Resource Plane
Compute
Data
Networking
Services
Amazon EKS
RDS MySQL
Route 53
Amazon SQS
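The plane-by-plane reference above can be captured as a small lookup structure. The following is a minimal sketch: the tool names are the ones listed on the slides, while the dictionary layout and the helper name `plane_of` are illustrative assumptions, not part of the reference architecture itself. The labels "Workloads" and "Automations" are left out because they describe source code contents rather than tools.

```python
from typing import Optional

# Planes and example tools as listed on the reference slides.
# The dict layout and the helper below are illustrative assumptions.
IDP_REFERENCE = {
    "Developer Control Plane": [
        "VSCode", "GitHub Codespace", "Gitpod", "Backstage",
        "GitHub", "Buildpacks", "Terraform",
    ],
    "Integration and Delivery Plane": [
        "GitHub Actions", "Amazon ECR", "Harbor", "Argo CD",
    ],
    "Observability & Operations Plane": [
        "Prometheus", "Grafana", "Zipkin", "Dynatrace",
    ],
    "Security & Compliance Plane": ["HCP Vault", "Keycloak", "Prisma"],
    # "RDS MySQL" refers to MySQL on Amazon RDS.
    "Resource Plane": ["Amazon EKS", "RDS MySQL", "Route 53", "Amazon SQS"],
}


def plane_of(tool: str) -> Optional[str]:
    """Return the plane a tool is listed under, or None if it is not listed."""
    for plane, tools in IDP_REFERENCE.items():
        if tool in tools:
            return plane
    return None


if __name__ == "__main__":
    # "Jenkins" is not on the slides; it only demonstrates the None case.
    for name in ("Argo CD", "Keycloak", "Jenkins"):
        print(f"{name}: {plane_of(name) or 'not in the reference'}")
```

A structure like this makes it easy to check which plane a tool choice belongs to before adding it to the platform.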
Source: platformengineering.org
Capability Plane
Resource Integration
Security & Compliance
Observability
User Space
Prometheus
Scale & Schedule
Network
GPU Device Plugin
Heavy Babies
INFLUENCING
Hardware and software affect each other
Resource Availability
Startup
Virtualization
Portability
Container
App Scalability
Infrastructure
Cloud Native Application
PLATFORMS NEED TO PROVIDE THE CAPABILITIES TO ADJUST TO ANY WORKLOAD WITHOUT HARMING THE UNDERLYING INFRASTRUCTURE
HOW TO FIND THE RIGHT APPROACH?
Way to go
Follow your gut
Use data & metrics
“Best of class”
“Best guess”
GOLDEN PATH KEY INDICATORS
Frequency of deployments in %
Development time incl. waiting time & error/bug fixing in h
Operational effort in h
Alignment and communication effort in h
Failure rate in %
DEFINE GOLDEN PATH
Collect data on certain observation points
DATA EXAMPLE TO FIND GOLDEN PATH
Dev Steps | Frequency % | Dev Time h | Ops Effort h | Alignment/ Comms h | Failure Rate % |
Add/Update Services | 40% | 16 | 8 | 2 | 30% |
Add/Update Resources | 20% | 8 | 24 | 3 | 6% |
Add/Update Configurations | 60% | 1 | 1 | 0 | 10% |
Change architecture | 3% | 60 | 20 | 10 | 60% |
Create new environments | 4% | 24 | 24 | 3 | 25% |
Onboarding new developers | 20% | 80 | 16 | 4 | 80% |
Roll back of failed deployments | 17% | 10 | 20 | 2 | 90% |
Debugging & error tracing | 45% | 10 | 16 | 4 | 5% |
Blocked environments | 8% | 16 | 1 | 2 | 4% |
Waiting for other teams | 16% | 16 | 36 | 5 | 30% |
Source: platformengineering.org
Worst 3 per KPI
Best 3 per KPI
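The "worst 3 / best 3 per KPI" marking can be reproduced from the data example with a short ranking script. This is a sketch under assumptions: the slides do not state whether a high or a low value counts as "worst" for each KPI, so the function only reports the three highest and three lowest values per column and leaves that judgment to the reader. The names `ROWS`, `KPIS`, and `top_and_bottom` are hypothetical.

```python
# Rows copied from the data example: (dev step, frequency %, dev time h,
# ops effort h, alignment/comms h, failure rate %).
ROWS = [
    ("Add/Update Services", 40, 16, 8, 2, 30),
    ("Add/Update Resources", 20, 8, 24, 3, 6),
    ("Add/Update Configurations", 60, 1, 1, 0, 10),
    ("Change architecture", 3, 60, 20, 10, 60),
    ("Create new environments", 4, 24, 24, 3, 25),
    ("Onboarding new developers", 20, 80, 16, 4, 80),
    ("Roll back of failed deployments", 17, 10, 20, 2, 90),
    ("Debugging & error tracing", 45, 10, 16, 4, 5),
    ("Blocked environments", 8, 16, 1, 2, 4),
    ("Waiting for other teams", 16, 16, 36, 5, 30),
]
KPIS = [
    "Frequency %", "Dev Time h", "Ops Effort h",
    "Alignment/Comms h", "Failure Rate %",
]


def top_and_bottom(kpi_index, n=3):
    """Return (n highest, n lowest) dev steps for one KPI column.

    kpi_index is the position in KPIS; column 0 of each row is the step name.
    Ties keep the table order because Python's sort is stable.
    """
    ranked = sorted(ROWS, key=lambda row: row[kpi_index + 1], reverse=True)
    highest = [row[0] for row in ranked[:n]]
    lowest = [row[0] for row in ranked[-n:][::-1]]
    return highest, lowest


if __name__ == "__main__":
    for index, kpi in enumerate(KPIS):
        highest, lowest = top_and_bottom(index)
        print(f"{kpi}\n  highest: {highest}\n  lowest:  {lowest}")
```

For effort and failure columns, "highest" is the natural candidate for "worst"; for frequency, a high value mainly signals where a golden path would be used most often.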
IDENTIFYING THE STARTING POINT
Frequency
10 PRs per year
Failure
Effort
CASE 1
Frequency
6 PRs per month
Failure
Effort
CASE 2
Frequency
20 PRs per month
Failure
Effort
CASE 3
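One possible way to compare the three cases is to scale the table by PR volume. The sketch below assumes, purely for illustration, that the frequency column is the share of PRs that involve a step and that the cost of a step is the sum of its dev, ops, and alignment hours. Neither assumption comes from the slides, and the names `STEPS`, `CASES`, and `yearly_hours` are hypothetical.

```python
# (dev step, frequency %, dev time h, ops effort h, alignment/comms h),
# copied from the data example.
STEPS = [
    ("Add/Update Services", 40, 16, 8, 2),
    ("Add/Update Resources", 20, 8, 24, 3),
    ("Add/Update Configurations", 60, 1, 1, 0),
    ("Change architecture", 3, 60, 20, 10),
    ("Create new environments", 4, 24, 24, 3),
    ("Onboarding new developers", 20, 80, 16, 4),
    ("Roll back of failed deployments", 17, 10, 20, 2),
    ("Debugging & error tracing", 45, 10, 16, 4),
    ("Blocked environments", 8, 16, 1, 2),
    ("Waiting for other teams", 16, 16, 36, 5),
]

# PR volume per year for the three cases on the slides.
CASES = {"CASE 1": 10, "CASE 2": 6 * 12, "CASE 3": 20 * 12}


def yearly_hours(prs_per_year):
    """Estimate hours per year for each dev step, most costly first.

    Assumed model: hours = PRs per year * frequency share * total hours.
    """
    estimate = {
        name: prs_per_year * freq / 100 * (dev + ops + comms)
        for name, freq, dev, ops, comms in STEPS
    }
    return sorted(estimate.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for case, volume in CASES.items():
        step, hours = yearly_hours(volume)[0]
        print(f"{case} ({volume} PRs/year): {step} ~ {hours:.0f} h/year")
```

Under this linear model the ranking of steps is the same in every case; only the absolute hours change. That difference in magnitude is what indicates whether automating a step is worth the investment at a given PR volume.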
RECAPITULATE THE STEPS
1. Evaluate your current situation (facts & figures)
2. Decide on the golden path
3. Define its primary purpose
4. Prioritize implementations
COMMUNITIES & RESPONSIBILITIES
PLATFORMS LIVE BY TWO VALUES
Platform as a Product, not as a Project
Community
Platform teams have to learn that they, too, don't have all the answers.
Open your platforms to contributions, open discussions, and a second round of input.
The Product Owner can gather input from many sources.
Responsibilities
Platforms require long-term responsibility and accountability -> they become a product.
Those responsibilities need to be clearly defined so the team can act as a thought leader.
But responsibilities can be shared.
ELSE, YOU WILL FAIL:
Technical “Playground”
Rename your DevOps Team to Platform Team
Wrong focus on Dev or Ops
Your org creates PE silos (like DevOps)
Missing mindset shift to platform as a product
INTERNAL DEVELOPER PLATFORMS
“An Internal Developer Platform (IDP) is a specialized environment or set of tools and services designed to streamline and enhance the software development process within an organization.”
IDP CORE ELEMENTS
ROLE-BASED ACCESS CONTROL
Manage access on a granular level
APPLICATION CONFIGURATION MANAGEMENT
Scope, Versioning, Portability, and Secret Management
INFRASTRUCTURE ORCHESTRATION
IaC, CI/CD, DNS, Clusters, and other resources
ENVIRONMENT MANAGEMENT
Self-serve fully provisioned environments on demand
DEPLOYMENT MANAGEMENT
Continuous Deployment (CD)
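As an illustration of the first core element, granular role-based access control can be reduced to a mapping from roles to permitted (resource, action) pairs. This is a minimal sketch: the role names, resources, and actions are hypothetical, and a real IDP would define its own model and usually delegate identity to an external provider.

```python
from dataclasses import dataclass

# Hypothetical roles and permissions; a real IDP defines its own.
ROLE_PERMISSIONS = {
    "developer": {("environment", "create"), ("deployment", "trigger")},
    "platform-engineer": {
        ("environment", "create"),
        ("environment", "delete"),
        ("deployment", "trigger"),
        ("infrastructure", "change"),
    },
    "viewer": set(),
}


@dataclass(frozen=True)
class User:
    name: str
    roles: tuple


def is_allowed(user, resource, action):
    """Return True if any of the user's roles grants (resource, action)."""
    return any(
        (resource, action) in ROLE_PERMISSIONS.get(role, set())
        for role in user.roles
    )


if __name__ == "__main__":
    dev = User("dev-1", ("developer",))
    print(is_allowed(dev, "environment", "create"))
    print(is_allowed(dev, "infrastructure", "change"))
```

Keeping permissions as explicit pairs makes it straightforward to grant self-service actions, such as creating an environment, without exposing infrastructure changes.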
WHY IS AN IDP IMPORTANT?
INTERNAL DEVELOPER PLATFORM
PRODUCTIVITY
STANDARDIZATION
COLLABORATION
SCALABILITY
SELF-SERVICE
GOVERNANCE
PLATFORM AS A PRODUCT?
TREAT YOUR PLATFORM AS A PRODUCT
Customer Centric
A product for the developers
Tailor-made
Carefully designed and curated
Simplicity
Simplify workflows, abstract away details, and provide easier user interfaces
Continual Evolution
Take advantage of technology changes
THANK YOU!
Max Körbächer
Founder & Cloud Native Advisor
Happy to connect on LinkedIn
Hossein Salahi
Principal Platform Engineer
Happy to connect on LinkedIn
Icons: https://www.flaticon.com/ created by smashingstocks