3 of 49

WHERE DOES DEVOPS FAIL?

Core Principles and Promises of DevOps

Collaboration

Break Down Silos
Foster a Communication Culture
Cross-Functional Teams

Continuous Improvements

Organization-wide engagement
Centrally collect and react to events, provide guard rails and intrusion detection

Automation

Infrastructure as Code
Continuous Integration
Continuous Delivery

Observability

Real-time Monitoring
Both application and infrastructure
Detect and address issues proactively

Feedback Loops

A communication channel between ops and dev
Analyze success and failure of deployments

4 of 49

WHERE DOES DEVOPS FAIL?

Core Reasons DevOps Initiatives Fail

Lack of vision

Unrealistic Expectations

Building DevOps Team

Adaptation Approaches

Maintaining Old Structures

Unclear strategic planning and business value

Desired results in terms of time and resources

Elimination of bottlenecks

Breaking down silos

But, yet another silo!

Failing to apply it organization-wide

A faster build, test, and release process

A secure infrastructure

Setting multi-cloud strategies without understanding the problem

A cultural shift

A mindset rather than tools, speed, or applications

Cultural pushback

Building a hybrid structure

Keeping ops and dev in their silos

5 of 49

https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership

6 of 49

WHY DOES DEVOPS FAIL?

Why may development teams not adopt it?

They are probably slow
Not everybody dreams of taking end-to-end responsibility!
Too many new tools and they are solving problems
They were not part of service conceptualization
Service discovery and ownership
Container platform is not the best abstraction layer

7 of 49

PLATFORM ENGINEERING

Definition

A discipline of designing and building toolchains and workflows

An abstraction layer to simplify underlying infrastructure provisioning and management

Standardized, automated, scalable environments

More efficient application delivery

Self-service capabilities for development teams

8 of 49

https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership

9 of 49

https://blog.joshgav.com/posts/kubecon-platforms-review

10 of 49

PLATFORM ENGINEERING RESPONSIBILITIES

Self-Service Capabilities

Observability

Collaboration and Feedback

Standardization

Security and Compliance

Infrastructure Automation

Scalability and Resilience

Workflow Orchestration

11 of 49

CRAFT YOUR OWN PLATFORMS

12 of 49

“BEFORE YOU CAN BUILD A PLATFORM YOU NEED TO KNOW WHAT ITS PURPOSE WILL BE”

13 of 49

PLATFORM LAYERS

Consists of multiple components that interact with each other at different speeds

Developer Control Plane

Consists of any tools that Dev (Software / Platform) requires for coding & versioning

Integration & Delivery Plane

Build, test & deploy software and components

Security & Compliance Plane

Centrally collect and react to events, provide guard rails and intrusion detection

Observability & Operability Plane

Tooling provided to operate the platform, and in some cases also the application

Resource Plane

Usually a cloud provider, IaaS or hypervisor

Capability Plane

Features that the platform provides for its users and that live “within” the platform

14 of 49

IDP REFERENCE

Developer Control Plane

IDE

Service Catalog / API Catalog Developer Portal

Version control

Application Source Code

Platform Source Code

Buildpacks

Terraform

GitHub

Workloads

Automations

Backstage

VSCode

GitHub Codespace

Gitpod

15 of 49

IDP REFERENCE

Integration and

Delivery Plane

CI Pipeline

Registry

CD Pipeline

GitHub Actions

Amazon ECR

Argo CD

GitHub Actions

Harbor

16 of 49

IDP REFERENCE

Observability & Operations Plane

Observability

Prometheus

Grafana

Zipkin

Dynatrace

17 of 49

IDP REFERENCE

Security & Compliance Plane

Secrets & Identity Manager

HCP Vault

Keycloak

Prisma

18 of 49

IDP REFERENCE

Resource Plane

Compute

Data

Networking

Services

Amazon EKS

RDS MySQL

Route 53

Amazon SQS

19 of 49

IDP REFERENCE

Developer Control Plane

Integration and

Delivery Plane

Observability & Operations Plane

Security & Compliance Plane

IDE

Service Catalog / API Catalog Developer Portal

Version control

Application Source Code

Platform Source Code

Observability

Secrets & Identity Manager

CI Pipeline

Registry

CD Pipeline

Resource Plane

Compute

Data

Networking

Services

Buildpacks

Terraform

GitHub

GitHub Actions

Amazon ECR

HCP Vault

Amazon EKS

RDS MySQL

Route 53

Amazon SQS

Workloads

Automations

Backstage

Argo CD

GitHub Actions

VSCode

GitHub Codespace

Gitpod

Harbor

Prometheus

Grafana

Zipkin

Keycloak

Prisma

Dynatrace

Source: platformengineering.org
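
A minimal sketch of the reference above as a lookup structure, for anyone who wants to query it rather than read the diagram. The pairing of tools with components is an assumption based on what each tool is commonly used for, since the flattened slide does not state it; "RSD MySQL" is read as Amazon RDS for MySQL. Buildpacks, Terraform, Workloads and Automations are left out because their placement is not clear from the slide. The names IDP_REFERENCE and find_tool are hypothetical.

```python
# Reference IDP planes, transcribed from the slide above.
# ASSUMPTION: the tool-to-component grouping is this sketch's reading of the
# flattened diagram, not something the slide text states explicitly.
IDP_REFERENCE = {
    "Developer Control Plane": {
        "IDE": ["VSCode", "GitHub Codespace", "Gitpod"],
        "Developer Portal": ["Backstage"],
        "Version control": ["GitHub"],
    },
    "Integration and Delivery Plane": {
        "CI Pipeline": ["GitHub Actions"],
        "Registry": ["Amazon ECR", "Harbor"],
        "CD Pipeline": ["Argo CD", "GitHub Actions"],
    },
    "Observability & Operations Plane": {
        "Observability": ["Prometheus", "Grafana", "Zipkin", "Dynatrace"],
    },
    "Security & Compliance Plane": {
        "Secrets & Identity Manager": ["HCP Vault", "Keycloak", "Prisma"],
    },
    "Resource Plane": {
        "Compute": ["Amazon EKS"],
        "Data": ["RDS MySQL"],
        "Networking": ["Route 53"],
        "Services": ["Amazon SQS"],
    },
}


def find_tool(tool):
    """Return every (plane, component) pair under which a tool is listed."""
    return [
        (plane, component)
        for plane, components in IDP_REFERENCE.items()
        for component, tools in components.items()
        if tool in tools
    ]
```

A call such as find_tool("Harbor") answers "which plane owns this tool?", which is the kind of discoverability question a service catalog is meant to answer.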

20 of 49

IDP REFERENCE

Capability Plane

Resource Integration

Security & Compliance

Observability

User Space

Prometheus

Scale & Schedule

Network

GPU Device Plugin

Heavy Babies

22 of 49

INFLUENCING

Hardware and software affect each other

Resource Availability

Startup

- Virtualization

Portability

Container

App Scalability

Infrastructure

Cloud Native Application

23 of 49

PLATFORMS NEED TO PROVIDE THE CAPABILITIES TO ADJUST TO ANY WORKLOAD WITHOUT HARMING THE UNDERLYING INFRASTRUCTURE

25 of 49

HOW TO FIND THE RIGHT APPROACH?

Way to go

Follow your tummy

Use data & metrics

“Best of class”

26 of 49

HOW TO FIND THE RIGHT APPROACH?

Way to go

Follow your tummy

Use data & metrics

“Best guess”

27 of 49

GOLDEN PATH KEY INDICATORS

Frequency of deployments in %

Development time incl. waiting time & error/bug fixing in h

Operational effort in h

Alignment and communication effort in h

Failure rate in %

28 of 49

DEFINE GOLDEN PATH

Collect data on certain observation points

Add/Update

Services
Resources
Configurations

Change architecture
Create new environments
Onboarding new developers
Roll back of failed deployments
Debugging & error tracing
Blocked environments
Waiting for other teams

29 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

Dev Steps	Frequency %	Dev Time h	Ops Effort h	Alignment/ Comms h	Failure Rate %
Add/Update Services	40%	16	8	2	30%
Add/Update Resources	20%	8	24	3	6%
Add/Update Configurations	60%	1	1	0	10%
Change architecture	3%	60	20	10	60%
Create new environments	4%	24	24	3	25%
Onboarding new developers	20%	80	16	4	80%
Roll back of failed deployments	17%	10	20	2	90%
Debugging & error tracing	45%	10	16	4	5%
Blocked environments	8%	16	1	2	4%
Waiting for other teams	16%	16	36	5	30%

Source: platformengineering.org

30 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

Worst 3 per KPI

Best 3 per KPI
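
The worst and best three entries per KPI can be read off the example table with a few lines of code. The sketch below is one way to do it, assuming that a higher value is "worse" for hours and failure rate; for frequency, "highest" simply means "most frequent". The names KPIS, ROWS and rank are hypothetical, and the numbers are copied from the example table.

```python
# Column order of the example table.
KPIS = ("frequency_pct", "dev_time_h", "ops_effort_h", "comms_h", "failure_pct")

# Values copied from the example table (source: platformengineering.org).
ROWS = {
    "Add/Update Services": (40, 16, 8, 2, 30),
    "Add/Update Resources": (20, 8, 24, 3, 6),
    "Add/Update Configurations": (60, 1, 1, 0, 10),
    "Change architecture": (3, 60, 20, 10, 60),
    "Create new environments": (4, 24, 24, 3, 25),
    "Onboarding new developers": (20, 80, 16, 4, 80),
    "Roll back of failed deployments": (17, 10, 20, 2, 90),
    "Debugging & error tracing": (45, 10, 16, 4, 5),
    "Blocked environments": (8, 16, 1, 2, 4),
    "Waiting for other teams": (16, 16, 36, 5, 30),
}


def rank(kpi, n=3, highest=True):
    """Return the n dev steps with the highest (or lowest) value for one KPI."""
    idx = KPIS.index(kpi)
    ordered = sorted(ROWS, key=lambda step: ROWS[step][idx], reverse=highest)
    return ordered[:n]


worst_failure = rank("failure_pct")                 # highest failure rates
best_failure = rank("failure_pct", highest=False)   # lowest failure rates
most_frequent = rank("frequency_pct")               # most frequent steps
```

Ties are broken by table order, so a KPI with equal values (for example 24 h of ops effort for two steps) still yields a stable top three.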

31 of 49

Frequency

Failure

Effort

IDENTIFYING THE STARTING POINT
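
One possible way to combine frequency, failure and effort into a starting point is to weight each dev step by how often it occurs. The sketch below assumes that "Frequency %" is the share of changes that involve a step and that effort is the sum of dev, ops and alignment hours; both are assumptions, not definitions from the slides. The function names and the changes parameter are hypothetical.

```python
# (frequency %, dev time h, ops effort h, alignment/comms h, failure rate %)
# Values copied from the example table.
TABLE = {
    "Add/Update Services": (40, 16, 8, 2, 30),
    "Add/Update Resources": (20, 8, 24, 3, 6),
    "Add/Update Configurations": (60, 1, 1, 0, 10),
    "Change architecture": (3, 60, 20, 10, 60),
    "Create new environments": (4, 24, 24, 3, 25),
    "Onboarding new developers": (20, 80, 16, 4, 80),
    "Roll back of failed deployments": (17, 10, 20, 2, 90),
    "Debugging & error tracing": (45, 10, 16, 4, 5),
    "Blocked environments": (8, 16, 1, 2, 4),
    "Waiting for other teams": (16, 16, 36, 5, 30),
}


def effort_weight(step):
    """Frequency (%) times total hours per occurrence; a relative score only."""
    freq, dev, ops, comms, _failure = TABLE[step]
    return freq * (dev + ops + comms)


def failure_weight(step):
    """Frequency (%) times failure rate (%); a relative score only."""
    freq, _dev, _ops, _comms, failure = TABLE[step]
    return freq * failure


def expected_hours(step, changes):
    """Expected hours spent on a step across `changes` changes (ASSUMED volume)."""
    return changes * effort_weight(step) / 100


def starting_points(weight, n=3):
    """Return the n steps with the largest weight under the given scoring."""
    return sorted(TABLE, key=weight, reverse=True)[:n]


by_effort = starting_points(effort_weight)
by_failure = starting_points(failure_weight)
```

Under these assumptions, onboarding leads both rankings, which suggests that the change volume of a team matters as much as the raw numbers when choosing where to start.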

32 of 49

Frequency

10 PRs per year

Failure

Effort

CASE 1

33 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

Worst 3 per KPI

Best 3 per KPI

Golden Path

34 of 49

Frequency

6 PRs per month

Failure

Effort

CASE 2

36 of 49

Frequency

20 PRs per month

Failure

Effort

CASE 3

38 of 49

RECAPITULATE THE STEPS

Define its primary purpose

What will it be good for?
How does it support your org?

Evaluate your current situation (facts & figures)

Collect the data, don’t let people guess, and take at least 2-4 weeks to measure it

Decide on the golden path

Define which KPI is your primary driver
Think of the Pareto optimum

Prioritize implementations

Develop a plan for which feature should come first, clarify dependencies, and ensure it will not have a negative effect on other KPIs
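
The "no negative effect on other KPIs" rule can be checked mechanically once before and after measurements exist. The sketch below assumes every KPI is "lower is better" (hours, failure rate) and treats a change as a Pareto improvement when nothing gets worse and at least one KPI gets better. The function name and the "after" numbers are hypothetical; the "before" numbers are the rollback row of the example table.

```python
def is_pareto_improvement(before, after):
    """True if no KPI gets worse and at least one gets strictly better.

    ASSUMPTION: every KPI is 'lower is better' (hours, failure rate).
    """
    if before.keys() != after.keys():
        raise ValueError("both snapshots must report the same KPIs")
    no_worse = all(after[k] <= before[k] for k in before)
    some_better = any(after[k] < before[k] for k in before)
    return no_worse and some_better


# "Roll back of failed deployments" row from the example table.
before = {"dev_time_h": 10, "ops_effort_h": 20, "comms_h": 2, "failure_pct": 90}

# Hypothetical measurements after two different platform features.
after_automation = {"dev_time_h": 10, "ops_effort_h": 8, "comms_h": 2, "failure_pct": 40}
after_tradeoff = {"dev_time_h": 14, "ops_effort_h": 8, "comms_h": 2, "failure_pct": 40}

automation_ok = is_pareto_improvement(before, after_automation)  # nothing got worse
tradeoff_ok = is_pareto_improvement(before, after_tradeoff)      # dev time got worse
```

A feature that fails this check is not necessarily wrong, but the trade-off should be a deliberate decision tied to the primary KPI rather than a surprise.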

39 of 49

COMMUNITIES & RESPONSIBILITIES

40 of 49

PLATFORMS LIVE BY TWO VALUES

Platforms as a Product not as a Project

Community

Platform teams have to learn that they don't know it all either.

Open your platforms for contributions, open discussions and a second round of input.

Product Owner can gain input from many sources.

Responsibilities

Platforms require long-term responsibility and accountability, so they become a product.

Those responsibilities need to be clearly defined so the team can act as a thought leader.

But, responsibilities can be shared.

41 of 49

ELSE, YOU WILL FAIL:

Technical “Play Ground”

Replace old technologies with new (alpha/beta) ones for no reason, or “just to be an early adopter”
Which often leads to a setup that is “special”
Or build portals and features literally no one asked for
This usually causes hidden and sunk costs

Rename your DevOps Team to Platform Team

Wrong focus on Dev or Ops

Your org creates PE silos (like DevOps)

Missing mindset shift to platform as a product

42 of 49

INTERNAL DEVELOPER PLATFORMS

43 of 49

“An Internal Developer Platform (IDP) is a specialized environment or set of tools and services designed to streamline and enhance the software development process within an organization.”

44 of 49

IDP CORE ELEMENTS

ROLE-BASED ACCESS CONTROL

Manage access on a granular level

APPLICATION CONFIGURATION MANAGEMENT

Scope, Versioning, Portability, and Secret Management

INFRASTRUCTURE ORCHESTRATION

IaC, CI/CD, DNS, Clusters, and other resources

ENVIRONMENT MANAGEMENT

Self-serve fully provisioned environments on demand

DEPLOYMENT MANAGEMENT

Continuous Deployment (CD)
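
As an illustration of "manage access on a granular level", the sketch below scopes each permission to a resource, an action and an environment, so a role can deploy to staging without being able to deploy to production. All role names, resources and environments are hypothetical, and a real IDP would delegate this to its identity provider rather than an in-memory table.

```python
from typing import NamedTuple


class Permission(NamedTuple):
    resource: str     # e.g. "environment", "deployment"
    action: str       # e.g. "create", "deploy"
    environment: str  # e.g. "dev", "staging", "prod"


# HYPOTHETICAL roles; each grant is scoped down to a single environment.
ROLES = {
    "developer": {
        Permission("environment", "create", "dev"),
        Permission("deployment", "deploy", "dev"),
        Permission("deployment", "deploy", "staging"),
    },
    "release-manager": {
        Permission("deployment", "deploy", "staging"),
        Permission("deployment", "deploy", "prod"),
    },
}


def is_allowed(role, resource, action, environment):
    """Check whether a role holds the exact (resource, action, environment) grant."""
    return Permission(resource, action, environment) in ROLES.get(role, set())
```

Unknown roles get an empty permission set, so the check denies by default.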

45 of 49

WHY IS AN IDP IMPORTANT?

INTERNAL DEVELOPER PLATFORM

PRODUCTIVITY

Automates setting up and managing development environments, build pipelines, and application delivery

STANDARDIZATION

A standard set of tools and services
Reduces risk of inconsistencies and errors

COLLABORATION

A shared platform for collaboration between teams such as operations and security

SCALABILITY

Provides a scalable platform that grows with the organization
Ensures service quality for new teams

SELF-SERVICE

Reduces onboarding time and operational complexity for new dev teams

GOVERNANCE

A framework that enables adherence to best practices and complies with security and compliance requirements

46 of 49

PLATFORM AS A PRODUCT?

47 of 49

TREAT YOUR PLATFORM AS A PRODUCT

Customer Centric

A product for the developers

Tailor-made

Carefully designed and curated

Simplicity

Simplify workflows, abstract away details, and provide easier user interfaces

Continual Evolution

Take advantage of technology changes

48 of 49

THANK YOU!

Max Körbächer

Founder & Cloud Native Advisor

Happy to connect on LinkedIn

Hossein Salahi

Principal Platform Engineer

Happy to connect on LinkedIn
