1 of 49

PLATFORM ENGINEERING AND THE FUTURE OF INTERNAL PLATFORM PRODUCTS

Hossein Salahi | Lead Platform Engineer

Max Körbächer | Founder & Cloud Native Advisor

2 of 49

WHERE DOES DEVOPS FAIL?

3 of 49

WHERE DOES DEVOPS FAIL?

Core Principles and Promises of DevOps

Collaboration

Automation

Continuous Improvements

Observability

Feedback Loops

Break Down Silos

Foster a communication culture

Cross-Functional Teams

Organization-wide engagement

Centrally collect and react to events, provide guard rails and intrusion detection

Infrastructure as Code

Continuous Integration

Continuous Delivery 

Real-time Monitoring

Both application and infrastructure

Detect and address proactively

A communication channel between ops and dev

Analyze success and failure of deployments

4 of 49

WHERE DOES DEVOPS FAIL?

Core Reasons DevOps Initiatives Fail

Lack of vision

Unrealistic Expectations

Building a DevOps Team

Adoption Approaches

Maintaining Old Structures

Unclear strategic planning and business value

Desired results in terms of time and resources

Elimination of bottlenecks

Breaking down silos

But, yet another silo!

Failing to apply it organization-wide

A faster build, test, and release process

A secure infrastructure

Setting multi-cloud strategies without understanding the problem

A cultural shift

A mindset rather than tools, speed, or applications

Cultural pushback

Building a hybrid structure

Keeping ops and dev in their silos

5 of 49

https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership

6 of 49

WHY DOES DEVOPS FAIL?

Why may development teams not adopt it?

  • They are probably slow
  • Not everybody dreams of taking end-to-end responsibility!
  • Too many new tools and they are solving problems
  • They were not part of service conceptualization
  • Service discovery and ownership
  • Container platform is not the best abstraction layer

7 of 49

PLATFORM ENGINEERING

Definition

A discipline of designing and building toolchains and workflows

An abstraction layer that simplifies underlying infrastructure provisioning and management

Standardized, automated, scalable environments

More efficient application delivery

Self-service capabilities for development teams

8 of 49

https://www.innoq.com/en/articles/2023/08/is-platform-engineering-the-new-devops/#discoverabilityandownership

9 of 49

https://blog.joshgav.com/posts/kubecon-platforms-review

10 of 49

PLATFORM ENGINEERING RESPONSIBILITIES

Self-Service Capabilities

Observability

Collaboration and Feedback

Standardization

Security and Compliance

Infrastructure Automation

Scalability and Resilience

Workflow Orchestration

11 of 49

CRAFT YOUR OWN PLATFORMS

12 of 49

“BEFORE YOU CAN BUILD A PLATFORM YOU NEED TO KNOW WHAT ITS PURPOSE WILL BE”

13 of 49

PLATFORM LAYERS

Consists of multiple components that interact with each other at different speeds

  • Developer Control Plane: all tools that developers (software or platform) need for coding and versioning
  • Integration & Delivery Plane: build, test, and deploy software and components
  • Security & Compliance Plane: centrally collect and react to events, provide guard rails and intrusion detection
  • Observability & Operability Plane: tooling provided to operate the platform and, in some cases, also the application
  • Resource Plane: usually a cloud provider, IaaS, or hypervisor
  • Capability Plane: features the platform provides to its users that live “within” the platform

14 of 49

IDP REFERENCE

Developer Control Plane

IDE

Service Catalog / API Catalog Developer Portal

Version control

Application Source Code

Platform Source Code

Buildpacks

Terraform

GitHub

Workloads

Automations

Backstage

VSCode

GitHub Codespaces

Gitpod

15 of 49

IDP REFERENCE

Integration and

Delivery Plane

CI Pipeline

Registry

CD Pipeline

GitHub Actions

Amazon ECR

Argo CD

GitHub Actions

Harbor

16 of 49

IDP REFERENCE

Observability & Operations Plane

Observability

Prometheus

Grafana

Zipkin

Dynatrace

17 of 49

IDP REFERENCE

Security & Compliance Plane

Secrets & Identity Manager

HCP Vault

Keycloak

Prisma

18 of 49

IDP REFERENCE

Resource Plane

Compute

Data

Networking

Services

Amazon EKS

RDS MySQL

Route 53

Amazon SQS

19 of 49

IDP REFERENCE

Developer Control Plane
  • Components: IDE, Service Catalog / API Catalog Developer Portal, Version control (Application Source Code, Platform Source Code)
  • Examples: VSCode, GitHub Codespaces, Gitpod, Backstage, GitHub, Buildpacks, Terraform, Workloads, Automations

Integration and Delivery Plane
  • Components: CI Pipeline, Registry, CD Pipeline
  • Examples: GitHub Actions, Amazon ECR, Harbor, Argo CD

Observability & Operations Plane
  • Components: Observability
  • Examples: Prometheus, Grafana, Zipkin, Dynatrace

Security & Compliance Plane
  • Components: Secrets & Identity Manager
  • Examples: HCP Vault, Keycloak, Prisma

Resource Plane
  • Components: Compute, Data, Networking, Services
  • Examples: Amazon EKS, RDS MySQL, Route 53, Amazon SQS

Source: platformengineering.org
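
The reference above could be kept in machine-readable form, for example as a plain mapping from plane to example tools. This is only a minimal sketch: the names `IDP_REFERENCE` and `plane_of` are hypothetical, and the grouping simply follows the per-plane slides earlier in this deck.

```python
from typing import Dict, List, Optional

# Illustrative mapping of planes to the example tools named on the slides.
# The structure is an assumption for this sketch, not an official schema.
IDP_REFERENCE: Dict[str, List[str]] = {
    "Developer Control Plane": [
        "VSCode", "GitHub Codespaces", "Gitpod", "Backstage",
        "GitHub", "Buildpacks", "Terraform",
    ],
    "Integration and Delivery Plane": [
        "GitHub Actions", "Amazon ECR", "Harbor", "Argo CD",
    ],
    "Observability & Operations Plane": [
        "Prometheus", "Grafana", "Zipkin", "Dynatrace",
    ],
    "Security & Compliance Plane": ["HCP Vault", "Keycloak", "Prisma"],
    "Resource Plane": ["Amazon EKS", "RDS MySQL", "Route 53", "Amazon SQS"],
}


def plane_of(tool: str) -> Optional[str]:
    """Return the plane a tool is listed under, or None if it is not listed."""
    for plane, tools in IDP_REFERENCE.items():
        if tool in tools:
            return plane
    return None


print(plane_of("Argo CD"))
print(plane_of("Keycloak"))
```

A lookup like this makes it easy to check which plane a newly proposed tool would belong to before it is added to the platform.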

20 of 49

IDP REFERENCE

Capability Plane

Resource Integration

Security & Compliance

Observability

User Space

Prometheus

Scale & Schedule

Network

GPU Device Plugin

Heavy Babies

21 of 49

(Capability Plane overview repeated from slide 20.)

22 of 49

INFLUENCING

Hardware and software affect each other

Resource Availability

Startup - Virtualization

Portability - Container

App Scalability

Infrastructure

Cloud Native Application

23 of 49

INFLUENCING

(Diagram repeated from slide 22.)

PLATFORMS NEED TO PROVIDE THE CAPABILITIES TO ADJUST TO ANY WORKLOAD WITHOUT HARMING THE UNDERLYING INFRASTRUCTURE

24 of 49

IDP REFERENCE

(IDP reference overview repeated from slide 19.)

25 of 49

HOW TO FIND THE RIGHT APPROACH?

Way to go

Follow your gut feeling

Use data & metrics

“Best of class”

26 of 49

HOW TO FIND THE RIGHT APPROACH?

Way to go

Follow your gut feeling

Use data & metrics

“Best guess”

27 of 49

GOLDEN PATH KEY INDICATORS

Frequency of deployments in %

Development time incl. waiting time & error/bug fixing in h

Operational effort in h

Alignment and communication effort in h

Failure rate in %

28 of 49

DEFINE GOLDEN PATH

Collect data on certain observation points

  • Add/Update
    • Services
    • Resources
    • Configurations
  • Change architecture
  • Create new environments
  • Onboarding new developers
  • Roll back of failed deployments
  • Debugging & error tracing
  • Blocked environments
  • Waiting for other teams
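
Collecting this data could look roughly like the sketch below: each observation records one step within one work item, and the KPIs are aggregated per step. The record layout, the `work_item` IDs, and the definition of frequency as "share of work items that include the step" are assumptions made for illustration, not part of the original method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    work_item: str   # hypothetical ID of a PR or ticket
    step: str        # observation point, e.g. "Add/Update Services"
    dev_h: float     # development time incl. waiting and bug fixing
    ops_h: float     # operational effort
    comms_h: float   # alignment and communication effort
    failed: bool     # whether this occurrence of the step failed


def summarize(observations):
    """Aggregate raw observations into one KPI row per step."""
    total_items = len({o.work_item for o in observations})
    by_step = defaultdict(list)
    for o in observations:
        by_step[o.step].append(o)

    summary = {}
    for step, obs in by_step.items():
        n = len(obs)
        summary[step] = {
            # Assumed definition: share of work items that include this step.
            "frequency_pct": 100 * len({o.work_item for o in obs}) / total_items,
            "dev_time_h": sum(o.dev_h for o in obs) / n,
            "ops_effort_h": sum(o.ops_h for o in obs) / n,
            "comms_h": sum(o.comms_h for o in obs) / n,
            "failure_rate_pct": 100 * sum(o.failed for o in obs) / n,
        }
    return summary


# Made-up sample records, only to show the shape of the data.
OBSERVATIONS = [
    Observation("PR-1", "Add/Update Configurations", 1, 1, 0, False),
    Observation("PR-2", "Add/Update Configurations", 1, 1, 0, False),
    Observation("PR-2", "Add/Update Services", 16, 8, 2, True),
    Observation("PR-3", "Add/Update Services", 12, 4, 2, False),
    Observation("PR-4", "Add/Update Configurations", 2, 1, 0, True),
]

for step, kpis in summarize(OBSERVATIONS).items():
    print(step, kpis)
```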

29 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

Dev Steps                        | Frequency % | Dev Time h | Ops Effort h | Alignment/Comms h | Failure Rate %
Add/Update Services              | 40%         | 16         | 8            | 2                 | 30%
Add/Update Resources             | 20%         | 8          | 24           | 3                 | 6%
Add/Update Configurations        | 60%         | 1          | 1            | 0                 | 10%
Change architecture              | 3%          | 60         | 20           | 10                | 60%
Create new environments          | 4%          | 24         | 24           | 3                 | 25%
Onboarding new developers        | 20%         | 80         | 16           | 4                 | 80%
Roll back of failed deployments  | 17%         | 10         | 20           | 2                 | 90%
Debugging & error tracing        | 45%         | 10         | 16           | 4                 | 5%
Blocked environments             | 8%          | 16         | 1            | 2                 | 4%
Waiting for other teams          | 16%         | 16         | 36           | 5                 | 30%

Source: platformengineering.org
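
To compare the rows of this example, one option is to combine the hour columns and scale them by frequency. The sketch below does that; the `weighted_effort_h` score is a hypothetical heuristic introduced here for illustration and is not part of the source data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevStep:
    name: str
    frequency_pct: float
    dev_time_h: float
    ops_effort_h: float
    comms_h: float
    failure_rate_pct: float

    @property
    def effort_h(self) -> float:
        """Total hours per occurrence across dev, ops, and alignment."""
        return self.dev_time_h + self.ops_effort_h + self.comms_h

    @property
    def weighted_effort_h(self) -> float:
        """Assumed score: hours per occurrence scaled by how often it occurs."""
        return self.frequency_pct / 100 * self.effort_h


# Values copied from the example table on this slide.
STEPS = [
    DevStep("Add/Update Services", 40, 16, 8, 2, 30),
    DevStep("Add/Update Resources", 20, 8, 24, 3, 6),
    DevStep("Add/Update Configurations", 60, 1, 1, 0, 10),
    DevStep("Change architecture", 3, 60, 20, 10, 60),
    DevStep("Create new environments", 4, 24, 24, 3, 25),
    DevStep("Onboarding new developers", 20, 80, 16, 4, 80),
    DevStep("Roll back of failed deployments", 17, 10, 20, 2, 90),
    DevStep("Debugging & error tracing", 45, 10, 16, 4, 5),
    DevStep("Blocked environments", 8, 16, 1, 2, 4),
    DevStep("Waiting for other teams", 16, 16, 36, 5, 30),
]

ranked = sorted(STEPS, key=lambda s: s.weighted_effort_h, reverse=True)
for step in ranked:
    print(f"{step.name:32} {step.weighted_effort_h:6.2f} h")
```

Under this particular score, onboarding new developers ranks first and configuration changes rank last; a different weighting would give a different order.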

30 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

(Data table repeated from slide 29.)

Worst 3 per KPI

Best 3 per KPI
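
The three highest and three lowest values per KPI can be picked out mechanically. In this minimal sketch, "worst" is assumed to mean the highest value in a column and "best" the lowest; whether a high frequency is good or bad depends on the purpose of the platform, so treat that reading as an assumption. The names `TABLE` and `extremes` are hypothetical.

```python
KPIS = (
    "Frequency %", "Dev Time h", "Ops Effort h",
    "Alignment/Comms h", "Failure Rate %",
)

# Values copied from the example table (same column order as KPIS).
TABLE = {
    "Add/Update Services": (40, 16, 8, 2, 30),
    "Add/Update Resources": (20, 8, 24, 3, 6),
    "Add/Update Configurations": (60, 1, 1, 0, 10),
    "Change architecture": (3, 60, 20, 10, 60),
    "Create new environments": (4, 24, 24, 3, 25),
    "Onboarding new developers": (20, 80, 16, 4, 80),
    "Roll back of failed deployments": (17, 10, 20, 2, 90),
    "Debugging & error tracing": (45, 10, 16, 4, 5),
    "Blocked environments": (8, 16, 1, 2, 4),
    "Waiting for other teams": (16, 16, 36, 5, 30),
}


def extremes(kpi: str, n: int = 3):
    """Return (highest_n, lowest_n) step names for one KPI column.

    Ties are resolved by table order because Python's sort is stable.
    """
    col = KPIS.index(kpi)
    ordered = sorted(TABLE, key=lambda step: TABLE[step][col], reverse=True)
    return ordered[:n], ordered[-n:][::-1]


for kpi in KPIS:
    highest, lowest = extremes(kpi)
    print(kpi, "| highest:", highest, "| lowest:", lowest)
```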

31 of 49

IDENTIFYING THE STARTING POINT

Frequency

Failure

Effort

32 of 49

CASE 1

Frequency: 10 PRs per year

Failure

Effort

33 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

(Data table repeated from slide 29.)

Worst 3 per KPI

Best 3 per KPI

Golden Path

34 of 49

CASE 2

Frequency: 6 PRs per month

Failure

Effort

35 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

(Data table repeated from slide 29.)

Worst 3 per KPI

Best 3 per KPI

Golden Path

36 of 49

CASE 3

Frequency: 20 PRs per month

Failure

Effort

37 of 49

DATA EXAMPLE TO FIND GOLDEN PATH

(Data table repeated from slide 29.)

Worst 3 per KPI

Best 3 per KPI

Golden Path

38 of 49

RECAPITULATE THE STEPS

  1. Define its primary purpose
    • What will it be good for?
    • How does it support your org?
  2. Evaluate your current situation (facts & figures)
    • Collect the data, don’t let people guess; take at least 2-4 weeks to measure it
  3. Decide on the golden path
    • Define which KPI is your primary driver
    • Think of the Pareto optimum
  4. Prioritize implementations
    • Develop a plan for which feature should come first, clarify dependencies, and ensure it will not have a negative effect on other KPIs
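
One possible reading of "primary KPI" plus "Pareto optimum" is sketched below, using the example data from the earlier table. A step is kept if no other step is at least as painful on frequency, effort, and failure rate and strictly worse on one of them; the remaining steps are then ordered by the chosen primary KPI. Summing the three hour columns into a single `effort_h` value, and the function names, are assumptions made for this sketch.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    frequency_pct: float
    effort_h: float          # assumed: dev + ops + alignment hours
    failure_rate_pct: float


STEPS = [
    Step("Add/Update Services", 40, 26, 30),
    Step("Add/Update Resources", 20, 35, 6),
    Step("Add/Update Configurations", 60, 2, 10),
    Step("Change architecture", 3, 90, 60),
    Step("Create new environments", 4, 51, 25),
    Step("Onboarding new developers", 20, 100, 80),
    Step("Roll back of failed deployments", 17, 32, 90),
    Step("Debugging & error tracing", 45, 30, 5),
    Step("Blocked environments", 8, 19, 4),
    Step("Waiting for other teams", 16, 57, 30),
]


def dominates(a: Step, b: Step) -> bool:
    """True if a is at least as painful as b on every KPI and worse on one."""
    pairs = list(zip(a[1:], b[1:]))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(steps: List[Step]) -> List[Step]:
    """Steps that are not dominated by any other step."""
    return [s for s in steps if not any(dominates(o, s) for o in steps)]


def prioritize(steps: List[Step], primary_kpi: str) -> List[Step]:
    """Order the Pareto front by the primary KPI, most painful first."""
    return sorted(pareto_front(steps),
                  key=lambda s: getattr(s, primary_kpi), reverse=True)


for step in prioritize(STEPS, "failure_rate_pct"):
    print(step.name, step.failure_rate_pct)
```

Choosing a different primary KPI, such as `frequency_pct`, reorders the same candidate set without changing which steps are on the front.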

39 of 49

COMMUNITIES & RESPONSIBILITIES

40 of 49

PLATFORMS LIVE BY TWO VALUES

Platform as a product, not as a project

Community

Platform teams have to learn that they don't have all the answers either.

Open your platforms for contributions, open discussions, and a second round of input.

The product owner can gain input from many sources.

Responsibilities

Platforms require long-term responsibility and accountability -> they become a product.

Those responsibilities need to be clearly defined so that the team can act as a thought leader.

But responsibilities can be shared.

41 of 49

ELSE, YOU WILL FAIL:

Technical “playground”

  • Replace old technologies with new (alpha/beta) ones for no reason, or just to be an early adopter
  • This often leads to a setup that is “special”
  • Or build portals and features that literally no one asked for
  • This usually causes hidden and sunk costs

Rename your DevOps Team to Platform Team

Wrong focus on Dev or Ops

Your org creates PE silos (like DevOps)

Missing mindset shift to platform as a product

42 of 49

INTERNAL DEVELOPER PLATFORMS

43 of 49

“An Internal Developer Platform (IDP) is a specialized environment or set of tools and services designed to streamline and enhance the software development process within an organization.”

44 of 49

IDP CORE ELEMENTS

ROLE-BASED ACCESS CONTROL

Manage access on a granular level

APPLICATION CONFIGURATION MANAGEMENT

Scope, Versioning, Portability, and Secret Management

INFRASTRUCTURE ORCHESTRATION

IaC, CI/CD, DNS, Clusters, and other resources

ENVIRONMENT MANAGEMENT

Self-serve fully provisioned environments on demand

DEPLOYMENT MANAGEMENT

Continuous Deployment (CD)
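
As an illustration of the role-based access control element, granular access can be modelled as role -> environment -> allowed actions. This is a minimal sketch only; the role names, environment names, and actions are hypothetical and not taken from any specific IDP product.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    DEPLOY = "deploy"
    MANAGE_SECRETS = "manage_secrets"


# Hypothetical policy: which role may perform which action in which environment.
ROLE_PERMISSIONS = {
    "developer": {
        "dev": {Action.READ, Action.DEPLOY},
        "prod": {Action.READ},
    },
    "platform-engineer": {
        "dev": {Action.READ, Action.DEPLOY, Action.MANAGE_SECRETS},
        "prod": {Action.READ, Action.DEPLOY, Action.MANAGE_SECRETS},
    },
}


def is_allowed(role: str, environment: str, action: Action) -> bool:
    """Deny by default: unknown roles or environments get no permissions."""
    return action in ROLE_PERMISSIONS.get(role, {}).get(environment, set())


print(is_allowed("developer", "dev", Action.DEPLOY))
print(is_allowed("developer", "prod", Action.DEPLOY))
```

The deny-by-default lookup is a deliberate choice in this sketch: a role or environment that is missing from the policy is never granted access by accident.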

45 of 49

WHY IS AN IDP IMPORTANT?

INTERNAL DEVELOPER PLATFORM

PRODUCTIVITY

  • Automates setting up and managing development environments, build pipelines, and application delivery

STANDARDIZATION

  • A standard set of tools and services
  • Reduces the risk of inconsistencies and errors

COLLABORATION

  • A shared platform for collaboration between teams such as operations and security

SCALABILITY

  • Provides a scalable platform that grows with the organization
  • Ensures service quality for new teams

SELF-SERVICE

  • Reduces onboarding time and operational complexity for new dev teams

GOVERNANCE

  • A framework that enables adherence to best practices and to security and compliance requirements

46 of 49

PLATFORM AS A PRODUCT?

47 of 49

TREAT YOUR PLATFORM AS A PRODUCT

Customer Centric

A product for the developers

Tailor-made

Carefully designed and curated

Simplicity

Simplify workflows, abstract away details, and provide easier user interfaces

Continual Evolution

Take advantage of technology changes 

48 of 49

THANK YOU!

Max Körbächer

Founder & Cloud Native Advisor

Happy to connect on LinkedIn

Hossein Salahi

Principal Platform Engineer

Happy to connect on LinkedIn

49 of 49

Icons: https://www.flaticon.com/ created by smashingstocks