1 of 37

Observability for Engineering Teams

Improving Productivity of Agile Product Engineering Teams

using Flow and Health Metrics

Impact and Outcome OVER Output

Naresh Jain

Xnsio

© 2025 Xnsio All Rights Reserved

2 of 37

Our Goal is to maximise Impact/Outcome AND minimise Output

Source: Jeff Patton

3 of 37

Operation Successful, Patient Dead!

  • The Safeguard Program was a U.S. Army anti-ballistic missile (ABM) system designed to protect the U.S. Air Force's Minuteman ICBM silos from attack
  • Hardware was designed at the same time as the software specs were being written
  • Late changes in requirements were not an option
  • The project was finally delivered according to specifications
  • Duration: 1969-1975, 5407 person years
  • Cost: $7.973 Billion (not adjusted for inflation)
  • The system was operational for only 133 days (01-Oct-75 to 10-Feb-76)

‘By the time the 6-year anti-missile system project was completed, the new missiles were faster than the anti-missile missiles’

4 of 37

Avoiding the “Operation Successful, Patient Dead!” Situation

Even if a project is delivered on time, on budget and as per spec (scope) by fully utilising all the resources, if it is not useful, it is not a productive use of anyone’s time!

Hence, from the point of view of measuring productivity, it is important to focus on consistent, continuous and early delivery of real value to stakeholders.

Project Management’s Iron Triangle

5 of 37

Agile organisation: The new dominant organisational paradigm

Focus on self-organised, cross-functional agile teams

In agile organisations, we measure the overall team’s outcome-based performance instead of individual productivity, because measuring and rewarding individual performance is a deterrent to collaboration and to the overall team’s performance.

6 of 37

Building antifragile systems in a VUCA world

Unlike fragile systems, which dislike variability and stress, antifragile systems gain from disorder/chaos, randomness and VUCA (volatility, uncertainty, complexity, ambiguity).

One of the key properties of antifragile systems is that they localise impact by building in slack/redundancy.

Systems that try to achieve 100% utilisation end up becoming fragile and incapable of responding quickly to rapid changes, hence unresponsive and less valuable.

7 of 37

Shift focus from Utilisation to Throughput and Latency

If we apply queuing theory to product engineering, we see why reducing the batch size matters: it increases velocity (throughput) and reduces cycle time (latency) while still maintaining an optimal 70-80% utilisation.
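
The utilisation/latency trade-off can be sketched with the simplest textbook queue. This is a minimal illustration, assuming an M/M/1 model (one server, random arrivals and service times); the slide does not name a specific model, and the function name is hypothetical. In that model the average time an item spends in the system, expressed in multiples of the average service time, is 1 / (1 - utilisation).

```python
def relative_time_in_system(utilisation: float) -> float:
    """Average time in an M/M/1 queue, in multiples of the average service time.

    Assumption: M/M/1 is used purely for illustration; real teams are messier.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    return 1.0 / (1.0 - utilisation)


# Latency grows slowly at first, then explodes as utilisation approaches 100%.
for u in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {u:.0%}: {relative_time_in_system(u):.1f}x service time")
```

Under this assumed model, moving from 80% to 95% utilisation roughly quadruples average latency, which is why some slack is worth keeping.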

8 of 37

Our goal is to optimise the flow of consistent, continuous and early value delivery to our stakeholders

Hence we use Flow Metrics to measure the productivity of a team

9 of 37

Flow Metrics

Velocity

Cycle Time

Flow Efficiency

Work-in-Progress (WIP)

    • WIP Aging

10 of 37

11 of 37

Analogy to understand Flow Metrics

Velocity (Throughput): How many 10L bottles can I fill in X time?

Cycle Time (Latency): From start to finish, how long will it take to fill this 10L bottle?

Flow Efficiency: How smoothly is the water flowing? Is it flowing continuously, or stopping and starting?

WIP (Inventory): How much water is currently in the system?

12 of 37

Velocity

Velocity is the throughput of a team. It is a measure of how many work items were completed over a period of time. By analyzing trends over time, the metric can tell you whether your delivery rate has improved, thus helping you to forecast more accurately how much work (and value) you can deliver in a given period of time.
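
As a rough sketch, velocity can be computed by counting completed work items per period. The completion dates below are made-up sample data, and the weekly (ISO week) bucketing and the function name are assumptions for illustration; a team may prefer sprints or months.

```python
from collections import Counter
from datetime import date

# Hypothetical completion dates of finished work items (illustrative data only).
completed_on = [
    date(2025, 3, 3), date(2025, 3, 5), date(2025, 3, 7),
    date(2025, 3, 11), date(2025, 3, 13),
    date(2025, 3, 17), date(2025, 3, 18), date(2025, 3, 20), date(2025, 3, 21),
]


def weekly_velocity(dates):
    """Count completed work items per ISO (year, week), oldest week first."""
    counts = Counter(tuple(d.isocalendar())[:2] for d in dates)
    return dict(sorted(counts.items()))


for (year, week), done in weekly_velocity(completed_on).items():
    print(f"{year}-W{week:02d}: {done} items completed")
```

Plotting these counts week over week gives the trend the slide refers to.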

13 of 37

Cycle Time

Cycle Time is the latency of a team. It is a measure of the elapsed time each work item takes from the moment it enters the value stream (approved by business stakeholders) to completion (end user obtaining value from the product). It includes both active and wait states, as well as weekends and off-hours. By analysing trends over time, the metric can tell you if your acceleration investments are actually improving your time-to-market.
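
A minimal sketch of the calculation, assuming each work item carries two timestamps: when it entered the value stream and when it reached the end user. The timestamps and names below are hypothetical sample data. Plain calendar subtraction is used on purpose, so wait states, weekends and off-hours are all counted.

```python
from datetime import datetime
from statistics import median

# Hypothetical (entered value stream, delivered to end user) timestamps.
work_items = [
    (datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 10, 9, 0)),
    (datetime(2025, 3, 4, 9, 0), datetime(2025, 3, 6, 21, 0)),
    (datetime(2025, 3, 5, 12, 0), datetime(2025, 3, 19, 12, 0)),
]


def cycle_time_days(entered, delivered):
    """Elapsed calendar days, including wait states, weekends and off-hours."""
    return (delivered - entered).total_seconds() / 86400


cycle_times = [cycle_time_days(start, end) for start, end in work_items]
print("cycle times (days):", cycle_times)
print("median cycle time (days):", median(cycle_times))
```

The median is used here as one reasonable summary; percentiles are a common alternative when a few items take much longer than the rest.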

14 of 37

Flow Efficiency

Flow Efficiency is the proportion of time that work items are actively being worked on. It is measured as the total elapsed time in work-centres for all work items divided by the total cycle time (time spent in work-centres and waiting queues). By analysing trends over time, the metric can tell you if your work items are stagnating in a wait state and slowing down your time-to-market.
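
The ratio described above can be sketched as follows. The hour figures are invented sample data, and the function name is hypothetical; the only thing taken from the slide is the formula itself: time in work-centres divided by time in work-centres plus time in waiting queues.

```python
def flow_efficiency(active_hours, waiting_hours):
    """Share of total cycle time spent in work-centres (active), from 0.0 to 1.0."""
    active = sum(active_hours)
    total = active + sum(waiting_hours)
    if total == 0:
        raise ValueError("no elapsed time recorded")
    return active / total


# Hypothetical per-item hours: time being worked on vs. time sitting in queues.
active_hours = [8, 16, 6]
waiting_hours = [40, 64, 16]

print(f"flow efficiency: {flow_efficiency(active_hours, waiting_hours):.0%}")
```

With these sample numbers, items are actively worked on for 30 of 150 elapsed hours, so most of the cycle time is waiting.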

15 of 37

Work in Progress (WIP)

WIP measures the number of work items being worked on in a value stream at a given time. By analysing trends over time, the metric can tell you if the amount of work in progress is crossing a threshold and negatively affecting the output.
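
A minimal sketch of counting WIP on a given day and checking it against a threshold. The work items, the `WIP_LIMIT` value and the function name are all assumptions made for illustration; the slide does not prescribe a specific limit.

```python
from datetime import date

# Hypothetical work items as (started, finished); finished is None while in progress.
work_items = [
    (date(2025, 3, 3), date(2025, 3, 6)),
    (date(2025, 3, 4), None),
    (date(2025, 3, 5), date(2025, 3, 12)),
    (date(2025, 3, 10), None),
]

WIP_LIMIT = 2  # assumed team-chosen threshold, not a recommended value


def wip_on(day, items):
    """Number of items started on or before `day` and not yet finished on `day`."""
    return sum(
        1
        for started, finished in items
        if started <= day and (finished is None or finished > day)
    )


for day in (date(2025, 3, 5), date(2025, 3, 12)):
    wip = wip_on(day, work_items)
    status = "over limit" if wip > WIP_LIMIT else "within limit"
    print(f"{day}: WIP = {wip} ({status})")
```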

16 of 37

Work in Progress Aging

WIP Aging is the measure of how long an unfinished work item has been in the value stream. The longer an item ages, the longer its eventual cycle time will be.
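
One way to surface aging work is to compute each in-progress item's age and flag the ones past a threshold. This is a sketch under stated assumptions: the item IDs, start dates, function names and the 10-day threshold are hypothetical; a team might instead use its typical cycle time as the threshold.

```python
from datetime import date

# Hypothetical in-progress items: item id -> date it entered the value stream.
in_progress = {
    "ITEM-1": date(2025, 3, 3),
    "ITEM-2": date(2025, 3, 14),
    "ITEM-3": date(2025, 3, 19),
}


def wip_ages(items, today):
    """Age in calendar days of every in-progress item as of `today`."""
    return {item: (today - started).days for item, started in items.items()}


def aging_items(items, today, max_age_days):
    """Items whose age exceeds `max_age_days` (an assumed, team-chosen threshold)."""
    return [item for item, age in wip_ages(items, today).items() if age > max_age_days]


today = date(2025, 3, 21)
print(wip_ages(in_progress, today))
print("needs attention:", aging_items(in_progress, today, max_age_days=10))
```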

17 of 37

Flow Metrics vs. Health Metrics

Flow metrics are influenced by several factors and are relatively slow moving. However, when a new improvement practice is introduced to a team, we use Health Metrics to measure how well the practice is being adopted.

18 of 37

Code Repository Level Health Metrics Drill Down

19 of 37

Sonar – Code Quality

20 of 37

Test Coverage

21 of 37

CI Builds

22 of 37

CI Builds

23 of 37

CI Builds

24 of 37

Pull Requests

25 of 37

Source Code Branching

26 of 37

Accelerate - State of DevOps Report (was DORA)

27 of 37

A learning loop for continuous improvement

Product Evolution: Build (B) -> Go-Live & Observe -> Measure (M) -> Analyze -> Learn (L) -> Decide next iteration

Process Evolution (Continuous Improvement): Intervene (I) -> Let interventions take effect -> Measure (M) -> Analyze -> Learn (L) -> Decide next iteration

Now that we have a continuous monitoring framework in place, we can prioritize the most impactful areas and improve continuously.

28 of 37

Case Study

29 of 37

30 of 37

In 2021

In 2023

31 of 37

32 of 37

In 2021

In 2023

33 of 37

34 of 37

In 2021

In 2023

35 of 37

36 of 37

In 2021

In 2023

37 of 37

Thank you!
