1 of 25

Instance Insights:

Key Metrics for Customer-Hosted Applications

2 of 25

Install Success Rate

  • What fraction of customer installation attempts are successful?
  • Inverse of attempts per install
  • Key indicator of the quality of packaging, configuration, testing, and documentation of the delivery of a software product
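Here is a minimal sketch of the arithmetic. It assumes each installation attempt is already recorded with a success flag. `InstallAttempt`, `install_success_rate`, and `attempts_per_install` are hypothetical names. What counts as a "successful" attempt is up to your own telemetry.

```python
from dataclasses import dataclass


@dataclass
class InstallAttempt:
    """One installation attempt (hypothetical record shape)."""
    customer: str
    succeeded: bool


def install_success_rate(attempts: list[InstallAttempt]) -> float:
    """Fraction of installation attempts that succeeded."""
    if not attempts:
        raise ValueError("no install attempts recorded")
    successes = sum(1 for a in attempts if a.succeeded)
    return successes / len(attempts)


def attempts_per_install(attempts: list[InstallAttempt]) -> float:
    """Attempts needed per successful install (the inverse of the success rate)."""
    successes = sum(1 for a in attempts if a.succeeded)
    if successes == 0:
        return float("inf")  # no successful install yet
    return len(attempts) / successes


# Mirrors the appendix example: two failed staging attempts, then two successes.
attempts = [
    InstallAttempt("acme", False),
    InstallAttempt("acme", False),
    InstallAttempt("acme", True),
    InstallAttempt("acme", True),
]
print(install_success_rate(attempts))  # 0.5
print(attempts_per_install(attempts))  # 2.0
```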

Related to: Trial Conversion Rate, Time to Install, Support Burden

Symptoms

  • High customer churn during POV or initial implementation
  • High support burden for initial installations

Improvements

  • Pre-flight check environments
  • Formalize pre-install research
  • Improve documentation
  • Reduce # supported environments

Best in Class

6 Failures

30 Successes

86%

90%

3 of 25

Time to Install

  • How fast can you get software up and running in a customer environment?
  • How fast can you deliver value and drive feature adoption?
  • Key indicator of the quality of packaging, configuration, testing, and documentation of the delivery of a software product
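One way to report Time to Install at the 80th percentile is sketched below. It assumes you can timestamp when an install starts and when the app first reports ready. The appendix distinguishes several TTI variants, so the choice of start point is yours. The sketch uses the nearest-rank percentile method; other percentile definitions give slightly different results on small samples. All function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def time_to_install(started_at: datetime, ready_at: datetime) -> timedelta:
    """Elapsed time from the start of an install to the app first reporting ready."""
    return ready_at - started_at


def percentile_nearest_rank(values, pct: float):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("no values")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


durations = [
    time_to_install(datetime(2024, 5, 1, 9), datetime(2024, 5, 1, 10)),  # 1h
    time_to_install(datetime(2024, 5, 2, 9), datetime(2024, 5, 2, 11)),  # 2h
    time_to_install(datetime(2024, 5, 3, 9), datetime(2024, 5, 3, 11)),  # 2h
    time_to_install(datetime(2024, 5, 4, 9), datetime(2024, 5, 4, 12)),  # 3h
    time_to_install(datetime(2024, 5, 5, 9), datetime(2024, 5, 5, 14)),  # 5h
]
print(percentile_nearest_rank(durations, 80))  # 3:00:00
```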

Related to: Trial Conversion Rate, Install Success Rate, Support Burden

Symptoms

  • High customer churn during POV or initial implementation
  • High support burden for initial installations

Improvements

  • Pre-flight check environments
  • Formalize pre-install research
  • Improve documentation
  • Reduce # supported environments

Best in Class

2 hrs (80th percentile)

3 days (80th percentile)

4 of 25

Adoption Rate

  • What fraction of your customers are running a recent version of your software?
  • Goals will be highly dependent on your release cadence.
  • Many teams track the last three versions, but the last version or last two versions are also useful
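The sketch below computes Adoption Rate for a configurable window of recent releases. It assumes two inputs: the version each running instance reports, and a list of published versions ordered by publish date. Using publish order avoids parsing version strings. The function and parameter names are illustrative.

```python
def adoption_rate(deployed_versions: list[str], release_history: list[str], window: int = 3) -> float:
    """Fraction of running instances on one of the last `window` published releases.

    release_history must be ordered oldest to newest by publish date.
    """
    if not deployed_versions:
        raise ValueError("no running instances")
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = set(release_history[-window:])
    on_recent = sum(1 for v in deployed_versions if v in recent)
    return on_recent / len(deployed_versions)


releases = ["1.0.0", "1.1.0", "1.2.0", "1.3.0"]
deployed = ["1.3.0", "1.2.0", "1.0.0", "1.1.0", "1.3.0"]
print(adoption_rate(deployed, releases))            # 0.8 (last three versions)
print(adoption_rate(deployed, releases, window=1))  # 0.4 (latest version only)
```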

Related to: Age of Deployed Software, Support Burden, Upgrade Success Rate, Deployment Frequency

Symptoms

  • Difficult multi-step upgrades in the field
  • Lack of value perception

Improvements

  • Invest in upgrade stability
  • Proactive outreach
  • Improve new value communication

Best in Class

% on Last Three Versions:

73%

80%

5 of 25

Age of Deployed Software

  • For all running instances, take the age (time since publication) of the release each instance is running, then compute the median and/or mean
  • Absolute age - total time since the release publish date
  • Relative age (preferred) - time between a release's publish date and the publish date of the latest available release
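The following sketch computes both absolute and relative ages in days and then takes the median and mean across running instances. It assumes you have a lookup of publish dates per version. It treats the newest publish date in that lookup as "the latest available release". All names are hypothetical.

```python
from datetime import date
from statistics import mean, median


def deployed_release_ages(deployed, publish_dates, today):
    """Return (absolute_ages, relative_ages) in days, one entry per running instance.

    deployed: the version each running instance is on
    publish_dates: version -> publish date for every published release
    """
    latest = max(publish_dates.values())
    absolute = [(today - publish_dates[v]).days for v in deployed]
    relative = [(latest - publish_dates[v]).days for v in deployed]
    return absolute, relative


publish_dates = {
    "1.0": date(2024, 1, 1),
    "1.1": date(2024, 2, 1),
    "1.2": date(2024, 3, 1),
}
deployed = ["1.0", "1.1", "1.2"]
absolute, relative = deployed_release_ages(deployed, publish_dates, today=date(2024, 3, 31))
print("absolute:", median(absolute), mean(absolute))
print("relative:", median(relative), mean(relative))
```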

Related to: Adoption Rate, Support Burden, Release Frequency

Symptoms

  • Difficult multi-step upgrades in the field
  • Lack of value perception

Improvements

  • Invest in upgrade stability
  • Proactive outreach
  • Improve new value communication

Best in Class

Median Deployed Release Age

34 days

60 days

6 of 25

Unique Deployed Versions

  • How many unique versions are your customers running in production?
  • Directly impacts the overhead of maintaining documentation and internal knowledge for supporting different versions
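Counting instances per version gives both the Unique Deployed Versions figure and the long tail behind it. The tail can help target outreach to instances on rarely used versions. Here is a minimal sketch using Python's `collections.Counter`, with made-up version strings:

```python
from collections import Counter


def deployed_version_counts(deployed_versions: list[str]) -> Counter:
    """Count running instances per version; len() of the result is the unique-version count."""
    return Counter(deployed_versions)


counts = deployed_version_counts(["2.0", "2.0", "1.9", "1.8", "2.0", "1.9"])
print(len(counts))           # 3 unique deployed versions
print(counts.most_common())  # [('2.0', 3), ('1.9', 2), ('1.8', 1)]
```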

Related to: Adoption Rate, Age of Deployed Software, Support Burden

Symptoms

  • Sprawling, complicated documentation
  • High support burden
  • Internal stress and burnout

Improvements

  • Drive adoption of newer versions
  • Develop LTS Program

Best in Class

<10

Unique Deployed Versions

5

7 of 25

Deployment Frequency

  • How often are new releases made available to customers?
  • Inspired by DORA, but delivery goals and expectations should be adjusted for customer-hosted applications

Related to: Feature Lag between SaaS and On-Prem, Adoption Rate, Support Burden

Symptoms

  • Long, QA-intensive release cycles
  • Lag in new feature adoption

Improvements

  • Shift left for on-prem - design and test early and often
  • Invest in automated testing for all supported deployment topologies

Best in Class

1x / month

8 of 25

Feature Lag Between SaaS & On-prem

  • How long after a SaaS release do on-prem customers need to wait to gain access to the same feature(s)?
  • Core indicator of product and value velocity
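Here is a sketch of Feature Lag. It assumes you track the date each feature became available in SaaS and the date it shipped in an on-prem release. The feature names and dates are invented for illustration.

```python
from datetime import date
from statistics import median


def feature_lag_days(saas_release: dict[str, date], onprem_release: dict[str, date]) -> dict[str, int]:
    """Days between SaaS and on-prem availability, for features shipped to both."""
    shared = saas_release.keys() & onprem_release.keys()
    return {f: (onprem_release[f] - saas_release[f]).days for f in sorted(shared)}


saas = {"sso": date(2024, 1, 1), "audit-log": date(2024, 1, 10), "new-ui": date(2024, 2, 1)}
onprem = {"sso": date(2024, 1, 8), "audit-log": date(2024, 2, 7)}
lags = feature_lag_days(saas, onprem)
print(lags)                   # {'audit-log': 28, 'sso': 7}
print(median(lags.values()))  # 17.5
```

Features that have shipped in SaaS but not yet on-prem (`new-ui` here) are excluded. As a result, this understates lag while a backlog is waiting. Tracking those features' days since SaaS release separately avoids that blind spot.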

Related to: Deployment Frequency, Upgrade Success Rate

Symptoms

  • Long, QA-intensive release cycles
  • Lag in new feature adoption
  • Split in on-prem / SaaS skill sets

Improvements

  • Shift left for on-prem - design and test early and often
  • Invest in automated testing for all supported deployment topologies

Best in Class

1 week

Lag Time

4 weeks

9 of 25

Aggregate Uptime

  • Total uptime across all instances divided by their total lifetime
  • Risk of over-indexing on the performance of longer-lived instances
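The sketch below pools uptime and lifetime across instances. It assumes both are already measured in hours per instance. The made-up fleet shows the over-indexing risk: one long-lived healthy instance dominates a short-lived one that was down half the time.

```python
def aggregate_uptime(instances: list[tuple[float, float]]) -> float:
    """Total uptime divided by total lifetime, pooled across all instances.

    instances: (uptime_hours, lifetime_hours) per instance
    """
    total_up = sum(up for up, _ in instances)
    total_life = sum(life for _, life in instances)
    if total_life == 0:
        raise ValueError("no instance lifetime recorded")
    return total_up / total_life


# A long-lived healthy instance dominates a short-lived unhealthy one.
fleet = [(990.0, 1000.0), (5.0, 10.0)]
print(round(aggregate_uptime(fleet), 3))  # 0.985
```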

Related to: Customer/Instance Churn, Upgrade Success Rate, Mean Time to Recover

Symptoms

  • Frequent and protracted support engagements
  • Increased instance or customer churn

Improvements

  • Invest in diagnostic tooling
  • Improve application stability
  • Invest in upgrade success

Best in Class

99%

10 of 25

Instance Uptime

  • The average of each instance’s individual uptime
  • Risk of over-indexing on the performance of shorter-lived instances
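For contrast with Aggregate Uptime, this sketch averages per-instance uptime fractions so that every instance counts equally, regardless of how long it lived. It assumes you have status intervals per instance. It treats Degraded as up, which is how the appendix uptime diagram appears to group statuses. The names and the second instance are hypothetical.

```python
from statistics import mean

# Assumption: "Ready" and "Degraded" count as up; anything else (e.g. "Unavailable") as down.
UP_STATES = {"Ready", "Degraded"}


def uptime_fraction(intervals: list[tuple[str, float]]) -> float:
    """Uptime for one instance from (status, hours) intervals."""
    total = sum(hours for _, hours in intervals)
    up = sum(hours for status, hours in intervals if status in UP_STATES)
    return up / total


def instance_uptime(fleet: list[list[tuple[str, float]]]) -> float:
    """Average of per-instance uptime fractions; each instance weighs the same."""
    return mean(uptime_fraction(intervals) for intervals in fleet)


# Appendix example: 52h up (2d4h), 18h down, 12h up -> 64h / 82h.
instance_a = [("Ready", 52.0), ("Unavailable", 18.0), ("Ready", 12.0)]
instance_b = [("Ready", 82.0)]
print(round(uptime_fraction(instance_a), 2))                # 0.78
print(round(instance_uptime([instance_a, instance_b]), 2))  # 0.89
```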

Related to: Aggregate Uptime, Mean Time to Recover, Mean Time Between Failures

Symptoms

  • Frequent and protracted support engagements
  • Increased instance or customer churn

Improvements

  • Invest in diagnostic tooling
  • Improve application stability
  • Invest in upgrade success

Best in Class

95%

11 of 25

Upgrade Success Rate

  • What fraction of upgrade attempts complete without downtime and manual intervention?
  • Compute per instance, per release, and/or across all instances of an application
  • Roughly the inverse of the DORA change failure rate, but focused solely on vendor-defined changes (customers may also make their own changes)
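Here is a sketch of Upgrade Success Rate, computed overall and per group (per instance or per target release). The 15-minute health window comes from the appendix upgrade diagram. The `Upgrade` record, its fields, and the 5-minute failure in the example are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Optional

# From the appendix diagram: healthy for 15m after completing = success,
# unavailable within 15m = failed.
HEALTHY_WINDOW = timedelta(minutes=15)


@dataclass
class Upgrade:
    instance: str
    to_version: str
    completed: bool
    manual_intervention: bool
    # Time from completion to the first "Unavailable" status; None if none was observed
    # (assumes the instance was observed for at least HEALTHY_WINDOW).
    unavailable_after: Optional[timedelta] = None


def upgrade_succeeded(u: Upgrade) -> bool:
    if not u.completed or u.manual_intervention:
        return False
    return u.unavailable_after is None or u.unavailable_after > HEALTHY_WINDOW


def upgrade_success_rate(upgrades: list[Upgrade]) -> float:
    if not upgrades:
        raise ValueError("no upgrades recorded")
    return sum(upgrade_succeeded(u) for u in upgrades) / len(upgrades)


def success_rate_by(upgrades: list[Upgrade], key: Callable[[Upgrade], str]) -> dict[str, float]:
    """Success rate per group, e.g. per instance or per target release."""
    groups = defaultdict(list)
    for u in upgrades:
        groups[key(u)].append(u)
    return {k: upgrade_success_rate(v) for k, v in groups.items()}


# Appendix example: 1.2.1 -> 1.3.0 stays healthy; 1.3.0 -> 2.0.2 goes unavailable (after 5m here).
history = [
    Upgrade("inst-1", "1.3.0", completed=True, manual_intervention=False),
    Upgrade("inst-1", "2.0.2", completed=True, manual_intervention=False,
            unavailable_after=timedelta(minutes=5)),
]
print(upgrade_success_rate(history))                         # 0.5
print(success_rate_by(history, key=lambda u: u.to_version))  # {'1.3.0': 1.0, '2.0.2': 0.0}
```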

Related to: Adoption Rate, Support Burden

Symptoms

  • High support burden due to failed upgrades
  • Poor adoption rates

Improvements

  • Invest in upgrade stability or automatic upgrades
  • Reduce # of supported upgrade paths

Best in Class

99%

12 of 25

Mean Time to Recover

  • On average, how long does it take to recover when a failure occurs?
  • Directly inspired by DORA Mean Time to Recover (MTTR)
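Here is a minimal MTTR sketch. It assumes each incident is recorded as a failure timestamp and a recovery timestamp, for example when an instance left and then returned to a Ready status. Skipping still-open incidents is one possible choice. Note that it can flatter MTTR while a long outage is ongoing.

```python
from datetime import datetime, timedelta
from typing import Optional


def mean_time_to_recover(incidents: list[tuple[datetime, Optional[datetime]]]) -> Optional[timedelta]:
    """Mean of (recovered_at - failed_at) over incidents that have recovered.

    Open incidents (recovered_at is None) are skipped.
    """
    durations = [rec - fail for fail, rec in incidents if rec is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


incidents = [
    (datetime(2024, 6, 1, 8, 0), datetime(2024, 6, 1, 10, 0)),   # 2h
    (datetime(2024, 6, 9, 14, 0), datetime(2024, 6, 9, 18, 0)),  # 4h
    (datetime(2024, 6, 20, 9, 0), None),                         # still open
]
print(mean_time_to_recover(incidents))  # 3:00:00
```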

Related to: Instance Uptime, Aggregate Uptime, Support Burden

Symptoms

  • Long support engagements
  • High instance churn

Improvements

  • Improve internal documentation
  • Diagnostic bundling & automated log retrieval

Best in Class

Varies

13 of 25

Mean Time Between Failures

  • On average, how long is an instance up and available between downtime events?
  • Because app changes ship less often than in SaaS, customer-driven changes and even unattended failures also contribute to overall downtime and failures
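Here is a sketch of MTBF for a single instance: total up time divided by the number of distinct transitions into downtime. It assumes you have a chronological list of status intervals, and it counts only Unavailable as down. Consecutive Unavailable intervals count as one failure. All names are hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Assumption: only "Unavailable" counts as down; "Ready" and "Degraded" count as up.
DOWN_STATES = {"Unavailable"}


def mean_time_between_failures(timeline: list[tuple[str, timedelta]]) -> Optional[timedelta]:
    """Total up time divided by the number of distinct downtime events.

    timeline: chronological (status, duration) intervals for one instance.
    Returns None when no failure has been observed yet.
    """
    up = timedelta()
    failures = 0
    previous = None
    for status, duration in timeline:
        if status in DOWN_STATES:
            if previous not in DOWN_STATES:
                failures += 1  # count each transition into downtime once
        else:
            up += duration
        previous = status
    if failures == 0:
        return None
    return up / failures


# Appendix timeline: 2d4h up, 18h down, 12h up -> one failure, 64h of uptime.
appendix_timeline = [
    ("Ready", timedelta(hours=52)),
    ("Unavailable", timedelta(hours=18)),
    ("Ready", timedelta(hours=12)),
]
print(mean_time_between_failures(appendix_timeline))  # 2 days, 16:00:00
```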

Related to: Instance Uptime, Aggregate Uptime, Adoption Rate, Age of Deployed Software

Symptoms

  • Customers aren’t upgrading to new versions
  • Unknown failures

Improvements

  • Invest in diagnostic tooling and root cause analysis
  • Improve application stability
  • Improve adoption rate

Best in Class

Varies

14 of 25

Support Burden (hours)

  • Number of support hours over time
  • Reducing the number of hours needed to support customers frees up time to develop new features
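Normalizing by instance-months lets a growing fleet be compared against the per-instance-per-month targets on this slide and the next. The sketch below assumes a 30-day month and that you know how many days each instance was active in the period. The same function works for case counts.

```python
def instance_months(active_days_per_instance: list[float], days_per_month: float = 30.0) -> float:
    """Sum of each instance's active time in the period, expressed in months."""
    return sum(active_days_per_instance) / days_per_month


def support_burden(total: float, active_days_per_instance: list[float]) -> float:
    """Support hours (or case count) per instance per month."""
    months = instance_months(active_days_per_instance)
    if months == 0:
        raise ValueError("no instance activity in period")
    return total / months


# A quarter with 10 instances active for 90 days each = 30 instance-months.
fleet = [90.0] * 10
print(support_burden(24.0, fleet))  # 0.8 support hours / instance / month
print(support_burden(3, fleet))     # 0.1 cases / instance / month
```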

Related to: Time to Install, Install Success Rate, Upgrade Success Rate, Uptime, Mean Time to Recover, Deployment Frequency

Symptoms

  • Long install times
  • Low install and upgrade success rates
  • Decreased product velocity

Improvements

  • Invest in diagnostic tooling
  • Improve application stability
  • Invest in upgrade success
  • Improve testing & documentation

Best in Class

1hr / instance / month

15 of 25

Support Burden (cases)

  • Number of support cases over time
  • Reducing the number of cases frees up time to develop new features

Related to: Time to Install, Install Success Rate, Upgrade Success Rate, Uptime, Mean Time to Recover, Deployment Frequency

Symptoms

  • Long install times
  • Low install and upgrade success rates
  • Decreased product velocity

Improvements

  • Invest in diagnostic tooling
  • Improve application stability
  • Invest in upgrade success
  • Improve testing & documentation

Best in Class

0.1 / instance / month

16 of 25

Trial Conversion Rate

  • What fraction of trials convert into paying customers?
  • How many days does it take to progress through installation to a paying customer?
  • Closely related to install success - stalled or failed trials may be caused by a poor or broken installation experience
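The sketch below answers both questions on this slide: the share of trials that convert, and the median days from signup to conversion. The `Trial` record is hypothetical. Recent cohorts will look worse until their open trials resolve, so compare cohorts of similar age.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Trial:
    signed_up: date
    converted_on: Optional[date] = None  # None = not (yet) a paying customer


def trial_conversion_rate(trials: list[Trial]) -> float:
    if not trials:
        raise ValueError("no trials")
    return sum(t.converted_on is not None for t in trials) / len(trials)


def median_days_to_convert(trials: list[Trial]) -> Optional[float]:
    days = [(t.converted_on - t.signed_up).days for t in trials if t.converted_on is not None]
    return median(days) if days else None


trials = [
    Trial(date(2024, 3, 1), date(2024, 3, 11)),  # converted after 10 days
    Trial(date(2024, 3, 5), date(2024, 4, 4)),   # converted after 30 days
    Trial(date(2024, 3, 8)),
    Trial(date(2024, 3, 12)),
]
print(trial_conversion_rate(trials))   # 0.5
print(median_days_to_convert(trials))  # 20.0
```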

Related to: Time to Install, Install Success Rate, Uptime

Symptoms

  • Poor sales funnel conversion

Improvements

  • Invest in Uptime and Time to Install
  • Explore Go-to-Market and Product-Market-Fit signals

Best in Class

50%+

Days After Signup

17 of 25

Customer/Instance Churn

  • Churn is the complement of Gross Instance Retention (churn = 1 - retention), where retention is the fraction of instances active at the start of a given time period that are still active at its end
  • Generally computed on a monthly or quarterly timescale
  • Can be measured at the instance or customer level
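Here is a sketch of gross retention and churn over one period. It assumes you can list the IDs active at the period's start and at its end. Passing instance IDs gives instance churn, and passing customer IDs gives customer churn. The function names are illustrative.

```python
def gross_retention(active_at_start: set[str], active_at_end: set[str]) -> float:
    """Fraction of IDs active at period start that are still active at period end.

    IDs that first appear during the period are ignored (gross, not net, retention).
    """
    if not active_at_start:
        raise ValueError("nothing active at start of period")
    return len(active_at_start & active_at_end) / len(active_at_start)


def churn_rate(active_at_start: set[str], active_at_end: set[str]) -> float:
    """Complement of gross retention over the same period."""
    return 1 - gross_retention(active_at_start, active_at_end)


start = {"inst-a", "inst-b", "inst-c", "inst-d"}
end = {"inst-a", "inst-b", "inst-c", "inst-e"}  # inst-d churned, inst-e is new
print(gross_retention(start, end))  # 0.75
print(churn_rate(start, end))       # 0.25
```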

Related to: Adoption Rate, Upgrade Success Rate, Install Success Rate

Symptoms

  • Poor trial conversion rate
  • Poor customer retention

Improvements

  • Invest in upgrade success, adoption, and install success

Best in Class

< 5%

18 of 25

Appendix - More Graphics

19 of 25

Installation - Cumulative Flow

https://github.com/devopsdays/devopsdays-web/pull/13029/files

20 of 25

[Diagram: installation timeline for one customer. Steps shown: license created, customer receives materials and downloads assets, manual pre-install checklist, automated preflight checks, and customer provisions hardware, networking, and external systems. Staging attempts 1 and 2 fail and the servers are discarded. Staging attempt 3 succeeds (software live in staging / POV). Production attempt 1 succeeds (software live in production, then delivering value). License TTI, Instance TTI, and True TTI are each marked with min and max bounds.]

Total Attempts = 4

Total Success = 2

Attempts/Install = 2

Attempts/First Ready = 3

21 of 25

Reliability - Uptime

[Diagram: app status events for one instance (Ready, Degraded, Ready, Unavailable, Ready) mapped to states: Uptime 2d4h, Downtime 18h, Uptime 12h]

Uptime Insights

Uptime: 100%

Uptime: 78% (64hr uptime / 82hr total)

22 of 25

Reliability - Aggregates

23 of 25

Reliability - Instance Uptimes

24 of 25

Reliability - Upgrade Success

[Diagram: app version and status events for one instance. Version 1.2.1 (Ready) upgrades to 1.3.0, completes, and stays healthy for 15m: Success. Version 1.3.0 upgrades to 2.0.2 and becomes Unavailable within 15m: Failed, before returning to Ready.]

Upgrade Insights

1 Upgrade, 1/1 successful: Success Rate 100%

2 Upgrades, 1/2 successful: Success Rate 50%

Aggregating Events - upgrades

25 of 25

Revenue - Trial Conversion