1 of 28

Let’s Go Agile! Data-Driven Agile Software Cost and Schedule Models

First International Boehm Forum on COCOMO®

and Systems and Software Cost Modeling

November 9-10, 2022

Disclaimer: The contents of this paper reflect the views of the authors and are not necessarily endorsed by the Department of Homeland Security

Wilson Rosa

Sara Jardine

2 of 28

Agenda


Introduction

    • Problem Statement
    • Proposed Solution
    • Breakthroughs
    • Benefits to Program Management
    • DHS Agile History

Agile Project Dataset

    • Data Collection
    • Data Sources
    • Variables
    • Data Normalization
    • Demographics
    • Descriptive Statistics

Effort and Schedule Models

    • Benchmarks
    • Effort Estimation Models
    • Schedule Estimation Models
    • Model Rankings

Results

    • Model Usefulness
    • Model Limitations
    • Main Takeaways
    • Work in Progress

3 of 28

What is the Problem?


DoD

    • …pilot program to use agile or iterative development methods to tailor major software-intensive warfighting systems and defense business systems. …software development pilot program using agile best practices. – 2018 NDAA Sec. 873/874
    • …description of how the Department will increasingly automate accreditation processes, pursue agile development, incorporate machine learning, and foster reciprocity across authorizing officials. – 2020 NDAA Sec. 1654.b.2.c

DHS

“The Department [DHS] needs a credible and accurate method for estimating the cost of software development programs that can be tracked over time and provide insight into whether a program is behind schedule or is forecasted to exceed initial cost projections.”

- Stacy Marcott, Acting Chief Financial Officer, May 30, 2019

Policy mandates the application of agile software development best practices

4 of 28

What is the Solution?

  • Offer a set of data-driven software development effort and schedule estimating models for DHS agile projects

  • Acquisition community can use these models to:
    • More accurately estimate effort and schedule to support DHS and DoD decision reviews of agile programs
    • Crosscheck vendor proposals and evaluate contractor performance


5 of 28

Benefits to Program Management

At any stage of the agile program acquisition lifecycle, the PM can choose a sizing measure to estimate the software development cost and schedule

Notes: MNS = Mission Needs Statement

CONOPS = Concept of Operations

RTM = Requirements Traceability Matrix

IOC = Initial Operational Capability

FOC = Full Operational Capability

6 of 28

Evolution of Agile at DHS


2010

OMB issued a 25-point plan to reform IT projects and called on federal agencies to implement shorter delivery timeframes

2016

DHS Agile Development and Delivery for IT Instruction Manual 102-01-004-01

2017

DHS USM directed DHS CAD to find ways to improve agile software development programs [1]

2018

DHS Agile Methodology for Software Development and Delivery for IT, Policy Instruction 102-01-004

2019

DHS CAD launched the cross-agency Joint Agile Software Innovation (JASI) Cost IPT

2021

DHS Agile Guidebook

Collecting Agile data requires an agency data collection policy, close collaboration with PMs, and frequent iterative reassessment

Agile is an iterative approach to delivering solutions incrementally

7 of 28

Study Breakthroughs

  • Delivers first-ever agile software cost dataset (n=18) for DHS cost community
  • Presents a new process for collecting, normalizing, and analyzing agile project cost and schedule data for Firm Fixed Price and Time & Materials contracts
  • Introduces Functional Story as a new sizing measure for agile cost estimation
  • Offers data-driven agile software project effort and schedule benchmarks and regression models for six different sizing measures:

1. Functional Story
2. Unadjusted Function Point
3. Simple Function Point
4. Story
5. Story Point
6. Issues

8 of 28

Agile Project Dataset

9 of 28

Data Collection

  • Dataset included 18 agile projects
    • DHS (15) and DoD (3)
    • Across 11 different companies
    • 12 completed within the last two years
  • All data collection occurred between March 2020 and January 2022
  • Data provided by the Program Managers


10 of 28

Data Sources

  • All data in this study were provided by the Agile Program Management Offices
  • 100% obtained from Official/Authoritative Documents:


Effort

    • Monthly Contractor Invoices
    • Product Backlog

Schedule

    • Monthly Contractor Invoices
    • Product Backlog (in JIRA)

Size

    • Requirements Traceability Matrix
    • Functional Requirements Document
    • Product Backlog (in JIRA)

Context

    • Acquisition Documents
    • Agile Core Metrics

11 of 28

Variable Selection (Common Sense)


Effort  

    • Actual labor hours to complete all contractor development activities
    • Reported at the release level

Schedule

    • Actual development time (months) to complete all software development activities
    • Reported at the release level

Functional Story

    • Subset of functional requirements describing what the software does in terms of tasks and services

Issue 

    • Unit of work traced through a workflow, from creation to completion
    • Total issues are the sum of stories, bugs, tasks, epics, and others 

Story

    • Feature or unit of business value that can be estimated and tested
    • Describes work that must be done to deliver a feature for a product

Story Point  

    • Unit of measure to express the overall size of a story, task, or other piece of work in the backlog

Unadj. Function Point

    • Function point count without the assignment of complexity to any of the objects counted

Simple Function Point

    • Method for sizing software requiring the identification of elementary processes and logic files to approximate a function point count

Scope

    • Categorical variable indicating whether the scope of project is an enhancement or full development

(Effort and Schedule are the dependent variables; the six sizing measures are independent variables; Scope is categorical.)

12 of 28

Data Normalization: How did we measure effort?

  • Effort hours in this study capture the total labor incurred by the contractor’s agile development teams
  • Total labor includes 11 cost elements aligned to the DHS IT Work Breakdown Structure (WBS)

ID        | DHS IT WBS Element
1.i.1     | Program Management
1.i.2     | Systems Engineering
1.i.4.2   | Software Development
1.i.4.3   | Data Development & Transition
1.i.4.5   | Training Development
1.i.4.6.1 | Development Test & Evaluation
1.i.4.6.1 | Cybersecurity Test & Evaluation
1.i.4.7   | Logistics Support Development
1.i.7     | System Level Integration & Test
1.i.8.6.1 | Help Desk/Service Desk (Tier 3)
1.i.8.6.4 | Software Maintenance

Why use total labor?

Reporting labor at the total level (rather than software development alone) is recommended because most DHS agile development contracts are FFP or T&M and generally do not break out effort by major cost element, as traditional cost-plus contracts do.

13 of 28

Data Normalization: Counting Functional Story, SiFP, UFP

Step 1: Extract Product Backlog (JIRA)

    • Go to the column titled Issue Status and filter rows marked Done

Step 2: Find Story

    • Go to the column titled Issue Type and keep rows marked Story

Step 3: Find and Count Functional Story

    • Categorize each story as functional or non-functional*
    • Count rows marked as functional**

Step 4: Calculate Function Point

    • Convert each functional story into SiFP* and UFP*

[Funnel illustration: Issue Status (Deferred, Done, In-Progress) → Issue Type (Other, Task, Story, Bug) → Category (Functional Story, Non-Functional Story) → Calculation (SiFP, UFP)]

Notes:

*Performed by a Certified Function Point Specialist

**Functional Stories (from product backlog) = Functional Requirements (from RTM or FRD)
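The four counting steps above can be sketched as a simple filter over a backlog extract. The field names below are hypothetical stand-ins for the JIRA export columns (Issue Status, Issue Type, Category):

```python
# Hypothetical backlog extract; real JIRA exports use different field names.
backlog = [
    {"status": "Done",        "type": "Story", "category": "Functional"},
    {"status": "Done",        "type": "Bug",   "category": None},
    {"status": "In-Progress", "type": "Story", "category": "Functional"},
    {"status": "Done",        "type": "Story", "category": "Non-Functional"},
    {"status": "Deferred",    "type": "Story", "category": None},
    {"status": "Done",        "type": "Story", "category": "Functional"},
]

# Steps 1-3: keep completed stories, then count those categorized as functional.
functional_stories = [
    row for row in backlog
    if row["status"] == "Done"            # Step 1: filter Issue Status = Done
    and row["type"] == "Story"            # Step 2: filter Issue Type = Story
    and row["category"] == "Functional"   # Step 3: keep functional stories
]
print(len(functional_stories))
```

Step 4, converting each functional story into SiFP and UFP, is not automated here; per the notes above, it is performed by a Certified Function Point Specialist.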

14 of 28

Dataset Demographics

Sample Size: 18 Projects

Automated Information System

Majority (13) used cloud-hosted Amazon Web Services

Majority (15) used FFP or T&M Contracts

2 to 4-week Iterations

15 of 28

Dataset Demographics


16 of 28

Descriptive Statistics


17 of 28

Descriptive Statistics


Relevant Range

    • 20-5,000 stories
    • 10-2,000 functional requirements
    • 80-11,000 function points
    • 9-200 Peak Staff FTEs

When selecting a regression model, consider the relevant range of each independent variable
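That relevant-range check can be automated before a model is applied. The ranges come from this slide; the dictionary keys are illustrative names, not the study’s variable labels:

```python
# Relevant ranges from the dataset; estimating outside them means
# extrapolating beyond the data the regression models were fit on.
RELEVANT_RANGE = {
    "stories": (20, 5000),
    "functional_requirements": (10, 2000),
    "function_points": (80, 11000),
    "peak_staff_fte": (9, 200),
}

def in_relevant_range(variable, value):
    """Return True when a model input falls inside the dataset's range."""
    lo, hi = RELEVANT_RANGE[variable]
    return lo <= value <= hi

print(in_relevant_range("function_points", 200))   # inside the range
print(in_relevant_range("peak_staff_fte", 300))    # outside: extrapolation
```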

18 of 28

Effort and Schedule Models

19 of 28

Effort Benchmarks

Category | Benchmark              | 25th Percentile | Median | 75th Percentile | StdDev | CV
Effort   | Hours/Functional Story | 410             | 494    | 653             | 261    | 47%
Effort   | Hours/UFP              | 61              | 81     | 107             | 40     | 46%
Effort   | Hours/SiFP             | 57              | 71     | 100             | 39     | 47%

Practical Application:

  • For example, an analyst can develop an effort estimate by multiplying the estimated size (e.g., SiFP = 200) by the appropriate effort benchmark (median value from the lookup table above):

Effort = 200 SiFP × 71 hours/SiFP = 14,200 hours

20 of 28

Schedule Benchmarks

Category | Benchmark                  | 25th Percentile | Median | 75th Percentile | StdDev | CV
Schedule | Functional Story/FTE/Month | 0.19            | 0.28   | 0.32            | 0.14   | 47%
Schedule | UFP/FTE/Month              | 1.2             | 1.8    | 2.1             | 0.9    | 50%
Schedule | SiFP/FTE/Month             | 1.3             | 2.1    | 2.3             | 1.1    | 52%

Practical Application:

  • For example, an analyst can develop a schedule estimate by dividing the estimated size (e.g., SiFP = 200) by the appropriate schedule benchmark (median value from the lookup table above) and by the estimated peak staff (e.g., FTE = 10):

Schedule = 200 SiFP ÷ (2.1 SiFP/FTE/month × 10 FTE) ≈ 9.5 months
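Putting the two practical applications together, a minimal sketch using the median benchmarks from the effort and schedule lookup tables above (function and dictionary names are illustrative):

```python
# Median benchmarks from the effort and schedule lookup tables.
EFFORT_HOURS_PER_UNIT = {"Functional Story": 494, "UFP": 81, "SiFP": 71}
SCHEDULE_UNITS_PER_FTE_MONTH = {"Functional Story": 0.28, "UFP": 1.8, "SiFP": 2.1}

def effort_hours(size, measure):
    """Effort = estimated size x (hours per unit)."""
    return size * EFFORT_HOURS_PER_UNIT[measure]

def schedule_months(size, measure, peak_fte):
    """Schedule = estimated size / (units per FTE-month) / peak staff."""
    return size / SCHEDULE_UNITS_PER_FTE_MONTH[measure] / peak_fte

print(effort_hours(200, "SiFP"))                     # 200 x 71 = 14200 hours
print(round(schedule_months(200, "SiFP", 10), 1))    # ~9.5 months
```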

21 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
1     |     | 15 | 0.39 | 89.6% | 88.8% | 85.9%  | 31.3%

Effort (E) = Total final development hours
REQ = Functional stories from backlog, RTM, or FRD

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
2     |     | 15 | 0.66 | 70.2% | 67.9% | 59.0%  | 54.1%

Effort (E) = Total final development hours
STORY = Total stories obtained from JIRA backlog
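CERs of this kind are conventionally fit as a power law, Effort = a × Size^b, by ordinary least squares in log-log space; that this study used exactly this form is an assumption here. The sketch below shows the general technique on made-up power-law data, not the study’s dataset:

```python
import math

def fit_power_cer(sizes, hours):
    """Fit Effort = a * Size^b by ordinary least squares on log-transformed
    data, the usual functional form for cost estimating relationships."""
    n = len(sizes)
    xs = [math.log(s) for s in sizes]
    ys = [math.log(h) for h in hours]
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope and intercept of the log-log regression line.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Illustrative data following an exact power law (a=5.0, b=0.9).
sizes = [50, 100, 400, 1000]
hours = [5.0 * s ** 0.9 for s in sizes]
a, b = fit_power_cer(sizes, hours)
print(round(a, 2), round(b, 2))  # recovers 5.0 and 0.9
```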

22 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
4     |     | 14 | 0.39 | 84.4% | 83.1% | 78.2%  | 32.7%

Effort (E) = Total final development hours
STY_PTS = Story points obtained from JIRA backlog

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
3     |     | 15 | 0.64 | 71.5% | 69.3% | 59.4%  | 51.6%

Effort (E) = Total final development hours
ISSUES = Sum of stories, bugs, tasks, epics, or any other fixes

23 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
6     |     | 15 | 0.35 | 92.0% | 90.7% | 86.5%  | 25.9%

Effort (E) = Total final development hours
SiFP = Simple Function Point
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
5     |     | 15 | 0.38 | 90.0% | 89.3% | 85.6%  | 31.6%

Effort (E) = Total final development hours
UFP = Total unadjusted function points

24 of 28

Schedule Estimation Models

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
1     |     | 15 | 0.26 | 88.4% | 86.5% | 81.8%  | 18.8%

Schedule (S) = Total final development months
REQ = Functional stories from backlog, RTM, or FRD
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
2     |     | 15 | 0.27 | 88.1% | 86.2% | 80.8%  | 18.6%

Schedule (S) = Total final development months
UFP = Total unadjusted function points
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
3     |     | 15 | 0.27 | 88.0% | 86.0% | 80.4%  | 18.4%

Schedule (S) = Total final development months
SiFP = Simple Function Point
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0
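The MAD column in these tables reads as a mean absolute deviation expressed as a percentage of actuals, akin to the MMRE reported on the ranking slide. A minimal sketch of that fit statistic, computed on illustrative numbers rather than the study’s data:

```python
def mad_percent(actuals, predictions):
    """Mean absolute deviation as a percentage of actual values
    (mean magnitude of relative error)."""
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return 100.0 * sum(errors) / len(errors)

# Illustrative schedule values only, not the study's data.
actual_months = [10.0, 20.0, 8.0]
predicted_months = [12.0, 18.0, 8.0]
print(round(mad_percent(actual_months, predicted_months), 1))  # 10.0 (%)
```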

25 of 28

How do the Regression Models Rank?

Effort Models

Rank | Independent Variable | R2adj | R2pred | MMRE
1    | SiFP + Scope         | 90.7% | 86.5%  | 25.9%
2    | UFP                  | 89.3% | 85.6%  | 31.6%
3    | Functional Story     | 88.8% | 85.9%  | 31.3%
4    | Story Point          | 83.1% | 78.1%  | 32.7%
5    | Issues               | 69.3% | 59.4%  | 51.6%
6    | Story                | 67.9% | 59.0%  | 54.1%

Schedule Models

Rank | Independent Variable     | R2adj | R2pred | MMRE
1    | Functional Story + Scope | 86.5% | 81.8%  | 18.8%
2    | UFP + Scope              | 86.2% | 80.8%  | 18.6%
3    | SiFP + Scope             | 86.0% | 80.4%  | 18.4%

Simple Function Points (SiFP), Unadjusted Function Points (UFP), and Functional Stories (REQ) are stronger predictors of both effort and schedule for agile projects

26 of 28

Results

27 of 28

Model Limitations


Internal Threats

    • The dataset timeframe (2014-2021) raises potential issues, as the earlier projects (2014, 2016, 2018) may have used agile processes tailored to fit the agency’s needs

External Threats

    • The models proved effective for estimating agile projects in the DHS context; however, we cannot generalize beyond this population
    • Agencies may not have access to the backlog, FRD, or RTM needed for SiFP analysis

Constructive Threats

    • The small dataset limits statistical power for detecting effects and raises the risk of overfitting
    • A larger dataset is needed to draw more confident statistical conclusions

28 of 28

Main Takeaways


Wider range of sizing measures (6) to estimate future agile software programs and evaluate contractor proposals

SiFP and UFP proved to be the most accurate predictors of agile software development effort and schedule

Analysis reveals that “Functional Story” is an effective predictor of effort and schedule, and easy to obtain

Popular agile measures such as story points, stories, and issues are not as effective predictors of effort and schedule

SiFP and UFP can be calculated early in the program allowing estimation from contract proposal through IOC (when popular agile measures are difficult to obtain)