1 of 28

Let’s Go Agile! Data-Driven Agile Software Cost and Schedule Models

First International Boehm Forum on COCOMO®

and Systems and Software Cost Modeling

November 9-10, 2022

Disclaimer: The contents of this paper reflect the views of the authors and are not necessarily endorsed by the Department of Homeland Security

Wilson Rosa

Sara Jardine

2 of 28

Agenda


Introduction

    • Problem Statement
    • Proposed Solution
    • Breakthroughs
    • Benefits to Program Management
    • DHS Agile History

Agile Project Dataset

    • Data Collection
    • Data Sources
    • Variables
    • Data Normalization
    • Demographics
    • Descriptive Statistics

Effort and Schedule Models

    • Benchmarks
    • Effort Estimation Models
    • Schedule Estimation Models
    • Model Rankings

Results

    • Model Usefulness
    • Model Limitations
    • Main Takeaways
    • Work in Progress

3 of 28

What is the Problem?


DoD

    • …pilot program to use agile or iterative development methods to tailor major software-intensive warfighting systems and defense business systems. …software development pilot program using agile best practices. – 2018 NDAA Sec. 873/874
    • …description of how the Department will increasingly automate accreditation processes, pursue agile development, incorporate machine learning, and foster reciprocity across authorizing officials. – 2020 NDAA Sec. 1654.b.2.c

DHS

“The Department [DHS] needs a credible and accurate method for estimating the cost of software development programs that can be tracked over time and provide insight into whether a program is behind schedule or is forecasted to exceed initial cost projections.”

- Stacy Marcott, Acting Chief Financial Officer, May 30, 2019

Policy mandates the application of agile software development best practices

4 of 28

What is the Solution?

  • Offer a set of data-driven software development effort and schedule estimating models for DHS agile projects

  • Acquisition community can use these models to:
    • More accurately estimate effort and schedule to support DHS and DoD decision reviews of agile programs
    • Crosscheck vendor proposals and evaluate contractor performance


5 of 28

Benefits to Program Management

At any stage of the agile program acquisition lifecycle, the PM can choose a sizing measure to estimate the software development cost and schedule

Notes: MNS = Mission Needs Statement

CONOPS = Concept of Operations

RTM = Requirements Traceability Matrix

IOC = Initial Operational Capability

FOC = Full Operational Capability

6 of 28

Evolution of Agile at DHS


2010

OMB issued a 25-point plan to reform IT projects and called on federal agencies to implement shorter delivery timeframes

2016

DHS Agile Development and Delivery for IT Instruction Manual 102-01-004-01

2017

DHS USM directed DHS CAD to find ways to improve agile software development programs [1]

2018

DHS Agile Methodology for Software Development and Delivery for IT, Policy Instruction 102-01-004

2019

DHS CAD launched the cross-agency Joint Agile Software Innovation (JASI) Cost IPT

2021

DHS Agile Guidebook

Collecting Agile data requires an agency data collection policy, close collaboration with PMs, and frequent iterative reassessment

Agile is an iterative approach to delivering solutions incrementally

7 of 28

Study Breakthroughs

  • Delivers first-ever agile software cost dataset (n=18) for DHS cost community
  • Presents a new process for collecting, normalizing, and analyzing agile project cost and schedule data for Firm Fixed Price and Time & Materials contracts
  • Introduces Functional Story as a new sizing measure for agile cost estimation
  • Offers data-driven agile software project effort and schedule benchmarks and regression models for six different sizing measures:

1. Functional Story
2. Unadjusted Function Point
3. Simple Function Point
4. Story
5. Story Point
6. Issues

8 of 28

Agile Project Dataset

9 of 28

Data Collection

  • Dataset included 18 agile projects
    • DHS (15) and DoD (3)
    • Across 11 different companies
    • 12 completed within the last two years
  • All data collection occurred between March 2020 and January 2022
  • Data provided by the Program Managers


10 of 28

Data Sources

  • All data in this study were provided by the Agile Program Management Offices
  • 100% obtained from Official/Authoritative Documents:


Effort

    • Monthly Contractor Invoices
    • Product Backlog

Schedule

    • Monthly Contractor Invoices
    • Product Backlog (in JIRA)

Size

    • Requirements Traceability Matrix
    • Functional Requirements Document
    • Product Backlog (in JIRA)

Context

    • Acquisition Documents
    • Agile Core Metrics

11 of 28

Variable Selection (Common Sense)


Effort  

    • Actual labor hours to complete all contractor development activities
    • Reported at the release level

Schedule

    • Actual development time (months) to complete all software development activities
    • Reported at the release level

Functional Story

    • Subset of functional requirements describing what the software does in terms of tasks and services

Issue 

    • Unit of work traced through a workflow, from creation to completion
    • Total issues are the sum of stories, bugs, tasks, epics, and others 

Story

    • Feature or unit of business value that can be estimated and tested
    • Describes work that must be done to deliver a feature for a product

Story Point  

    • Unit of measure to express the overall size of a story, task, or other piece of work in the backlog

Unadj. Function Point

    • Function point count without the assignment of complexity to any of the objects counted

Simple Function Point

    • Method for sizing software requiring the identification of elementary processes and logic files to approximate a function point count

Scope

    • Categorical variable indicating whether the scope of project is an enhancement or full development

(Effort and Schedule are the dependent variables; the six sizing measures are independent variables; Scope is categorical.)

12 of 28

Data Normalization: How did we measure effort?

  • Effort hours in this study capture the total labor incurred by the contractor’s agile development teams
  • Total labor includes 11 cost elements aligned to the DHS IT Work Breakdown Structure (WBS)

ID        | DHS IT WBS Element
1.i.1     | Program Management
1.i.2     | Systems Engineering
1.i.4.2   | Software Development
1.i.4.3   | Data Development & Transition
1.i.4.5   | Training Development
1.i.4.6.1 | Development Test & Evaluation
1.i.4.6.1 | Cybersecurity Test & Evaluation
1.i.4.7   | Logistics Support Development
1.i.7     | System Level Integration & Test
1.i.8.6.1 | Help Desk/Service Desk (Tier 3)
1.i.8.6.4 | Software Maintenance

Why use total labor?

Reporting labor at the total level (rather than software development alone) is recommended because most DHS agile development contracts are FFP or T&M and generally do not break out effort by major cost element, as traditional cost-plus contracts do.

13 of 28

Data Normalization: Counting Functional Story, SiFP, UFP

Step 1: Extract Product Backlog (JIRA)

    • Go to the column titled Issue Status and filter rows marked Done

Step 2: Find Story

    • Go to the column titled Issue Type and keep rows marked Story

Step 3: Find and Count Functional Story

    • Categorize each story as functional or non-functional*
    • Count rows marked as functional**

Step 4: Calculate Function Point

    • Convert each functional story into SiFP* and UFP*

[Funnel illustration: Issue Status (Deferred, Done, In-Progress) → Issue Type (Other, Task, Story, Bug) → Category (Functional Story, Non-Functional Story) → Calculation (SiFP, UFP)]

Notes:

*Performed by a Certified Function Point Specialist

**Functional Stories (from product backlog) = Functional Requirements (from RTM or FRD)
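The four counting steps above can be sketched as a simple filter over a backlog extract. The field names below are hypothetical stand-ins for the JIRA export columns (Issue Status, Issue Type, Category):

```python
# Hypothetical backlog extract; real JIRA exports use different field names.
backlog = [
    {"status": "Done",        "type": "Story", "category": "Functional"},
    {"status": "Done",        "type": "Bug",   "category": None},
    {"status": "In-Progress", "type": "Story", "category": "Functional"},
    {"status": "Done",        "type": "Story", "category": "Non-Functional"},
    {"status": "Deferred",    "type": "Story", "category": None},
    {"status": "Done",        "type": "Story", "category": "Functional"},
]

# Steps 1-3: keep completed stories, then count those categorized as functional.
functional_stories = [
    row for row in backlog
    if row["status"] == "Done"            # Step 1: filter Issue Status = Done
    and row["type"] == "Story"            # Step 2: filter Issue Type = Story
    and row["category"] == "Functional"   # Step 3: keep functional stories
]
print(len(functional_stories))
```

Step 4, converting each functional story into SiFP and UFP, is not automated here; per the notes above, it is performed by a Certified Function Point Specialist.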

14 of 28

Dataset Demographics

Sample Size: 18 Projects

Automated Information System

Majority (13) used cloud-hosted Amazon Web Services

Majority (15) used FFP or T&M Contracts

2 to 4-week Iterations

15 of 28

Dataset Demographics


16 of 28

Descriptive Statistics


17 of 28

Descriptive Statistics


Relevant Range

    • 20-5,000 stories
    • 10-2,000 functional requirements
    • 80-11,000 function points
    • 9-200 Peak Staff FTEs

When selecting a regression model, consider the relevant range of each independent variable
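That relevant-range check can be automated before a model is applied. The ranges come from this slide; the dictionary keys are illustrative names, not the study’s variable labels:

```python
# Relevant ranges from the dataset; estimating outside them means
# extrapolating beyond the data the regression models were fit on.
RELEVANT_RANGE = {
    "stories": (20, 5000),
    "functional_requirements": (10, 2000),
    "function_points": (80, 11000),
    "peak_staff_fte": (9, 200),
}

def in_relevant_range(variable, value):
    """Return True when a model input falls inside the dataset's range."""
    lo, hi = RELEVANT_RANGE[variable]
    return lo <= value <= hi

print(in_relevant_range("function_points", 200))   # inside the range
print(in_relevant_range("peak_staff_fte", 300))    # outside: extrapolation
```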

18 of 28

Effort and Schedule Models

19 of 28

Effort Benchmarks

Category | Benchmark              | 25th Percentile | Median | 75th Percentile | StdDev | CV
Effort   | Hours/Functional Story | 410             | 494    | 653             | 261    | 47%
Effort   | Hours/UFP              | 61              | 81     | 107             | 40     | 46%
Effort   | Hours/SiFP             | 57              | 71     | 100             | 39     | 47%

Practical Application:

  • For example, an analyst can develop an effort estimate by multiplying the estimated size (e.g., SiFP = 200) by the appropriate effort benchmark (median value from the lookup table above):

Effort = 200 SiFP × 71 hours/SiFP = 14,200 hours

20 of 28

Schedule Benchmarks

Category | Benchmark                  | 25th Percentile | Median | 75th Percentile | StdDev | CV
Schedule | Functional Story/FTE/Month | 0.19            | 0.28   | 0.32            | 0.14   | 47%
Schedule | UFP/FTE/Month              | 1.2             | 1.8    | 2.1             | 0.9    | 50%
Schedule | SiFP/FTE/Month             | 1.3             | 2.1    | 2.3             | 1.1    | 52%

Practical Application:

  • For example, an analyst can develop a schedule estimate by dividing the estimated size (e.g., SiFP = 200) by the appropriate schedule benchmark (median value from the lookup table above) and by the estimated peak staff (e.g., FTE = 10):

Schedule = 200 SiFP ÷ (2.1 SiFP/FTE/month × 10 FTE) ≈ 9.5 months
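Putting the two practical applications together, a minimal sketch using the median benchmarks from the effort and schedule lookup tables above (function and dictionary names are illustrative):

```python
# Median benchmarks from the effort and schedule lookup tables.
EFFORT_HOURS_PER_UNIT = {"Functional Story": 494, "UFP": 81, "SiFP": 71}
SCHEDULE_UNITS_PER_FTE_MONTH = {"Functional Story": 0.28, "UFP": 1.8, "SiFP": 2.1}

def effort_hours(size, measure):
    """Effort = estimated size x (hours per unit)."""
    return size * EFFORT_HOURS_PER_UNIT[measure]

def schedule_months(size, measure, peak_fte):
    """Schedule = estimated size / (units per FTE-month) / peak staff."""
    return size / SCHEDULE_UNITS_PER_FTE_MONTH[measure] / peak_fte

print(effort_hours(200, "SiFP"))                     # 200 x 71 = 14200 hours
print(round(schedule_months(200, "SiFP", 10), 1))    # ~9.5 months
```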

21 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
1     |     | 15 | 0.39 | 89.6% | 88.8% | 85.9%  | 31.3%

Effort (E) = Total final development hours
REQ = Functional stories from backlog, RTM, or FRD

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
2     |     | 15 | 0.66 | 70.2% | 67.9% | 59.0%  | 54.1%

Effort (E) = Total final development hours
STORY = Total stories obtained from JIRA backlog
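CERs of this kind are conventionally fit as a power law, Effort = a × Size^b, by ordinary least squares in log-log space; that this study used exactly this form is an assumption here. The sketch below shows the general technique on made-up power-law data, not the study’s dataset:

```python
import math

def fit_power_cer(sizes, hours):
    """Fit Effort = a * Size^b by ordinary least squares on log-transformed
    data, the usual functional form for cost estimating relationships."""
    n = len(sizes)
    xs = [math.log(s) for s in sizes]
    ys = [math.log(h) for h in hours]
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope and intercept of the log-log regression line.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Illustrative data following an exact power law (a=5.0, b=0.9).
sizes = [50, 100, 400, 1000]
hours = [5.0 * s ** 0.9 for s in sizes]
a, b = fit_power_cer(sizes, hours)
print(round(a, 2), round(b, 2))  # recovers 5.0 and 0.9
```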

22 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
4     |     | 14 | 0.39 | 84.4% | 83.1% | 78.2%  | 32.7%

Effort (E) = Total final development hours
STY_PTS = Story points obtained from JIRA backlog

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
3     |     | 15 | 0.64 | 71.5% | 69.3% | 59.4%  | 51.6%

Effort (E) = Total final development hours
ISSUES = Sum of stories, bugs, tasks, epics, or any other fixes

23 of 28

Effort Estimation Models

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
6     |     | 15 | 0.35 | 92.0% | 90.7% | 86.5%  | 25.9%

Effort (E) = Total final development hours
SiFP = Simple Function Point
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | CER | N  | SE   | R2    | R2adj | R2pred | MAD
5     |     | 15 | 0.38 | 90.0% | 89.3% | 85.6%  | 31.6%

Effort (E) = Total final development hours
UFP = Total unadjusted function points

24 of 28

Schedule Estimation Models

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
1     |     | 15 | 0.26 | 88.4% | 86.5% | 81.8%  | 18.8%

Schedule (S) = Total final development months
REQ = Functional stories from backlog, RTM, or FRD
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
2     |     | 15 | 0.27 | 88.1% | 86.2% | 80.8%  | 18.6%

Schedule (S) = Total final development months
UFP = Total unadjusted function points
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0

Model | SER | N  | SE   | R2    | R2adj | R2pred | MAD
3     |     | 15 | 0.27 | 88.0% | 86.0% | 80.4%  | 18.4%

Schedule (S) = Total final development months
SiFP = Simple Function Point
D1 = Dummy variable (scope), where full development = 1 and enhancement = 0
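The MAD column in these tables reads as a mean absolute deviation expressed as a percentage of actuals, akin to the MMRE reported on the ranking slide. A minimal sketch of that fit statistic, computed on illustrative numbers rather than the study’s data:

```python
def mad_percent(actuals, predictions):
    """Mean absolute deviation as a percentage of actual values
    (mean magnitude of relative error)."""
    errors = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    return 100.0 * sum(errors) / len(errors)

# Illustrative schedule values only, not the study's data.
actual_months = [10.0, 20.0, 8.0]
predicted_months = [12.0, 18.0, 8.0]
print(round(mad_percent(actual_months, predicted_months), 1))  # 10.0 (%)
```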

25 of 28

How do the Regression Models Rank?

Effort Models

Rank | Independent Variable | R2adj | R2pred | MMRE
1    | SiFP + Scope         | 90.7% | 86.5%  | 25.9%
2    | UFP                  | 89.3% | 85.6%  | 31.6%
3    | Functional Story     | 88.8% | 85.9%  | 31.3%
4    | Story Point          | 83.1% | 78.1%  | 32.7%
5    | Issues               | 69.3% | 59.4%  | 51.6%
6    | Story                | 67.9% | 59.0%  | 54.1%

Schedule Models

Rank | Independent Variable     | R2adj | R2pred | MMRE
1    | Functional Story + Scope | 86.5% | 81.8%  | 18.8%
2    | UFP + Scope              | 86.2% | 80.8%  | 18.6%
3    | SiFP + Scope             | 86.0% | 80.4%  | 18.4%

Simple Function Points (SiFP), Unadjusted Function Points (UFP), and Functional Stories (REQ) are stronger predictors of both effort and schedule for agile projects

26 of 28

Results

27 of 28

Model Limitations


Internal Threats

    • The dataset timeframe (2014-2021) raises potential issues, as the earlier projects (2014, 2016, 2018) may have used agile processes tailored to fit the agency’s needs

External Threats

    • The models proved effective for estimating agile projects in the DHS context; however, we cannot generalize beyond this population
    • Agencies may not have access to the backlog, FRD, or RTM needed for SiFP analysis

Constructive Threats

    • The small dataset limits statistical power for detecting effects and raises the risk of overfitting
    • A larger dataset is needed to draw more confident statistical conclusions

28 of 28

Main Takeaways


Wider range of sizing measures (6) to estimate future agile software programs and evaluate contractor proposals

SiFP and UFP proved to be the most accurate predictors of agile software development effort and schedule

Analysis reveals that “Functional Story” is an effective predictor of effort and schedule, and easy to obtain

Popular agile measures such as story points, stories, and issues are not as effective predictors of effort and schedule

SiFP and UFP can be calculated early in the program allowing estimation from contract proposal through IOC (when popular agile measures are difficult to obtain)