1 of 31

A Comparison of Tee Shirt, Functional, and SLOC Sizing on Current Federal Agile Software Development Programs

Presentation for

Boehm CSSE and Practical Software and Systems Measurement Group

by

 

Bob Hunt, 703-201-0651, bhunt@n-s-i.us


2 of 31

Outline

  • THERE IS A GENERAL MOVEMENT IN FEDERAL PROGRAMS TOWARD AGILE SOFTWARE DEVELOPMENT.
  • HOWEVER:
    • THE PROGRAMS ARE OFTEN NOT FULLY AGILE.
    • THE PROGRAMS OFTEN INCLUDE VERY LARGE FUNCTIONAL SIZING (OVER 20,000 FP).
    • FUNCTIONAL SIZING CAPTURES THE FULL REQUIREMENT, WHILE REQUIREMENTS, DESIGN, AND PREPARATION ARE OFTEN COMPLETED BY FEDERAL PROGRAMS BEFORE THE FUNCTIONAL SIZING IS COUNTED.
    • FEDERAL PROGRAMS NORMALLY HAVE A SIGNIFICANT “REUSE” COMPONENT.
  • MOST MODELS CONVERT FUNCTIONAL SIZE TO SLOC AND RUN THE MODELS IN THE SLOC MODE.

  • DEVELOPERS ARE LOOKING FOR A DIRECT CONVERSION FROM FUNCTIONAL SIZING TO HOURS.

ISSUES


3 of 31

What Is Agile

  • “Agile” includes all forms of Agile and iterative development.
  • Stories, features, story points, and feature points are used to reflect the same concept, recognizing that a “feature” typically may be used in a different context than a “story.” Specifically, in large federal programs, “features” generally represent a larger concept than “stories.”
  • We believe that the application of estimating, management, and tracking practices can significantly and positively impact the success and cost of federal programs.
  • Two classes of federal agile software development programs.
    • Programs that are evolving on an incremental basis that generally follow the commercial Agile practice
    • Large “transformational” programs creating completely new capabilities. In these “transformational” programs a “Hybrid-Agile” approach is often applied with longer sprints and larger conceptual stories/features.


4 of 31

Agile is a Mindset*

  • Agile refers to the methods and best practices for organizing projects based on the values and principles documented in the Agile Manifesto.
  • No one way to implement Agile
    • Kanban
    • Scrum
    • Extreme Programming (XP)
    • Feature-driven development
    • Dynamic Systems Development Method
    • Crystal
    • Lean
    • Adaptive Project Framework

* From David DeWitt of Galorath


5 of 31

Practical Applications of Agile: Full or Hybrid Agile (Water-Scrum-Fall) Development

[Diagram: Full or Hybrid Agile flow – Agile sprints followed by Testing and Sustainment (sometimes in the Sprint, sometimes a separate activity)]

Two classes of Federal programs

  • Incremental programs – Full Agile
    • Follow the commercial Agile practices
      • Small user stories
      • A single user story per sprint, or even multiple user stories completed in a single sprint
    • Generally, not applying a full EVM process
  • Transformational programs – Hybrid Agile
    • Creating completely new capabilities
    • “Hybrid-Agile” approach applied
      • Longer sprints
      • Larger conceptual stories/features
      • Full EVM process.

  • The Life Cycle Cost Estimate is generally higher for a full Agile program because we assume that greater uncertainty leads to cost growth.


6 of 31

Agile Software Development Metrics*

  • Attempts to quantify the cost of software failures don't agree on percentages, but they generally agree that the number is large.
  • The Standish Chaos Report is probably the most well-known of these studies. It defines success as projects delivered within budget, on schedule, and with expected functionality.

  • Agile is an increasingly popular software development methodology
  • At least 71% of U.S. companies are now using Agile.
  • Agile projects have a 64% success rate
  • Waterfall only has a 49% success rate.
  • Agile projects are nearly 1.5X more successful than waterfall.
  • Scrum is the most popular Agile framework, with 61% of respondents from 76 countries reporting that they use it

*Jack Flynn, 16 Amazing Agile Statistics (2023)


7 of 31

Issues with SLOC

  • While this presentation will show that SLOC can provide accurate estimates for Agile programs, SLOC-based estimates are generally rejected.
  • The software developer’s perspective is:
    • Agile is a different/better development process than waterfall.
    • Current models are based on a SLOC data set and therefore not valid for Agile estimation.
  • Functional sizing is generally accepted by Agile developers
    • Good news – Concept well defined
    • Bad news – No good Federal database
    • Developers want a direct conversion from functional size without a SLOC conversion


8 of 31

Fundamentals of Software Estimation

  • In the late 1960’s and early 1970’s, analytic equations based on Lines of Code data were derived by Putnam, Jensen, Boehm, Galorath, and others.
  • There was general agreement that effort was a function of size:
    • Early COCOMO formula: E = 3.2 * (KSLOC)^1.05

(Today the exponent varies in commercial models from about 0.9 to 1.2).

  • Over time, databases, software tools, productivity factors, and complexity factors have significantly affected the fundamental estimation equations, and the models have become more complex.
  • Automated models have been adjusted to account for Agile practices.
  • SEER and TruePlanning examples are presented in the backup.
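As a minimal sketch, the early size-effort relation cited above can be evaluated directly; the coefficient and exponent are the slide's stated values, and the 100 KSLOC input is illustrative only.

```python
# Sketch of the early COCOMO effort relation cited above:
# E = 3.2 * (KSLOC)**1.05, in person-months.
# Modern commercial models recalibrate both constants, with the
# exponent ranging roughly from 0.9 to 1.2.

def cocomo_effort(ksloc: float, a: float = 3.2, b: float = 1.05) -> float:
    """Effort in person-months for a size in thousands of SLOC."""
    return a * ksloc ** b

# An illustrative 100 KSLOC project at the nominal parameters:
effort = cocomo_effort(100)
print(round(effort, 1))  # roughly 400 person-months
```

Because the exponent exceeds 1, doubling the size more than doubles the estimated effort, which is the diseconomy of scale these models assume.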


9 of 31

  • Model 2 could be used for estimation without COCOMO
    • Useful for a restricted set of projects (same programming language, people, project constraints, application domains, etc.)
    • Amount of PM is [1 FP + SP/6]/10
  • Model 4 uses an equivalent work volume consisting of FP and SP which is a suitable size input into COCOMO®

  • Early Federal results show 90 hr/SFP vs. about 15 hr/FP from the above

Preliminary COCOMO III Results Summary

Sep 14, 2023

Copyright 2023 Software Metrics Inc.

Models
1: PM’ = 0.185 + 0.09*FP + 0.015*SP    (Adj R² = 0.57)
2: PM’ = 0 + 0.094*FP + 0.015*SP       (Adj R² = 0.68)
3: PM’ = 0.056 * (FP + SP)^0.81        (Adj R² = 0.52)
4: PM’ = 0.075 * (FP + 0.16*SP)^1.02   (Adj R² = 0.66)
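The two best-fitting preliminary models above (Models 2 and 4) can be sketched as functions; the 1,000 FP / 600 SP inputs are illustrative, not program data.

```python
# Sketch of the preliminary COCOMO III fits shown above.
# PM' is effort in person-months; FP is function points; SP is story points.

def pm_model_2(fp: float, sp: float) -> float:
    """Model 2: PM' = 0.094*FP + 0.015*SP (Adj R^2 = 0.68)."""
    return 0.094 * fp + 0.015 * sp

def pm_model_4(fp: float, sp: float) -> float:
    """Model 4: PM' = 0.075*(FP + 0.16*SP)**1.02 (Adj R^2 = 0.66)."""
    return 0.075 * (fp + 0.16 * sp) ** 1.02

# Illustrative input: 1,000 FP and 600 SP
print(round(pm_model_2(1000, 600), 1))  # 103.0 person-months
print(round(pm_model_4(1000, 600), 1))
```

Note that Model 2 at SP = 0 implies 0.094 PM/FP, or about 14 hours per FP at 152 hours per person-month, consistent with the "about 15 hr/FP" figure cited above.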


10 of 31

Model Estimation Summary

  • Size is a key input
  • Over time factors related to complexity and productivity have become more and more important


11 of 31

Size Continues to be a Driver in Software Effort Estimation

SLOC

Functional Size

  • IFPUG
  • COSMIC
  • Nesma
  • Simple

Story Points*

Use Cases

User Stories

T-Shirt Sizes

BUT

Environmental, Productivity, Complexity, and other effort drivers are critical.


12 of 31

Sizing Is Still A Key Driver

Approaches to Software Sizing

  • Physical Size – Source Lines of Code (SLOC) – an objective measure, highly dependent on language and programmer skill, and generally rejected by Agile developers since it was developed and designed for the Waterfall development method. SLOC counts can be automated reliably for historical data collection.

  • Relative Effort Size – Story Points, Tee Shirt Sizing, … – a relative measure determined by software developers; these measures are generally familiar to Agile teams

  • Functional Size – Objective Size measure, standardized, can be independently estimated
    • There are several Functional Sizing Metrics



13 of 31

Physical Sizing

  • Source Lines of Code (SLOC): the total number of lines of source code in a project – KSLOC, ESLOC, … (be sure your code measure is the same one used in the model)
  • Can use code counters like USC’s UCC or the Government version (UCC-G).
  • SRDR is a good data source.
  • Advantages: 
    • Accepted and is used in many automated models like COCOMO.
    • SLOC is easily quantified
    • SLOC is being used today to successfully estimate and manage agile programs
  • Disadvantages:  
    • Different programming languages, programmer experience, and automated tools affect the code count.
    • When platforms and languages are different, LOC can be difficult to normalize.
    • For new programs, SLOC must be estimated; usually by analogy to similar programs.
  • Size is normally estimated as low, most likely, and high number.
  • Therefore, a distribution can be developed to estimate at the desired confidence level, e.g., the 70% level.


14 of 31

Relative Effort Size

  • Relative Effort Size is determined by the development team
  • Common relative measures are Story Points, Feature Points, Epics, Tee Shirt Sizes, …
  • The effort associated with each of these measures is based on expert opinion or analogy from previous work.
  • Advantage
    • This is a metric that most developers are comfortable with.
  • Disadvantage
    • There is no “formal”/consistent methodology
    • There are no standards so not good for cross project analysis or benchmarks


15 of 31

A Structured Approach to Tee Shirt Sizing

  • Recent USAF SME estimates for Tee Shirt sizes are: S = Small (320 hours); M = Medium (1,600 hours); L = Large (2,880 hours); and XL = Extra Large (6,400 hours).
  • In a recent large Federal Agile development program, the Tee Shirt hour estimate was compared to the hours from a SEER-SEM model run – there was a 20% difference in the total hours estimated.
  • When normalization (an inclusion/exclusion analysis) was applied, the results were very close.
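The structured approach above amounts to a fixed size-to-hours lookup. A minimal sketch, using the USAF SME hours cited (the backlog counts below are hypothetical):

```python
# Structured Tee Shirt sizing using the USAF SME hour estimates above:
# S = 320, M = 1,600, L = 2,880, XL = 6,400 hours per item.

TEE_SHIRT_HOURS = {"S": 320, "M": 1600, "L": 2880, "XL": 6400}

def estimate_hours(backlog: dict) -> int:
    """Total estimated hours for a backlog given as {size: item count}."""
    return sum(TEE_SHIRT_HOURS[size] * count for size, count in backlog.items())

# Hypothetical backlog: 40 small, 25 medium, 10 large, 3 extra-large items
print(estimate_hours({"S": 40, "M": 25, "L": 10, "XL": 3}))  # 100800
```

Consistency of the SME hour values across the team is what makes the comparison to model-based estimates meaningful.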


16 of 31

Functional Size

  • Functional Size Measurement (FSM) is a technique for measuring software in terms of the functionality it delivers.
  • Functional Size is primarily used at the planning stage for input into project resource estimation calculations for cost, effort and schedule.
  • There are multiple Functional Sizing Metrics: COSMIC, FiSMA, IFPUG/SFP/SiFP, Mark-II, and NESMA.


17 of 31

Military Requirements

  • Capers Jones, Estimating Software Costs, Second Edition, page 389

“Military software requirements are usually the most precise and exacting of any class of software. This is due to the long-standing requirement of traceability…. Although these military requirements documents are large and sometimes ambiguous, the specificity and completeness of the military software requirements makes it easier to derive function point totals than for any other kind of software application.”

  • Most models use some form of conversion to SLOC, use the unadjusted (raw) function point count, and allow the model to account for the non-functional (SNAP) requirements.


18 of 31

Automated FP Counters

  • Automated FP counters like ScopeMaster and Candace use AI (Natural Language Processing (NLP)) and a robust rule set to inspect and estimate the functional size of each requirement. They recognize specific verbs.
  • For large Federal programs there is normally a large number of requirements found to be non-functional (0 count) by the tools. This is due to poorly written requirements (per the tool) and language specific to Federal programs.
  • Comparison:
    • Manual count – 25,000 SFP
    • Initial model count – 13,000 to 18,000
    • Adjusted* Model Count - 25,000 to 30,000

*Adjustment made by applying the average FP count to each zero-count requirement.
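The zero-count adjustment described in the footnote can be sketched as follows; the per-requirement counts below are hypothetical.

```python
# Sketch of the adjustment above: requirements the automated counter
# scores as 0 FP are assigned the average FP of the non-zero requirements.

def adjust_zero_counts(fp_counts: list) -> list:
    """Replace 0-FP requirements with the average of the non-zero counts."""
    nonzero = [c for c in fp_counts if c > 0]
    avg = sum(nonzero) / len(nonzero)
    return [c if c > 0 else avg for c in fp_counts]

# Hypothetical tool output for six requirements, two scored as 0 FP
counts = [12, 0, 7, 0, 9, 16]
print(sum(adjust_zero_counts(counts)))  # raw total of 44 rises to 66.0
```

This is why the adjusted model count above (25,000-30,000) can exceed the initial model count (13,000-18,000) and approach the manual count.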


19 of 31

Cost to Calculate Function Points

  • A “certified” function point counter (IFPUG, COSMIC, Nesma) is estimated at $100 to $200 an hour.
  • Assume 15 IFPUG FP/hr.
  • Then the cost per function point is between $7 and $14 per FP.
  • Therefore, a 10,000-FP IFPUG count should cost between $70K and $140K.
  • Experience shows that a SiFP count takes less than half the time of a full IFPUG count.
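The cost arithmetic above can be checked as a one-line sketch; the rates and productivity are the slide's assumptions, not market data.

```python
# Cost to count function points: hours needed times hourly rate,
# assuming the cited productivity of about 15 IFPUG FP per hour.

def count_cost(total_fp: float, rate_per_hour: float, fp_per_hour: float = 15) -> float:
    """Dollar cost to count total_fp function points."""
    return total_fp / fp_per_hour * rate_per_hour

low = count_cost(10_000, 100)   # about $66,700
high = count_cost(10_000, 200)  # about $133,300
print(round(low), round(high))
```

The exact figures are $6.67-$13.33 per FP; the slide rounds these up to $7-$14 per FP, giving the $70K-$140K range for a 10,000-FP count.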


20 of 31

Size Comparison

SIZING COMPARISON

Model                                    Est. Total Hours   Delta from Tee Shirt   Notes
Tee Shirt Hours (Developer's Estimate)   1,879,136            0%    from software development team
SEER Total Hours (full LCCE)*            2,363,461           26%    full LCCE
SEER Hours less Development**            1,844,010           -2%    comparison to developer’s logic
Tee Shirt Hours (Developer's Estimate)   1,879,136            0%
NEMO Hours                               1,255,500          -33%    NEMO not full LCCE; need to reconcile
COCOMO II SLOC Web Tool                  1,410,758          -25%    using 152 hours per person month
COCOMO II FP Web Tool                    1,959,158            4%    using 152 hours per person month
COCOMO III FP Model 4                      358,211          -81%    PM = 0.075*(FP + 0.16*SP)^1.02
COCOMO III FP Model 2                      366,099          -81%    PM = 0 + 0.094*FP + 0.015*SP


21 of 31

Software Reuse

  • Pre-existing code can be recycled to perform the same function or repurposed to perform a similar but slightly different function. Code reusability increases productivity, reduces costs, and improves overall quality. Reusability in software development is a highly popular and productive practice.
  • Large military programs often have a significant (maybe 50%) reuse component.
  • When converting functional size to SLOC, the SLOC models account for reuse by reducing the effort associated with the reused SLOC, often using white-box and black-box components.


22 of 31

Proposed Reuse Model for Functional Size

  • Functional reuse could follow the COCOMO II Reuse model.
  • Black-box is functional reuse where the functionality is not modified.
  • White-box is functional reuse where the functionality is modified.

COCOMO II Reuse Model (for generated code):

PM = (ASLOC * AT/100) / ATPROD

Where:
ASLOC is the number of lines of generated code
AT is the percentage of code automatically generated
ATPROD is the productivity of engineers in integrating this code

The Functional Reuse model would replace ATPROD with ATFUN – the productivity of engineers in integrating the function.
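The proposed substitution can be sketched directly; note that ATFUN is this presentation's proposed term with no calibrated value yet, so all inputs below are hypothetical.

```python
# Sketch of the proposed functional-reuse adaptation of the COCOMO II
# generated-code model. AFP, AT, and ATFUN mirror ASLOC, AT, and ATPROD;
# ATFUN (FP integrated per person-month) is a proposed, uncalibrated term.

def reuse_effort_pm(afp: float, at_pct: float, atfun: float) -> float:
    """PM = (AFP * AT/100) / ATFUN, by analogy with
    PM = (ASLOC * AT/100) / ATPROD in COCOMO II."""
    return (afp * at_pct / 100) / atfun

# Hypothetical: 5,000 reused FP, 40% reused as-is, ATFUN of 50 FP/PM
print(reuse_effort_pm(5000, 40, 50))  # 40.0 person-months
```

Calibrating ATFUN from Federal reuse data would be the key step in making this model usable.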


23 of 31

Summary


  • All size metrics can provide meaningful estimates when used appropriately.
  • The analysis documented the successful utilization of SLOC, SFP, and Tee Shirt sizing in recent large Federal Agile Development Programs.
    • In this case the Tee Shirt sizing was consistently applied by an experienced team.
  • We need to be careful to make an Apples-to-Apples Comparison when comparing model results.
  • Automated FP counters can be successfully applied.
  • No direct FP-to-hours model exists, although the Boehm Center for Systems and Software Engineering (BCSSE) at USC is working on a COCOMO III release that addresses the conversion from Function Points to hours.


24 of 31

Recommendations


  • Continue development of a model of the form: Effort = A * (SiFP)^b * C

Where: A = a “complexity” modifier, SiFP = number of Simple Function Points, b = a derived exponent, C = a “productivity” modifier

  • Address the issue of reuse in the final formulation

  • Determine the effect of large Functional Sizing


25 of 31

BACKUP


26 of 31

SEER by Galorath

Model is based on the Jensen SEER/Sage equation

SE = CTE * K^(1/2) * td

Where: SE = Product Size in ESLOC

CTE = Effective Technology Constant (based on cost driver inputs)

K = total software life-cycle effort in person-years

td = development time in years

Software development effort = 0.3945 * K

SEER Effort Formula: K = (SE / (CTE * td))^2

SEER Schedule Formula: td = D^(-0.2) * (SE / CTE)^0.4

Where:

D = Staffing Complexity Constant (i.e., How hard is it to get the staff required to code this type of software?). Note that the default value for D = 15
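The SEER formulas above chain together as a sketch; the CTE and size values below are illustrative, not actual cost-driver outputs from the tool.

```python
# Sketch of the SEER/Sage relations above. CTE comes from the program's
# cost-driver inputs; the values here are illustrative only.

def seer_schedule(se: float, cte: float, d: float = 15) -> float:
    """Development time in years: td = D**(-0.2) * (SE / CTE)**0.4."""
    return d ** -0.2 * (se / cte) ** 0.4

def seer_total_effort(se: float, cte: float, td: float) -> float:
    """Total life-cycle effort in person-years: K = (SE / (CTE * td))**2."""
    return (se / (cte * td)) ** 2

se, cte = 500_000, 5_000          # 500 KESLOC, illustrative CTE
td = seer_schedule(se, cte)       # uses the default D = 15
k = seer_total_effort(se, cte, td)
dev_effort = 0.3945 * k           # development portion of life-cycle effort
print(round(td, 2), round(k, 1), round(dev_effort, 1))
```

Note that the schedule formula feeds the effort formula: td is computed first, then K follows from size, CTE, and td.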


27 of 31

Unison Cost Engineering: TruePlanning for Software

TruePlanning® CER:

Effort (Hours) = A * Size^B

Where:

A = influence of the drivers in the model (Functional Complexity, Technology, People, Reuse, Organizational Productivity, etc. - 33 numerical cost drivers + 10 nominal cost drivers)

Size = software size in specified Size Units (SLOC, IFPUG FP, COSMIC FP, etc.)

B = economy or diseconomy of scale as a function of Organizational Productivity (B ranges between 1.077 and 1.117)

Example: Military Avionics in SLOC with default values: Effort = 0.252 * Size^1.107

Schedule (Months) = C * Effort^0.33

Where:

C = efficiency/inefficiencies in developing the software based on model inputs (Functional Complexity, Technology, People, Organizational Productivity, etc., 33 numerical drivers and 19 nominal drivers – C ranges from 0.41 to 0.9)
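The Military Avionics example above can be evaluated as a sketch; A and B are the stated defaults, while the C value is simply an illustrative point within its stated 0.41-0.9 range, and the 100 KSLOC size is hypothetical.

```python
# Sketch of the TruePlanning CER above, with the Military Avionics
# SLOC defaults A = 0.252 and B = 1.107. The C value is illustrative.

def tp_effort_hours(size: float, a: float = 0.252, b: float = 1.107) -> float:
    """Effort (hours) = A * Size**B."""
    return a * size ** b

def tp_schedule_months(effort: float, c: float = 0.7) -> float:
    """Schedule (months) = C * Effort**0.33."""
    return c * effort ** 0.33

effort = tp_effort_hours(100_000)   # hypothetical 100 KSLOC program
print(round(effort), round(tp_schedule_months(effort), 1))
```

As with SEER, effort is computed first and the schedule relation is then driven by the effort result.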


28 of 31

Simple Function Points�SiFP

  • The Simple Function Point (SiFP) method estimates a software product’s functional size by quantifying its business functions/transaction types, system interfaces, and other functional requirements from high-level acquisition documentation.
  • Method developed by Italian researchers, acquired by IFPUG in 2019 (https://www.ifpug.org/ifpug-acquires-the-simple-function-points-method)
  • Can be performed quickly and early in a program’s lifecycle using existing documents.
  • Focuses on two elementary processes:


29 of 31

Simple Function Point Analysis (Validated by the DHS Study Referenced Below)

  • 2022 study of 15 DHS IT systems and 3 DoD IT systems
  • “Based on the comparison of effort models, although all models passed the criteria for statistical significance, simple function points, unadjusted function points, and functional requirements are stronger predictors of development effort than stories, story points, or issues.”
  • Simple Function Points produced the highest adjusted R-squared value, indicating a very strong predictive capability.

“Let’s Go Agile: Data-Driven Agile Software Costs and Schedule Models for DHS Projects,” ICEAA 2022, Wilson Rosa, Sara Jardine, Kimberly Roye, Kyle Eaton, and Chad Lucas


30 of 31

Functional Size Issues

  • Once the Function Point Count (FPC) is established, the count must be converted into hours
    • International Software Benchmarking Standards Group (ISBSG) offers a good commercial database
    • Currently no good Federal/National Security database (SRDR is beginning to collect this data)
    • Analogy to similar programs
    • Most Commercial models “backfire” the FPC into SLOC.
  • When the FPC is completed, some requirements will have a “0” FPC.   Items such as documentation or meeting a certain developmental standard do not require end user interaction and, as such, are not functional.  There is, however, effort associated with these requirements as they add complexity to the overall work effort
    • Most software cost estimating models account for these hours through the parametric estimating equations derived from their historical database.
    • Non-model users might utilize tools like SNAP to account for these non-functional hours. 


31 of 31

Backfiring FP to SLOC

  • There are multiple data sources (Capers Jones, Galorath, QSM, Unison) that offer backfiring tables.
  • Correct backfiring requires different metrics for each type of FP *
  • Below is a partial table from QSM

* Applied Software Measurement, 3rd Edition, Capers Jones, page 80

 

QSM SLOC/FP Data

Language      Avg   Median   Low   High
ABAP (SAP) *   28     18      16     60
ASP *          51     54      15     69
Assembler *   119     98      25    320
Brio +         14     14      13     16
C *            97     99      39    333
C++ *          50     53      25     80
C# *           54     59      29     70
COBOL *        61     55      23    297
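Backfiring with gearing factors like those in the QSM table above can be sketched as a lookup-and-multiply; the table subset and the 10,000-FP input are illustrative.

```python
# Sketch of backfiring: converting a function point count to estimated
# SLOC using language gearing factors (SLOC per FP). The wide Low-High
# spread is why backfired sizes are approximate at best.

QSM_SLOC_PER_FP = {          # language: (avg, median, low, high)
    "C": (97, 99, 39, 333),
    "C++": (50, 53, 25, 80),
    "C#": (54, 59, 29, 70),
    "COBOL": (61, 55, 23, 297),
}

def backfire(fp: float, language: str) -> tuple:
    """Return (avg, low, high) SLOC estimates for a FP count."""
    avg, _median, low, high = QSM_SLOC_PER_FP[language]
    return fp * avg, fp * low, fp * high

print(backfire(10_000, "C++"))  # (500000, 250000, 800000)
```

The three-point output pairs naturally with the low/most-likely/high size distributions discussed earlier for confidence-level estimating.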
