1 of 31

A Comparison of Tee Shirt, Functional, and SLOC Sizing on Current Federal Agile Software Development Programs

Presentation for

Boehm CSSE and Practical Software and Systems Measurement Group

by

 

Bob Hunt, 703-201-0651, bhunt@n-s-i.us


2 of 31

Outline

  • THERE IS A GENERAL MOVEMENT IN FEDERAL PROGRAMS TOWARD AGILE SOFTWARE DEVELOPMENT.
  • HOWEVER:
    • THE PROGRAMS ARE OFTEN NOT FULLY AGILE.
    • THE PROGRAMS OFTEN INCLUDE VERY LARGE FUNCTIONAL SIZING (OVER 20,000 FP).
    • FUNCTIONAL SIZING CAPTURES THE FULL REQUIREMENT, WHILE REQUIREMENTS, DESIGN, AND PREPARATION ARE OFTEN COMPLETED BY FEDERAL PROGRAMS BEFORE THE FUNCTIONAL SIZING IS COUNTED.
    • FEDERAL PROGRAMS NORMALLY HAVE A SIGNIFICANT “REUSE” COMPONENT.
  • MOST MODELS CONVERT FUNCTIONAL SIZE TO SLOC AND RUN THE MODELS IN THE SLOC MODE.

  • DEVELOPERS ARE LOOKING FOR A DIRECT CONVERSION FROM FUNCTIONAL SIZING TO HOURS.

ISSUES


3 of 31

What Is Agile

  • “Agile” includes all forms of Agile and iterative development.
  • Stories, features, story points, and feature points are used to reflect the same concept, recognizing that a “feature” typically may be used in a different context than a “story.” Specifically, in large federal programs, “features” generally represent a larger concept than “stories.”
  • We believe that the application of estimating, management, and tracking practices can significantly and positively impact the success and cost of federal programs.
  • Two classes of federal agile software development programs.
    • Programs that are evolving on an incremental basis that generally follow the commercial Agile practice
    • Large “transformational” programs creating completely new capabilities. In these “transformational” programs a “Hybrid-Agile” approach is often applied with longer sprints and larger conceptual stories/features.


4 of 31

Agile is a Mindset*

  • Agile refers to the methods and best practices for organizing projects based on the values and principles documented in the Agile Manifesto.
  • No one way to implement Agile
    • Kanban
    • Scrum
    • Extreme Programming (XP)
    • Feature-driven development
    • Dynamic Systems Development Method
    • Crystal
    • Lean
    • Adaptive Project Framework

* From David DeWitt of Galorath


5 of 31

Practical Applications of Agile: Full or Hybrid Agile (Water-Scrum-Fall) Development

[Diagram: Full or Hybrid Agile flow – Agile sprints followed by Testing and Sustainment (sometimes in the Sprint, sometimes a separate activity)]

Two classes of Federal programs

  • Incremental programs – Full Agile
    • Follow the commercial Agile practices
      • Small user stories
      • A single user story per sprint, or even multiple user stories completed in a single sprint
    • Generally, not applying a full EVM process
  • Transformational programs – Hybrid Agile
    • Creating completely new capabilities
    • “Hybrid-Agile” approach applied
      • Longer sprints
      • Larger conceptual stories/features
      • Full EVM process.

  • The Life Cycle Cost Estimate is generally higher for a full Agile program because we assume that greater uncertainty leads to cost growth.


6 of 31

Agile Software Development Metrics*

  • Attempts to quantify the cost of software failures don't agree on percentages, but they generally agree that the number is large.
  • The Standish Chaos Report is probably the most well-known of these studies. It defines success as projects delivered within budget, on schedule, and with expected functionality.

  • Agile is an increasingly popular software development methodology
  • At least 71% of U.S. companies are now using Agile.
  • Agile projects have a 64% success rate
  • Waterfall only has a 49% success rate.
  • Agile projects are nearly 1.5X more successful than waterfall.
  • Scrum is the most popular Agile framework, with 61% of respondents from 76 countries reporting that they use it

*Jack Flynn, 16 Amazing Agile Statistics (2023)


7 of 31

Issues with SLOC

  • While this presentation will show that SLOC can provide accurate estimates for Agile programs, SLOC-based estimates are generally rejected.
  • The software developer’s perspective is:
    • Agile is a different/better development process than waterfall.
    • Current models are based on a SLOC data set and therefore not valid for Agile estimation.
  • Functional sizing is generally accepted by Agile developers
    • Good news – Concept well defined
    • Bad news – No good Federal database
    • Developers want a direct conversion from functional size without a SLOC conversion


8 of 31

Fundamentals of Software Estimation

  • In the late 1960’s and early 1970’s, analytic equations based on Lines of Code data were derived by Putnam, Jensen, Boehm, Galorath, and others.
  • There was general agreement that effort was a function of size:
    • Early COCOMO formula: E = 3.2 * (KSLOC)^1.05

(Today the exponent varies in commercial models from about 0.9 to 1.2).

  • Over time, databases, software tools, productivity factors, and complexity factors have significantly affected the fundamental estimation equations, and the models have become more complex.
  • Automated models have been adjusted to account for Agile practices.
  • SEER and TruePlanning examples are presented in the backup.
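As a minimal sketch, the early size-effort relation cited above can be evaluated directly; the coefficient and exponent are the slide's stated values, and the 100 KSLOC input is illustrative only.

```python
# Sketch of the early COCOMO effort relation cited above:
# E = 3.2 * (KSLOC)**1.05, in person-months.
# Modern commercial models recalibrate both constants, with the
# exponent ranging roughly from 0.9 to 1.2.

def cocomo_effort(ksloc: float, a: float = 3.2, b: float = 1.05) -> float:
    """Effort in person-months for a size in thousands of SLOC."""
    return a * ksloc ** b

# An illustrative 100 KSLOC project at the nominal parameters:
effort = cocomo_effort(100)
print(round(effort, 1))  # roughly 400 person-months
```

Because the exponent exceeds 1, doubling the size more than doubles the estimated effort, which is the diseconomy of scale these models assume.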


9 of 31

  • Model 2 could be used for estimation without COCOMO
    • Useful for a restricted set of projects (same programming language, people, project constraints, application domains, etc.)
    • Amount of PM is [1 FP + SP/6]/10
  • Model 4 uses an equivalent work volume consisting of FP and SP which is a suitable size input into COCOMO®

  • Early Federal results show 90 hr/SFP vs. about 15 hr/FP from the above

Preliminary COCOMO III Results Summary

Sep 14, 2023

Copyright 2023 Software Metrics Inc.

Models
1: PM’ = 0.185 + 0.09*FP + 0.015*SP    (Adj R² = 0.57)
2: PM’ = 0 + 0.094*FP + 0.015*SP       (Adj R² = 0.68)
3: PM’ = 0.056 * (FP + SP)^0.81        (Adj R² = 0.52)
4: PM’ = 0.075 * (FP + 0.16*SP)^1.02   (Adj R² = 0.66)
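The two best-fitting preliminary models above (Models 2 and 4) can be sketched as functions; the 1,000 FP / 600 SP inputs are illustrative, not program data.

```python
# Sketch of the preliminary COCOMO III fits shown above.
# PM' is effort in person-months; FP is function points; SP is story points.

def pm_model_2(fp: float, sp: float) -> float:
    """Model 2: PM' = 0.094*FP + 0.015*SP (Adj R^2 = 0.68)."""
    return 0.094 * fp + 0.015 * sp

def pm_model_4(fp: float, sp: float) -> float:
    """Model 4: PM' = 0.075*(FP + 0.16*SP)**1.02 (Adj R^2 = 0.66)."""
    return 0.075 * (fp + 0.16 * sp) ** 1.02

# Illustrative input: 1,000 FP and 600 SP
print(round(pm_model_2(1000, 600), 1))  # 103.0 person-months
print(round(pm_model_4(1000, 600), 1))
```

Note that Model 2 at SP = 0 implies 0.094 PM/FP, or about 14 hours per FP at 152 hours per person-month, consistent with the "about 15 hr/FP" figure cited above.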


10 of 31

Model Estimation Summary

  • Size is a key input
  • Over time factors related to complexity and productivity have become more and more important


11 of 31

Size Continues to be a Driver in Software Effort Estimation

SLOC

Functional Size

  • IFPUG
  • COSMIC
  • Nesma
  • Simple

Story Points*

Use Cases

User Stories

T-Shirt Sizes

BUT

Environmental, Productivity, Complexity, and other effort drivers are critical.


12 of 31

Sizing Is Still A Key Driver

Approaches to Software Sizing

  • Physical Size – Source Lines of Code (SLOC) – an objective measure, highly dependent on language and programmer skill, and generally rejected by Agile developers since it was developed and designed for the Waterfall development method. SLOC counts can be automated reliably for historical data collection.

  • Relative Effort Size – Story Points, Tee Shirt Sizing, … – a relative measure determined by software developers; these measures are generally familiar to Agile teams

  • Functional Size – Objective Size measure, standardized, can be independently estimated
    • There are several Functional Sizing Metrics



13 of 31

Physical Sizing

  • Source Lines of Code (SLOC): the total number of lines of source code in a project – KSLOC, ESLOC, … (be sure your code measure is the same one used in the model)
  • Can use code counters like USC’s UCC or the Government version (UCC-G).
  • SRDR is a good data source.
  • Advantages: 
    • Accepted and is used in many automated models like COCOMO.
    • SLOC is easily quantified
    • SLOC is being used today to successfully estimate and manage agile programs
  • Disadvantages:  
    • Different programming languages, programmer experience, and automated tools affect the code count.
    • When platforms and languages are different, LOC can be difficult to normalize.
    • For new programs, SLOC must be estimated; usually by analogy to similar programs.
  • Size is normally estimated as low, most likely, and high number.
  • Therefore, a distribution can be developed to estimate at the desired confidence level, e.g., the 70% level.


14 of 31

Relative Effort Size

  • Relative Effort Size is determined by the development team
  • Common relative measures are Story Points, Feature Points, Epics, Tee Shirt Sizes, …
  • The effort associated with each of these measures is based on expert opinion or analogy from previous work.
  • Advantage
    • This is a metric that most developers are comfortable with.
  • Disadvantage
    • There is no “formal”/consistent methodology
    • There are no standards so not good for cross project analysis or benchmarks


15 of 31

A Structured Approach to Tee Shirt Sizing

  • Recent USAF SME estimates for Tee Shirt sizes are: S = Small (320 hours); M = Medium (1,600 hours); L = Large (2,880 hours); and XL = Extra Large (6,400 hours).
  • In a recent large Federal Agile development program, the Tee Shirt hour estimate was compared to the hours from a SEER-SEM model run – there was a 20% difference in the total hours estimated.
  • When normalization (an inclusion/exclusion analysis) was applied, the results were very close.
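The structured approach above amounts to a fixed size-to-hours lookup. A minimal sketch, using the USAF SME hours cited (the backlog counts below are hypothetical):

```python
# Structured Tee Shirt sizing using the USAF SME hour estimates above:
# S = 320, M = 1,600, L = 2,880, XL = 6,400 hours per item.

TEE_SHIRT_HOURS = {"S": 320, "M": 1600, "L": 2880, "XL": 6400}

def estimate_hours(backlog: dict) -> int:
    """Total estimated hours for a backlog given as {size: item count}."""
    return sum(TEE_SHIRT_HOURS[size] * count for size, count in backlog.items())

# Hypothetical backlog: 40 small, 25 medium, 10 large, 3 extra-large items
print(estimate_hours({"S": 40, "M": 25, "L": 10, "XL": 3}))  # 100800
```

Consistency of the SME hour values across the team is what makes the comparison to model-based estimates meaningful.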


16 of 31

Functional Size

  • Functional Size Measurement (FSM) is a technique for measuring software in terms of the functionality it delivers.
  • Functional Size is primarily used at the planning stage for input into project resource estimation calculations for cost, effort and schedule.
  • There are multiple Functional Sizing Metrics: COSMIC, FiSMA, IFPUG/SFP/SiFP, Mark-II, and NESMA.


17 of 31

Military Requirements

  • Capers Jones, Estimating Software Costs, Second Edition, page 389

“Military software requirements are usually the most precise and exacting of any class of software. This is due to the long-standing requirement of traceability…. Although these military requirements documents are large and sometimes ambiguous, the specificity and completeness of the military software requirements makes it easier to derive function point totals than for any other kind of software application.”

  • Most models use some form of conversion to SLOC, use the unadjusted (raw) function point count, and allow the model to account for the non-functional (SNAP) requirements.


18 of 31

Automated FP Counters

  • Automated FP counters like ScopeMaster and Candace use AI (Natural Language Processing (NLP)) and a robust rule set to inspect and estimate the functional size of each requirement. They recognize specific verbs.
  • For large Federal programs there is normally a large number of requirements found to be non-functional (0 count) by the tools. This is due to poorly written requirements (per the tool) and language specific to Federal programs.
  • Comparison:
    • Manual count – 25,000 SFP
    • Initial model count – 13,000 to 18,000
    • Adjusted* Model Count - 25,000 to 30,000

*Adjustment made by applying the average FP count to each zero-count requirement.
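The zero-count adjustment described in the footnote can be sketched as follows; the per-requirement counts below are hypothetical.

```python
# Sketch of the adjustment above: requirements the automated counter
# scores as 0 FP are assigned the average FP of the non-zero requirements.

def adjust_zero_counts(fp_counts: list) -> list:
    """Replace 0-FP requirements with the average of the non-zero counts."""
    nonzero = [c for c in fp_counts if c > 0]
    avg = sum(nonzero) / len(nonzero)
    return [c if c > 0 else avg for c in fp_counts]

# Hypothetical tool output for six requirements, two scored as 0 FP
counts = [12, 0, 7, 0, 9, 16]
print(sum(adjust_zero_counts(counts)))  # raw total of 44 rises to 66.0
```

This is why the adjusted model count above (25,000-30,000) can exceed the initial model count (13,000-18,000) and approach the manual count.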


19 of 31

Cost to Calculate Function Points

  • A “certified” function point counter (IFPUG, COSMIC, Nesma) is estimated at $100 to $200 an hour.
  • Assume 15 IFPUG FP/hr.
  • Then the cost per function point is between $7 and $14 per FP.
  • Therefore, a 10,000-FP IFPUG count should cost between $70K and $140K.
  • Experience shows that a SiFP count takes less than half the time of a full IFPUG count.
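The cost arithmetic above can be checked as a one-line sketch; the rates and productivity are the slide's assumptions, not market data.

```python
# Cost to count function points: hours needed times hourly rate,
# assuming the cited productivity of about 15 IFPUG FP per hour.

def count_cost(total_fp: float, rate_per_hour: float, fp_per_hour: float = 15) -> float:
    """Dollar cost to count total_fp function points."""
    return total_fp / fp_per_hour * rate_per_hour

low = count_cost(10_000, 100)   # about $66,700
high = count_cost(10_000, 200)  # about $133,300
print(round(low), round(high))
```

The exact figures are $6.67-$13.33 per FP; the slide rounds these up to $7-$14 per FP, giving the $70K-$140K range for a 10,000-FP count.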


20 of 31

Size Comparison

SIZING COMPARISON

Model                                    Est. Total Hours   Delta from Tee Shirt   Notes
Tee Shirt Hours (Developer's Estimate)   1,879,136            0%    from software development team
SEER Total Hours (full LCCE)*            2,363,461           26%    full LCCE
SEER Hours less Development**            1,844,010           -2%    comparison to developer’s logic
Tee Shirt Hours (Developer's Estimate)   1,879,136            0%
NEMO Hours                               1,255,500          -33%    NEMO not full LCCE; need to reconcile
COCOMO II SLOC Web Tool                  1,410,758          -25%    using 152 hours per person month
COCOMO II FP Web Tool                    1,959,158            4%    using 152 hours per person month
COCOMO III FP Model 4                      358,211          -81%    PM = 0.075*(FP + 0.16*SP)^1.02
COCOMO III FP Model 2                      366,099          -81%    PM = 0 + 0.094*FP + 0.015*SP


21 of 31

Software Reuse

  • Pre-existing code can be recycled to perform the same function or repurposed to perform a similar but slightly different function. Code reusability increases productivity, reduces costs, and improves overall quality. Reusability in software development is a highly popular and productive practice.
  • Large military programs often have a significant (maybe 50%) reuse component.
  • When converting functional size to SLOC, the SLOC models account for reuse by reducing the effort associated with the reused SLOC, often using white-box and black-box components.


22 of 31

Proposed Reuse Model for Functional Size

  • Functional reuse could follow the COCOMO II Reuse model.
  • Black-box is functional reuse where the functionality is not modified.
  • White-box is functional reuse where the functionality is modified.

COCOMO II Reuse Model (for generated code):

PM = (ASLOC * AT/100) / ATPROD

Where:
ASLOC is the number of lines of generated code
AT is the percentage of code automatically generated
ATPROD is the productivity of engineers in integrating this code

The Functional Reuse model would replace ATPROD with ATFUN – the productivity of engineers in integrating the function.
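The proposed substitution can be sketched directly; note that ATFUN is this presentation's proposed term with no calibrated value yet, so all inputs below are hypothetical.

```python
# Sketch of the proposed functional-reuse adaptation of the COCOMO II
# generated-code model. AFP, AT, and ATFUN mirror ASLOC, AT, and ATPROD;
# ATFUN (FP integrated per person-month) is a proposed, uncalibrated term.

def reuse_effort_pm(afp: float, at_pct: float, atfun: float) -> float:
    """PM = (AFP * AT/100) / ATFUN, by analogy with
    PM = (ASLOC * AT/100) / ATPROD in COCOMO II."""
    return (afp * at_pct / 100) / atfun

# Hypothetical: 5,000 reused FP, 40% reused as-is, ATFUN of 50 FP/PM
print(reuse_effort_pm(5000, 40, 50))  # 40.0 person-months
```

Calibrating ATFUN from Federal reuse data would be the key step in making this model usable.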


23 of 31

Summary


  • All size metrics can provide meaningful estimates when used appropriately.
  • The analysis documented the successful utilization of SLOC, SFP, and Tee Shirt sizing in recent large Federal Agile Development Programs.
    • In this case the Tee Shirt sizing was consistently applied by an experienced team.
  • We need to be careful to make an Apples-to-Apples Comparison when comparing model results.
  • Automated FP counters can be successfully applied.
  • No direct FP-to-hours model exists, although the Boehm Center for Systems and Software Engineering (BCSSE) at USC is working on a COCOMO III release that addresses the conversion from Function Points to hours.


24 of 31

Recommendations


  • Continue development of a model of the form: Effort = A * (SiFP)^b * C

Where: A = a “complexity” modifier, SiFP = number of Simple Function Points, b = a derived exponent, C = a “productivity” modifier

  • Address the issue of reuse in the final formulation

  • Determine the effect of large Functional Sizing


25 of 31

BACKUP


26 of 31

SEER by Galorath

Model is based on the Jensen SEER/Sage equation

SE = CTE * K^(1/2) * td

Where: SE = Product Size in ESLOC

CTE = Effective Technology Constant (based on cost driver inputs)

K = total software life-cycle effort in person-years

td = development time in years

Software development effort = 0.3945 * K

SEER Effort Formula: K = (SE / (CTE * td))^2

SEER Schedule Formula: td = D^(-0.2) * (SE / CTE)^0.4

Where:

D = Staffing Complexity Constant (i.e., How hard is it to get the staff required to code this type of software?). Note that the default value for D = 15
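The SEER formulas above chain together as a sketch; the CTE and size values below are illustrative, not actual cost-driver outputs from the tool.

```python
# Sketch of the SEER/Sage relations above. CTE comes from the program's
# cost-driver inputs; the values here are illustrative only.

def seer_schedule(se: float, cte: float, d: float = 15) -> float:
    """Development time in years: td = D**(-0.2) * (SE / CTE)**0.4."""
    return d ** -0.2 * (se / cte) ** 0.4

def seer_total_effort(se: float, cte: float, td: float) -> float:
    """Total life-cycle effort in person-years: K = (SE / (CTE * td))**2."""
    return (se / (cte * td)) ** 2

se, cte = 500_000, 5_000          # 500 KESLOC, illustrative CTE
td = seer_schedule(se, cte)       # uses the default D = 15
k = seer_total_effort(se, cte, td)
dev_effort = 0.3945 * k           # development portion of life-cycle effort
print(round(td, 2), round(k, 1), round(dev_effort, 1))
```

Note that the schedule formula feeds the effort formula: td is computed first, then K follows from size, CTE, and td.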


27 of 31

Unison Cost Engineering: TruePlanning for Software

TruePlanning® CER:

Effort (Hours) = A * Size^B

Where:

A = influence of the drivers in the model (Functional Complexity, Technology, People, Reuse, Organizational Productivity, etc. - 33 numerical cost drivers + 10 nominal cost drivers)

Size = software size in specified Size Units (SLOC, IFPUG FP, COSMIC FP, etc.)

B = economy or diseconomy of scale as a function of Organizational Productivity (B ranges between 1.077 and 1.117)

Example: Military Avionics in SLOC with default values: Effort = 0.252 * Size^1.107

Schedule (Months) = C * Effort^0.33

Where:

C = efficiency/inefficiencies in developing the software based on model inputs (Functional Complexity, Technology, People, Organizational Productivity, etc., 33 numerical drivers and 19 nominal drivers – C ranges from 0.41 to 0.9)
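The Military Avionics example above can be evaluated as a sketch; A and B are the stated defaults, while the C value is simply an illustrative point within its stated 0.41-0.9 range, and the 100 KSLOC size is hypothetical.

```python
# Sketch of the TruePlanning CER above, with the Military Avionics
# SLOC defaults A = 0.252 and B = 1.107. The C value is illustrative.

def tp_effort_hours(size: float, a: float = 0.252, b: float = 1.107) -> float:
    """Effort (hours) = A * Size**B."""
    return a * size ** b

def tp_schedule_months(effort: float, c: float = 0.7) -> float:
    """Schedule (months) = C * Effort**0.33."""
    return c * effort ** 0.33

effort = tp_effort_hours(100_000)   # hypothetical 100 KSLOC program
print(round(effort), round(tp_schedule_months(effort), 1))
```

As with SEER, effort is computed first and the schedule relation is then driven by the effort result.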


28 of 31

Simple Function Points�SiFP

  • The Simple Function Point (SiFP) method estimates a software product’s functional size by quantifying its business functions/transaction types, system interfaces, and other functional requirements from high-level acquisition documentation.
  • Method developed by Italian researchers, acquired by IFPUG in 2019 (https://www.ifpug.org/ifpug-acquires-the-simple-function-points-method)
  • Can be performed quickly and early in a program’s lifecycle using existing documents.
  • Focuses on two elementary processes:


29 of 31

Simple Function Point Analysis (Validated by the DHS Study Referenced Below)

  • 2022 study of 15 DHS IT systems and 3 DoD IT systems
  • “Based on the comparison of effort models, although all models passed the criteria for statistical significance, simple function points, unadjusted function points, and functional requirements are stronger predictors of development effort than stories, story points, or issues.”
  • Simple Function Points produced the highest adjusted R-squared value, indicating a very strong predictive capability.

“Let’s Go Agile: Data-Driven Agile Software Costs and Schedule Models for DHS Projects,” ICEAA 2022, Wilson Rosa, Sara Jardine, Kimberly Roye, Kyle Eaton, and Chad Lucas


30 of 31

Functional Size Issues

  • Once the Function Point Count (FPC) is established, the count must be converted into hours
    • International Software Benchmarking Standards Group (ISBSG) offers a good commercial database
    • Currently no good Federal/National Security database (SRDR is beginning to collect this data)
    • Analogy to similar programs
    • Most Commercial models “backfire” the FPC into SLOC.
  • When the FPC is completed, some requirements will have a “0” FPC.   Items such as documentation or meeting a certain developmental standard do not require end user interaction and, as such, are not functional.  There is, however, effort associated with these requirements as they add complexity to the overall work effort
    • Most software cost estimating models account for these hours through the parametric estimating equations derived from their historical database.
    • Non-model users might utilize tools like SNAP to account for these non-functional hours. 


31 of 31

Backfiring FP to SLOC

  • There are multiple data sources (Capers Jones, Galorath, QSM, Unison) that offer backfiring tables.
  • Correct backfiring requires different metrics for each type of FP *
  • Below is a partial table from QSM

* Applied Software Measurement, 3rd Edition, Capers Jones, page 80

 

QSM SLOC/FP Data

Language      Avg   Median   Low   High
ABAP (SAP) *   28     18      16     60
ASP *          51     54      15     69
Assembler *   119     98      25    320
Brio +         14     14      13     16
C *            97     99      39    333
C++ *          50     53      25     80
C# *           54     59      29     70
COBOL *        61     55      23    297
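Backfiring with gearing factors like those in the QSM table above can be sketched as a lookup-and-multiply; the table subset and the 10,000-FP input are illustrative.

```python
# Sketch of backfiring: converting a function point count to estimated
# SLOC using language gearing factors (SLOC per FP). The wide Low-High
# spread is why backfired sizes are approximate at best.

QSM_SLOC_PER_FP = {          # language: (avg, median, low, high)
    "C": (97, 99, 39, 333),
    "C++": (50, 53, 25, 80),
    "C#": (54, 59, 29, 70),
    "COBOL": (61, 55, 23, 297),
}

def backfire(fp: float, language: str) -> tuple:
    """Return (avg, low, high) SLOC estimates for a FP count."""
    avg, _median, low, high = QSM_SLOC_PER_FP[language]
    return fp * avg, fp * low, fp * high

print(backfire(10_000, "C++"))  # (500000, 250000, 800000)
```

The three-point output pairs naturally with the low/most-likely/high size distributions discussed earlier for confidence-level estimating.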
