
Colorado Haiku Club: Analysis Using Natural Language Tools with Software Functional Descriptions

First International Boehm Forum on COCOMO and Systems and Software Cost Modeling

By: Dan Strickland

Software Cost SME

Missile Defense Agency

Cost Estimating Directorate

November 9, 2022

Approved for Public Release

22-MDA-11295 (31 Oct 22)

UNCLASSIFIED


Agenda

  • SRDR Functional Description
  • Natural Language Tools
  • Functional Description Word Cloud and Vocabulary
  • Vocabulary Word Weights
  • Solving for Weights, Significant Weights, and Weight Distance
  • Brute Force Algorithm
  • Word Weights in Super-Domains
  • Bigram Modeling
  • Future Work


DOC – Cost Estimating Directorate


Haiku

  • haiku – a traditional Japanese poem consisting of three lines with 5, 7, and 5 syllables respectively; no rhyming restrictions
  • Examples:


“Give it a rest, Dan.

People don’t like using SLOC.”

“Then they are all wrong.”

My friend Dan Ligett

Benefits of intellect

Big trifecta win!

There is no island

Called COCOMO in real life

Just a catchy song

Effort equals a

Times KSLOC raised to the b

Times the EAF

My friend Dave Seaver

Worked Scanners into his brief

On a dare from Dan

Words have meaning and value – could words in software records have useful and quantifiable value?


SRDR – Functional Description

  • Software Resources Data Reports (SRDR) are the DoD’s mechanism for collecting data on software projects for cost analysis
  • SRDRs are collected by the Office of the Secretary of Defense (OSD) – Cost Assessment and Program Evaluation (CAPE) Organization from government contractors at the beginning and end of software projects
  • SRDRs contain data like size in Source Lines of Code (SLOC), contract type, hours expended per development phase, and application type of the software
  • Many papers and research projects have analyzed Cost Estimating Relationships (CERs) and the SRDR databases, but most of them focus on numerical independent variables (SLOC, Requirements, Peak Staffing, etc.)
  • Functional Description – text field that addresses what the computer program is, what it does, and how it interfaces with other elements (both internal and external to the effort)
  • Functional Description is top-level free-flowing text that can vary in length and verbosity



Natural Language Tools

  • Natural Language is any language that has evolved naturally in humans through use and repetition, without conscious planning (unlike constructed languages such as Klingon)
  • Word clouds are visual representations of text data that highlight the importance and frequency of words in free-form text
  • Word clouds can be expressed in frequency of the text field as a weighted list
  • Bag-of-Words – representation of text describing the occurrence of words in a document
    • Vocabulary of known words
    • Measure of the presence of known words
  • Example: “It was the best of times, it was the worst of times.”
    • Vocabulary of seven words {it, was, the, best, of, times, worst}
    • Sample vector for “It was the best of times” = [1, 1, 1, 1, 1, 1, 0]
  • Vocabularies can be controlled and reduced for effectiveness
    • Remove conjunctions and articles (“a”, “and”, “the”, “but”, etc.)
    • Ignore case and punctuation
    • Fix misspelled words
    • Reduce words to their stem (“operations” and “operator” become “operat”)
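The reduction steps above can be sketched as a minimal bag-of-words pipeline. This is an illustrative sketch in pure Python, not the tooling used in the study: the stop-word list and the crude suffix-stripping stemmer are placeholder assumptions.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "but", "of", "it", "was"}  # illustrative subset

def tokenize(text):
    # Lowercase and drop punctuation before splitting into words
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude placeholder stemmer: strip a few common suffixes
    for suffix in ("ions", "ion", "ors", "or", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[: -len(suffix)]
    return word

def build_vocabulary(documents):
    # Collect stemmed, non-stop-word tokens in first-seen order
    vocab = []
    for doc in documents:
        for token in tokenize(doc):
            if token in STOP_WORDS:
                continue
            token = stem(token)
            if token not in vocab:
                vocab.append(token)
    return vocab

def vectorize(text, vocab):
    # Binary participation: 1 if the word appears in the text, 0 otherwise
    present = {stem(t) for t in tokenize(text) if t not in STOP_WORDS}
    return [1 if word in present else 0 for word in vocab]
```

Note that, unlike the raw Dickens example above, this sketch removes stop words before building the vocabulary, so "It was the best of times" reduces to the stems of "best" and "times" only.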



SRDR Functional Description – Ground Rules and Assumptions

  • Using the NAVAIR SRDR Datasheet, May 2022
  • Only using Final SRDRs rated Good or an alternate Good quality (“Good,” “Good-Alteration,” “Good-Allocation,” “Good-Combined”) – 747 records
  • Remove records with null or N/A for Functional Description – 421 records
  • Word cloud using the remaining Functional Descriptions
  • Develop a Bag-of-Words vocabulary for the SRDR Functional Descriptions
  • Use the vocabulary statistics to try to predict the Productivity of software records



SRDR Functional Description – Word Cloud


  • Over 2,500 words remain after removal of stop words (“a,” “of,” etc.)
  • The most frequently recurring words are “software,” “data,” “provides,” and “system”


SRDR Functional Description – Vocabulary

  • Removed specific program names
  • Removed all articles, conjunctions, and stop words
  • Removed specific acronyms (CSCI, SPCI, etc.)
  • Reduced words to stems
  • Limited to 96 top-occurring, descriptive words
  • Only counting binary participation – does the word appear in the Functional Description or not? Multiple appearances of a word in the text carry no extra weight



SRDR Functional Description – Vocabulary



Vocabulary Metrics


Freq – percentage of entries that contain the vocabulary word

Min – minimum Productivity in the range of entries containing the word

Max – maximum Productivity in the range of entries containing the word

Mean – average Productivity of entries containing the word

Median – midpoint value Productivity of entries containing the word

  • Productivity = ESLOC / Total SW Hours
  • ESLOC uses the formula from the MDA Software Estimation Handbook
  • ESLOC = New + (0.5) Modified + (0.3) Auto-Generated + (0.05) Reuse
  • Values lower than 1.0 indicate difficult or inefficient coding
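The ESLOC and Productivity definitions above translate directly into code. A minimal sketch, with made-up record values for illustration:

```python
def esloc(new, modified, auto_generated, reuse):
    # MDA Software Estimation Handbook weighting of code categories
    return new + 0.5 * modified + 0.3 * auto_generated + 0.05 * reuse

def productivity(esloc_value, total_sw_hours):
    # ESLOC delivered per software hour; values below 1.0 suggest
    # difficult or inefficient coding
    return esloc_value / total_sw_hours

# Hypothetical record: 10,000 new, 4,000 modified, 2,000 auto-generated,
# 20,000 reused SLOC -> 10000 + 2000 + 600 + 1000 = 13600 ESLOC
size = esloc(new=10_000, modified=4_000, auto_generated=2_000, reuse=20_000)
```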


Vocabulary Metrics – Most/Least Productive


Top 10 Most Productive Words (by Median)

Top 10 Least Productive Words (by Median)

  • Mean Productivity of the entire dataset is 1.57
  • Median Productivity of the entire dataset is 1.01
  • No outliers are removed, but that could be a future endeavor


Word Weights – Single Value

Assign a weight value to each vocabulary word to see if Productivity can be predicted by the product of the active words in the Functional Description


Base Case (all weights = 1.0; no impact): MMRE = 93%, PRED(25) = 24%

Mean (weights = mean Productivity): MMRE = 271,153%, PRED(25) = 3%

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

  • Goal is for MMRE <= 50% and PRED(25) >= 75%
  • Mean-based prediction degrades almost completely: the right-skew of the means pushes all weight values above 1.0, so their product explodes – don’t use the mean
  • Median-based prediction is nearly identical to the Base Case because the overall median Productivity is 1.01 – 50% of records fall above/below 1.01
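The product-of-weights model and the two scoring metrics can be sketched as follows. The weight values shown are invented for illustration; actuals come from the SRDR records:

```python
def predict(weights, active_words, base=1.0):
    # Predicted Productivity = product of the weights of the vocabulary
    # words present ("active") in the Functional Description
    p = base
    for word in active_words:
        p *= weights.get(word, 1.0)
    return p

def mmre(actuals, predictions):
    # Mean Magnitude of Relative Error
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)

def pred(actuals, predictions, level=0.25):
    # PRED(25): share of predictions within 25% of the actual value
    hits = sum(abs(a - p) / a <= level for a, p in zip(actuals, predictions))
    return hits / len(actuals)

weights = {"aircraft": 1.9, "ballistic": 0.7}  # illustrative values, not the solved weights
```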

Is there a set of default weights for the vocabulary words that will actively improve prediction performance over the Base Case?


Solver Results

  • Weights for vocabulary words started at median Productivity values
  • MS Solver, Evolutionary for max 1000 iterations


S1 – Solver (Max PRED(25); 0 <= Weight <= 5): MMRE = 116%, PRED(25) = 34% – increased Prediction Level, increased MMRE

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

S2 – Solver (Min MMRE; 0 <= Weight <= 5): MMRE = 64%, PRED(25) = 18% – decreased MMRE, decreased Prediction Level

  • Solver found sets of weights that can either increase Prediction Level or decrease MMRE, but not both
  • Focus on increasing Prediction Level over decreasing MMRE – “miss less, but probably miss big”


Solver Results – Top/Bottom Ten

  • Change the vocabulary words to the Top Ten and Bottom Ten Productivity medians – most impactful words
  • Remove all weights except the Top/Bottom Ten and solve for those weights
  • Have to eliminate rows with none of the twenty influencing words in their Functional Description (421 records -> 220 records)


S3 – Top/Bottom Ten variable, all other weights = 1.0: MMRE = 93%, PRED(25) = 30%

S3.1 – Top/Bottom Ten variable, all other weights = Median: MMRE = 87%, PRED(25) = 26%

S3.2 – Top/Bottom Ten variable, N/A records removed: MMRE = 94%, PRED(25) = 19%

Limited vocabulary eliminates almost half of the available data


Word Weights – Sum of Weights

  • Focus on the sum of distances from the average rather than the product of weights
  • Use the word medians and their directed distance from the overall median to get a list of weight distances
  • Prediction is equal to overall median Productivity plus the sum of word weight distances
  • Example:
    • “aircraft” = +0.87, “ballistic” = -0.30, Overall Median Productivity = 1.01
    • Prediction for 2-word “aircraft”, “ballistic” = 1.01 + 0.87 + (-0.30) = 1.58
  • Repeated the exercise using the directed distance of word means from the overall mean
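The additive model in the example above can be sketched directly; the distance values are the slide's illustrative figures:

```python
def predict_sum(distances, active_words, overall_median=1.01):
    # Prediction = overall median Productivity plus the directed distances
    # of each active word's median from the overall median
    return overall_median + sum(distances.get(w, 0.0) for w in active_words)

# Directed distances from the slide's example
distances = {"aircraft": 0.87, "ballistic": -0.30}
```

With both example words active, the prediction is 1.01 + 0.87 + (-0.30) = 1.58, matching the slide.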


Sum Median (overall Median + sum of word median distances): MMRE = 86%, PRED(25) = 22%

Sum Mean (overall Mean + sum of word mean distances): MMRE = 173%, PRED(25) = 18%


Solver Results – Weight Distance

  • Weights for vocabulary words started at overall median Productivity value plus the sum of word weights
  • MS Solver, Evolutionary for max 1000 iterations, Range between -1 and 2


S4 – Solver (Max PRED(25); -1 <= Weight <= 2): MMRE = 102%, PRED(25) = 37% – increased Prediction Level, increased MMRE

Sum Median (overall Median + sum of word median distances): MMRE = 86%, PRED(25) = 22%

S5 – Solver (Min MMRE; -1 <= Weight <= 2): MMRE = 64%, PRED(25) = 23% – decreased MMRE, slightly increased Prediction Level


Brute Force Algorithm

  • Finds a solution using a nested-loop search implemented in Visual Basic
  • User-defined number of trials (default 100)
  • Delta Value sets the incremental change applied to the word weights (default 0.01)
  • Randomly traverses the vocabulary, changing each visited word weight by the Delta Value
    • If the PRED(25) score improves, keeps the new word weight
    • If the PRED(25) score decreases, reverts to the previous word weight
    • If the PRED(25) score doesn’t change for 3 straight increments, keeps the new word weight and moves on
    • Tries increasing and then decreasing the values
  • Tends to get “stuck” after the first set of improvements
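A hedged sketch of the loop described above, in Python rather than the original Visual Basic; the scoring function (e.g., PRED(25) over the dataset) is passed in, and the stopping details are simplified:

```python
import random

def brute_force(weights, score, trials=100, delta=0.01, seed=1):
    """Hill-climb word weights: nudge one randomly chosen weight at a time
    by +/- delta, keeping a change only when the score improves."""
    rng = random.Random(seed)
    best = score(weights)
    words = list(weights)
    for _ in range(trials):
        word = rng.choice(words)
        for step in (delta, -delta):  # try increasing, then decreasing
            stalls = 0
            while stalls < 3:
                weights[word] += step
                new = score(weights)
                if new > best:
                    best = new             # improvement: keep and continue
                    stalls = 0
                elif new < best:
                    weights[word] -= step  # worse: revert and stop this direction
                    break
                else:
                    stalls += 1            # unchanged: keep the weight, count the stall
    return weights, best
```

Because each step only accepts strict improvements, the walk settles into the first local optimum it finds, which matches the "stuck after the first set of improvements" behavior noted above.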



Brute Force Results


Base Case (all weights = 1.0; no impact): MMRE = 93%, PRED(25) = 24%

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

S6 – Brute Force (100 trials, 0.01 Delta Val): MMRE = 84%, PRED(25) = 31%; range of weights 0.76 – 1.02

S7 – Brute Force (100 trials, 0.01 Delta Val): MMRE = 78%, PRED(25) = 31%; range of weights -1.27 – 2.44

  • 42 of 96 words remain at 1.0 – almost half are not significant
  • Negative weights make no logical sense in a product


Word Weights Product – Best Result


Starting from Median Productivities:

Solver (Max PRED(25); 0 <= Weight <= 5): MMRE = 116%, PRED(25) = 34%

PRED(25) = 34%, PRED(30) = 38%, PRED(40) = 46%, PRED(50) = 53%

While the best effort did not approach the target of PRED (25) = 75%, the predicted vocabulary word weight product comes within 50% of actual Productivity 53% of the time – could be used in early estimates and cross-checks

 


Word Weights Sum – Best Result


Solver (Max PRED(25); -1 <= Weight <= 2): MMRE = 102%, PRED(25) = 37%

PRED(25) = 37%, PRED(30) = 40%, PRED(40) = 46%, PRED(50) = 52%

Best PRED(25) of all the models, but the other models’ Prediction Levels start to draw even at larger values of PRED(x)


Word Weights Brute Force – Best Result


Brute Force (100 trials, 0.01 Delta Val): MMRE = 84%, PRED(25) = 31%

PRED(25) = 31%, PRED(30) = 33%, PRED(40) = 42%, PRED(50) = 51%

The Brute Force solution reduces the vocabulary to 54 words, but that limits the data that can participate (records whose Functional Descriptions contain none of the significant vocabulary words drop out)


Word Weights in Super-Domains


  • SRDRs can be sub-divided by Application Domains and Super-Domains
  • Super-domains are four groupings of Application Domains that have similar Productivities historically
    • Automated Information Systems (AIS) – Enterprise Information System, Enterprise Services, Custom AIS Services, Mission Planning
    • Engineering (ENG) – Test/Measurement/Diagnostic Equipment, Scientific & Simulation, Process Control, System Software
    • Mission Support (MS) – Software Tools, Training
    • Real-Time (RT) – Command & Control, Communications, Real Time Embedded, Vehicle Control, Vehicle Payload, Signal Processing, Microcode & Firmware
  • Mean or median Productivity by Super-Domain can be used as a baseline for early estimates (“On average, Real Time efforts are 0.83 ESLOC/hr”)
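The groupings above amount to a simple lookup table. A sketch, using the abbreviations from the bullets; the 0.83 ESLOC/hr figure for Real-Time is the slide's example value, and any other medians an analyst passes in are their own data:

```python
# SRDR Application Domain -> Super-Domain, per the groupings above
SUPER_DOMAIN = {
    "Enterprise Information System": "AIS",
    "Enterprise Services": "AIS",
    "Custom AIS Services": "AIS",
    "Mission Planning": "AIS",
    "Test/Measurement/Diagnostic Equipment": "ENG",
    "Scientific & Simulation": "ENG",
    "Process Control": "ENG",
    "System Software": "ENG",
    "Software Tools": "MS",
    "Training": "MS",
    "Command & Control": "RT",
    "Communications": "RT",
    "Real Time Embedded": "RT",
    "Vehicle Control": "RT",
    "Vehicle Payload": "RT",
    "Signal Processing": "RT",
    "Microcode & Firmware": "RT",
}

def baseline_productivity(app_domain, super_domain_productivities):
    # Look up a Super-Domain Productivity as an early-estimate baseline
    return super_domain_productivities[SUPER_DOMAIN[app_domain]]
```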


Word Weights in Super-Domains


The Best Result Product word weights match or outperform the mean and median Productivities by Super-Domain in Prediction Level at 25%...


Word Weights in Super-Domains


…but as Prediction Level is expanded, the mean and median values can “catch up” to the Best Case Product word weights within Super-Domains

Word weights outperform known Productivity means and medians when Super-Domain is unknown


Bag of Words Bigram Model

  • Using the vocabulary, the bigram model couples sequenced words and calculates statistics using those couplings
  • Example: “It was the best of times” becomes
    • “it was” “was the” “the best” “best of” “of times”
  • Sequence is important as “aircraft…software” is not the same as “aircraft software”
  • Off-the-shelf tools for building a bag-of-bigrams model are less readily available, but the approach could yield a stronger vocabulary
  • Vocabulary would still need to be controlled and reduced to increase effectiveness
  • Potential to use pairs of the present vocabulary to look for bigrams (“aircraft software”), but the 96-word vocabulary leads to over 9,000 potential pairs
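Bigram extraction itself is simple, as is counting the ordered pairs a vocabulary can generate. A sketch using the slide's example sentence:

```python
def bigrams(text):
    # Couple each word with its immediate successor, preserving order
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def pair_count(vocab_size):
    # Ordered pairs available from a vocabulary (order matters for bigrams):
    # 96 words give 96 * 95 = 9,120 potential pairs, i.e. "over 9,000"
    return vocab_size * (vocab_size - 1)
```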



Future Work

  • Expand vocabularies to a bag-of-bigrams or even bag-of-pairs model (“aircraft” AND “software”)
  • Changes in Brute Force Algorithm to give more range and less “settling” effect; multiple goals
  • Further work on the Sum of Weights model using directed distance
  • Solve for word weights using a random training set and apply to random test set
  • Apply Natural Language tools to other text fields in the SRDR
    • Agile: Feature Descriptions
    • Development Contractor: Pair of Functional Description and Contractor Name / Site



Conclusions

  • Natural language tools can be applied to the Functional Description of the SRDRs to develop a vocabulary describing the software effort
  • The words in a vocabulary can be assigned a weight based on known metrics like mean/median Productivity of the records with words from the vocabulary
  • Tools like Solver and Visual Basic coded algorithms can be used to adjust the vocabulary word weights to achieve a goal, like higher Prediction Level values for software Productivities
  • Some word weight solutions match or outperform their counterpart Super-Domain means and medians at certain Prediction Levels
  • Natural Language tools produce tangible and exciting results for software estimation


