
Colorado Haiku Club: Analysis Using Natural Language Tools with Software Functional Descriptions

First International Boehm Forum on COCOMO and Systems and Software Cost Modeling

By: Dan Strickland

Software Cost SME

Missile Defense Agency

Cost Estimating Directorate

November 9, 2022

Approved for Public Release

22-MDA-11295 (31 Oct 22)

UNCLASSIFIED


Agenda

  • SRDR Functional Description
  • Natural Language Tools
  • Functional Description Word Cloud and Vocabulary
  • Vocabulary Word Weights
  • Solving for Weights, Significant Weights, and Weight Distance
  • Brute Force Algorithm
  • Word Weights in Super-Domains
  • Bigram Modeling
  • Future Work


DOC – Cost Estimating Directorate


Haiku

  • haiku – a traditional Japanese poem consisting of three lines with 5, 7, and 5 syllables respectively; no rhyming restrictions
  • Examples:


“Give it a rest, Dan.

People don’t like using SLOC.”

“Then they are all wrong.”

My friend Dan Ligett

Benefits of intellect

Big trifecta win!

There is no island

Called COCOMO in real life

Just a catchy song

Effort equals a

Times KSLOC raised to the b

Times the EAF

My friend Dave Seaver

Worked Scanners into his brief

On a dare from Dan

Words have meaning and value – could words in software records have useful and quantifiable value?


SRDR – Functional Description

  • Software Resources Data Reports (SRDR) are the DoD’s mechanism for collecting data on software projects for cost analysis
  • SRDRs are collected by the Office of the Secretary of Defense (OSD) – Cost Assessment and Program Evaluation (CAPE) Organization from government contractors at the beginning and end of software projects
  • SRDRs contain data like size in Source Lines of Code (SLOC), contract type, hours expended per development phase, and application type of the software
  • Many papers and research projects have analyzed Cost Estimating Relationships (CERs) and the SRDR databases, but most of them focus on numerical independent variables (SLOC, Requirements, Peak Staffing, etc.)
  • Functional Description – text field that addresses what the computer program is, what it does, and how it interfaces with other elements (both internal and external to the effort)
  • Functional Description is top-level free-flowing text that can vary in length and verbosity



Natural Language Tools

  • Natural Language is any language that has evolved naturally in humans through use and repetition, without conscious planning (unlike constructed languages such as Klingon)
  • Word clouds are visual representations of text data that highlight the importance and frequency of words in free-form text
  • Word clouds can be expressed in frequency of the text field as a weighted list
  • Bag-of-Words – representation of text describing the occurrence of words in a document
    • Vocabulary of known words
    • Measure of the presence of known words
  • Example: “It was the best of times, it was the worst of times.”
    • Vocabulary of seven words {it, was, the, best, of, times, worst}
    • Sample vector for “It was the best of times” = [1, 1, 1, 1, 1, 1, 0]
  • Vocabularies can be controlled and reduced for effectiveness
    • Remove conjunctions and articles (“a”, “and”, “the”, “but”, etc.)
    • Ignore case and punctuation
    • Fix misspelled words
    • Reduce words to their stem (“operations” and “operator” become “operat”)
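The reduction steps above can be sketched as a minimal bag-of-words pipeline. This is an illustrative sketch in pure Python, not the tooling used in the study: the stop-word list and the crude suffix-stripping stemmer are placeholder assumptions.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "but", "of", "it", "was"}  # illustrative subset

def tokenize(text):
    # Lowercase and drop punctuation before splitting into words
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude placeholder stemmer: strip a few common suffixes
    for suffix in ("ions", "ion", "ors", "or", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 3:
            return word[: -len(suffix)]
    return word

def build_vocabulary(documents):
    # Collect stemmed, non-stop-word tokens in first-seen order
    vocab = []
    for doc in documents:
        for token in tokenize(doc):
            if token in STOP_WORDS:
                continue
            token = stem(token)
            if token not in vocab:
                vocab.append(token)
    return vocab

def vectorize(text, vocab):
    # Binary participation: 1 if the word appears in the text, 0 otherwise
    present = {stem(t) for t in tokenize(text) if t not in STOP_WORDS}
    return [1 if word in present else 0 for word in vocab]
```

Note that, unlike the raw Dickens example above, this sketch removes stop words before building the vocabulary, so "It was the best of times" reduces to the stems of "best" and "times" only.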



SRDR Functional Description – Ground Rules and Assumptions

  • Using the NAVAIR SRDR Datasheet, May 2022
  • Only using Final SRDRs rated Good or an alternate Good quality (“Good,” “Good-Alteration,” “Good-Allocation,” “Good-Combined”) – 747 records
  • Remove records with null or N/A for Functional Description – 421 records
  • Word cloud using the remaining Functional Descriptions
  • Develop a Bag-of-Words vocabulary for the SRDR Functional Descriptions
  • Use the vocabulary statistics to try to predict the Productivity of software records



SRDR Functional Description – Word Cloud


  • Over 2,500 words remain after removal of stop words (“a,” “of,” etc.)
  • The most frequently recurring words are “software,” “data,” “provides,” and “system”


SRDR Functional Description – Vocabulary

  • Removed specific program names
  • Removed all articles, conjunctions, and stop words
  • Removed specific acronyms (CSCI, SPCI, etc.)
  • Reduced words to stems
  • Limited to 96 top-occurring, descriptive words
  • Only counting binary participation – does the word appear in the Functional Description or not? Multiple appearances of a word in the text carry no extra weight



SRDR Functional Description – Vocabulary



Vocabulary Metrics


Freq – percentage of entries that contain the vocabulary word

Min – minimum Productivity in the range of entries containing the word

Max – maximum Productivity in the range of entries containing the word

Mean – average Productivity of entries containing the word

Median – midpoint value Productivity of entries containing the word

  • Productivity = ESLOC / Total SW Hours
  • ESLOC uses the formula from the MDA Software Estimation Handbook
  • ESLOC = New + (0.5) Modified + (0.3) Auto-Generated + (0.05) Reuse
  • Values lower than 1.0 indicate difficult or inefficient coding
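The ESLOC and Productivity definitions above translate directly into code. A minimal sketch, with made-up record values for illustration:

```python
def esloc(new, modified, auto_generated, reuse):
    # MDA Software Estimation Handbook weighting of code categories
    return new + 0.5 * modified + 0.3 * auto_generated + 0.05 * reuse

def productivity(esloc_value, total_sw_hours):
    # ESLOC delivered per software hour; values below 1.0 suggest
    # difficult or inefficient coding
    return esloc_value / total_sw_hours

# Hypothetical record: 10,000 new, 4,000 modified, 2,000 auto-generated,
# 20,000 reused SLOC -> 10000 + 2000 + 600 + 1000 = 13600 ESLOC
size = esloc(new=10_000, modified=4_000, auto_generated=2_000, reuse=20_000)
```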


Vocabulary Metrics – Most/Least Productive


Top 10 Most Productive Words (by Median)

Top 10 Least Productive Words (by Median)

  • Mean Productivity of the entire dataset is 1.57
  • Median Productivity of the entire dataset is 1.01
  • No outliers are removed, but that could be a future endeavor


Word Weights – Single Value

Assign a weight value to each vocabulary word to see if Productivity can be predicted by the product of the active words in the Functional Description


Base Case (all weights = 1.0; no impact): MMRE = 93%, PRED(25) = 24%

Mean (weights = mean Productivity): MMRE = 271,153%, PRED(25) = 3%

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

  • Goal is for MMRE <= 50% and PRED(25) >= 75%
  • Mean-based prediction degrades almost completely: the right-skew of the means pushes all weight values above 1.0, so their product explodes – don’t use the mean
  • Median-based prediction is nearly identical to the Base Case because the overall median Productivity is 1.01 – 50% of records fall above/below 1.01
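The product-of-weights model and the two scoring metrics can be sketched as follows. The weight values shown are invented for illustration; actuals come from the SRDR records:

```python
def predict(weights, active_words, base=1.0):
    # Predicted Productivity = product of the weights of the vocabulary
    # words present ("active") in the Functional Description
    p = base
    for word in active_words:
        p *= weights.get(word, 1.0)
    return p

def mmre(actuals, predictions):
    # Mean Magnitude of Relative Error
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)

def pred(actuals, predictions, level=0.25):
    # PRED(25): share of predictions within 25% of the actual value
    hits = sum(abs(a - p) / a <= level for a, p in zip(actuals, predictions))
    return hits / len(actuals)

weights = {"aircraft": 1.9, "ballistic": 0.7}  # illustrative values, not the solved weights
```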

Is there a set of default weights for the vocabulary words that will actively improve prediction performance over the Base Case?


Solver Results

  • Weights for vocabulary words started at median Productivity values
  • MS Solver, Evolutionary for max 1000 iterations


S1 – Solver (Max PRED(25); 0 <= Weight <= 5): MMRE = 116%, PRED(25) = 34% – increased Prediction Level, increased MMRE

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

S2 – Solver (Min MMRE; 0 <= Weight <= 5): MMRE = 64%, PRED(25) = 18% – decreased MMRE, decreased Prediction Level

  • Solver found sets of weights that can either increase Prediction Level or decrease MMRE, but not both
  • Focus on increasing Prediction Level over decreasing MMRE – “miss less, but probably miss big”


Solver Results – Top/Bottom Ten

  • Change the vocabulary words to the Top Ten and Bottom Ten Productivity medians – most impactful words
  • Remove all weights except the Top/Bottom Ten and solve for those weights
  • Have to eliminate rows with none of the twenty influencing words in their Functional Description (421 records -> 220 records)


S3 – Top/Bottom Ten variable, all other weights = 1.0: MMRE = 93%, PRED(25) = 30%

S3.1 – Top/Bottom Ten variable, all other weights = Median: MMRE = 87%, PRED(25) = 26%

S3.2 – Top/Bottom Ten variable, N/A records removed: MMRE = 94%, PRED(25) = 19%

Limited vocabulary eliminates almost half of the available data


Word Weights – Sum of Weights

  • Focus on the sum of distances from the average rather than the product of weights
  • Use the word medians and their directed distance from the overall median to get a list of weight distances
  • Prediction is equal to overall median Productivity plus the sum of word weight distances
  • Example:
    • “aircraft” = +0.87, “ballistic” = -0.30, Overall Median Productivity = 1.01
    • Prediction for 2-word “aircraft”, “ballistic” = 1.01 + 0.87 + (-0.30) = 1.58
  • Repeated the exercise using the directed distance of word means from the overall mean
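The additive model in the example above can be sketched directly; the distance values are the slide's illustrative figures:

```python
def predict_sum(distances, active_words, overall_median=1.01):
    # Prediction = overall median Productivity plus the directed distances
    # of each active word's median from the overall median
    return overall_median + sum(distances.get(w, 0.0) for w in active_words)

# Directed distances from the slide's example
distances = {"aircraft": 0.87, "ballistic": -0.30}
```

With both example words active, the prediction is 1.01 + 0.87 + (-0.30) = 1.58, matching the slide.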


Sum Median (overall Median + sum of word median distances): MMRE = 86%, PRED(25) = 22%

Sum Mean (overall Mean + sum of word mean distances): MMRE = 173%, PRED(25) = 18%


Solver Results – Weight Distance

  • Weights for vocabulary words started at overall median Productivity value plus the sum of word weights
  • MS Solver, Evolutionary for max 1000 iterations, Range between -1 and 2


S4 – Solver (Max PRED(25); -1 <= Weight <= 2): MMRE = 102%, PRED(25) = 37% – increased Prediction Level, increased MMRE

Sum Median (overall Median + sum of word median distances): MMRE = 86%, PRED(25) = 22%

S5 – Solver (Min MMRE; -1 <= Weight <= 2): MMRE = 64%, PRED(25) = 23% – decreased MMRE, slightly increased Prediction Level


Brute Force Algorithm

  • Finds a solution using a nested-loop search implemented in Visual Basic
  • User-defined number of trials (default 100)
  • Delta Value sets the incremental change applied to the word weights (default 0.01)
  • Randomly traverses the vocabulary, changing each visited word weight by the Delta Value
    • If the PRED(25) score improves, keeps the new word weight
    • If the PRED(25) score decreases, reverts to the previous word weight
    • If the PRED(25) score doesn’t change for 3 straight increments, keeps the new word weight and moves on
    • Tries increasing and then decreasing the values
  • Tends to get “stuck” after the first set of improvements
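A hedged sketch of the loop described above, in Python rather than the original Visual Basic; the scoring function (e.g., PRED(25) over the dataset) is passed in, and the stopping details are simplified:

```python
import random

def brute_force(weights, score, trials=100, delta=0.01, seed=1):
    """Hill-climb word weights: nudge one randomly chosen weight at a time
    by +/- delta, keeping a change only when the score improves."""
    rng = random.Random(seed)
    best = score(weights)
    words = list(weights)
    for _ in range(trials):
        word = rng.choice(words)
        for step in (delta, -delta):  # try increasing, then decreasing
            stalls = 0
            while stalls < 3:
                weights[word] += step
                new = score(weights)
                if new > best:
                    best = new             # improvement: keep and continue
                    stalls = 0
                elif new < best:
                    weights[word] -= step  # worse: revert and stop this direction
                    break
                else:
                    stalls += 1            # unchanged: keep the weight, count the stall
    return weights, best
```

Because each step only accepts strict improvements, the walk settles into the first local optimum it finds, which matches the "stuck after the first set of improvements" behavior noted above.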



Brute Force Results


Base Case (all weights = 1.0; no impact): MMRE = 93%, PRED(25) = 24%

Median (weights = median Productivity): MMRE = 95%, PRED(25) = 25%

S6 – Brute Force (100 trials, 0.01 Delta Val): MMRE = 84%, PRED(25) = 31%; range of weights 0.76 – 1.02

S7 – Brute Force (100 trials, 0.01 Delta Val): MMRE = 78%, PRED(25) = 31%; range of weights -1.27 – 2.44

  • 42 of 96 words remain at 1.0 – almost half are not significant
  • Negative weights make no logical sense in a product


Word Weights Product – Best Result


Starting from Median Productivities:

Solver (Max PRED(25); 0 <= Weight <= 5): MMRE = 116%, PRED(25) = 34%

PRED(25) = 34%, PRED(30) = 38%, PRED(40) = 46%, PRED(50) = 53%

While the best effort did not approach the target of PRED (25) = 75%, the predicted vocabulary word weight product comes within 50% of actual Productivity 53% of the time – could be used in early estimates and cross-checks

 


Word Weights Sum – Best Result


Solver (Max PRED(25); -1 <= Weight <= 2): MMRE = 102%, PRED(25) = 37%

PRED(25) = 37%, PRED(30) = 40%, PRED(40) = 46%, PRED(50) = 52%

Best PRED(25) of all the models, but the other models’ Prediction Levels start to draw even at larger values of PRED(x)


Word Weights Brute Force – Best Result


Brute Force (100 trials, 0.01 Delta Val): MMRE = 84%, PRED(25) = 31%

PRED(25) = 31%, PRED(30) = 33%, PRED(40) = 42%, PRED(50) = 51%

The Brute Force solution reduces the vocabulary to 54 words, but that limits the data that can participate (records whose Functional Descriptions contain none of the significant vocabulary words drop out)


Word Weights in Super-Domains


  • SRDRs can be sub-divided by Application Domains and Super-Domains
  • Super-domains are four groupings of Application Domains that have similar Productivities historically
    • Automated Information Systems (AIS) – Enterprise Information System, Enterprise Services, Custom AIS Services, Mission Planning
    • Engineering (ENG) – Test/Measurement/Diagnostic Equipment, Scientific & Simulation, Process Control, System Software
    • Mission Support (MS) – Software Tools, Training
    • Real-Time (RT) – Command & Control, Communications, Real Time Embedded, Vehicle Control, Vehicle Payload, Signal Processing, Microcode & Firmware
  • Mean or median Productivity by Super-Domain can be used as a baseline for early estimates (“On average, Real Time efforts are 0.83 ESLOC/hr”)
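The groupings above amount to a simple lookup table. A sketch, using the abbreviations from the bullets; the 0.83 ESLOC/hr figure for Real-Time is the slide's example value, and any other medians an analyst passes in are their own data:

```python
# SRDR Application Domain -> Super-Domain, per the groupings above
SUPER_DOMAIN = {
    "Enterprise Information System": "AIS",
    "Enterprise Services": "AIS",
    "Custom AIS Services": "AIS",
    "Mission Planning": "AIS",
    "Test/Measurement/Diagnostic Equipment": "ENG",
    "Scientific & Simulation": "ENG",
    "Process Control": "ENG",
    "System Software": "ENG",
    "Software Tools": "MS",
    "Training": "MS",
    "Command & Control": "RT",
    "Communications": "RT",
    "Real Time Embedded": "RT",
    "Vehicle Control": "RT",
    "Vehicle Payload": "RT",
    "Signal Processing": "RT",
    "Microcode & Firmware": "RT",
}

def baseline_productivity(app_domain, super_domain_productivities):
    # Look up a Super-Domain Productivity as an early-estimate baseline
    return super_domain_productivities[SUPER_DOMAIN[app_domain]]
```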


Word Weights in Super-Domains


The Best Result Product word weights match or outperform the mean and median Productivities by Super-Domain in Prediction Level at 25%...


Word Weights in Super-Domains


…but as Prediction Level is expanded, the mean and median values can “catch up” to the Best Case Product word weights within Super-Domains

Word weights outperform known Productivity means and medians when Super-Domain is unknown


Bag of Words Bigram Model

  • Using the vocabulary, the bigram model couples sequenced words and calculates statistics using those couplings
  • Example: “It was the best of times” becomes
    • “it was” “was the” “the best” “best of” “of times”
  • Sequence is important as “aircraft…software” is not the same as “aircraft software”
  • Off-the-shelf tools for building a bag-of-bigrams model are less readily available, but the approach could yield a stronger vocabulary
  • Vocabulary would still need to be controlled and reduced to increase effectiveness
  • Potential to use pairs of the present vocabulary to look for bigrams (“aircraft software”), but the 96-word vocabulary leads to over 9,000 potential pairs
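Bigram extraction itself is simple, as is counting the ordered pairs a vocabulary can generate. A sketch using the slide's example sentence:

```python
def bigrams(text):
    # Couple each word with its immediate successor, preserving order
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def pair_count(vocab_size):
    # Ordered pairs available from a vocabulary (order matters for bigrams):
    # 96 words give 96 * 95 = 9,120 potential pairs, i.e. "over 9,000"
    return vocab_size * (vocab_size - 1)
```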



Future Work

  • Expand vocabularies to a bag-of-bigrams or even bag-of-pairs model (“aircraft” AND “software”)
  • Changes in Brute Force Algorithm to give more range and less “settling” effect; multiple goals
  • Further work on the Sum of Weights model using directed distance
  • Solve for word weights using a random training set and apply to random test set
  • Apply Natural Language tools to other text fields in the SRDR
    • Agile: Feature Descriptions
    • Development Contractor: Pair of Functional Description and Contractor Name / Site



Conclusions

  • Natural language tools can be applied to the Functional Description of the SRDRs to develop a vocabulary describing the software effort
  • The words in a vocabulary can be assigned a weight based on known metrics like mean/median Productivity of the records with words from the vocabulary
  • Tools like Solver and Visual Basic coded algorithms can be used to adjust the vocabulary word weights to achieve a goal, like higher Prediction Level values for software Productivities
  • Some word weight solutions match or outperform their counterpart Super-Domain means and medians at certain Prediction Levels
  • Natural Language tools produce tangible and exciting results for software estimation


