ASCOT 3: NONLINEAR PRINCIPAL COMPONENTS ANALYSIS AND UNCERTAINTY QUANTIFICATION IN EARLY LIFECYCLE SPACECRAFT FLIGHT SOFTWARE COST ESTIMATION
FIRST INTERNATIONAL BOEHM FORUM ON COCOMO AND SYSTEMS AND SOFTWARE COST MODELING
NOVEMBER 9-10, 2022
Sam Fleischer, PhD, Samuel.R.Fleischer@jpl.nasa.gov*
Patrick Bjornstad, Patrick.T.Bjornstad@jpl.nasa.gov
Jairus Hihn, PhD, Jairus.M.Hihn@jpl.nasa.gov
NASA Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109
James Johnson, James.K.Johnson@nasa.gov*
National Aeronautics and Space Administration
Washington, DC 20546
*Corresponding Authors
OVERVIEW
CHALLENGES IN SPACECRAFT FLIGHT SOFTWARE COST ESTIMATION
WHY DOES ASCOT EXIST? (1/2)
WHY DOES ASCOT EXIST? (2/2) – THE DATASAURUS DOZEN
All of these datasets have identical summary statistics (x/y means, x/y standard deviations, and correlation) when rounded to the nearest hundredth.
BAYESIAN REGRESSION AND IMPROVING OUR UNDERSTANDING OF UNCERTAINTY
BAYESIAN CER – POSTERIOR DISTRIBUTION
BAYESIAN CER – POSTERIOR PREDICTIVE DISTRIBUTION
credible intervals
BAYESIAN CER – POSTERIOR PREDICTIVE DISTRIBUTION
credible intervals
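A credible interval is computed directly from posterior draws. The sketch below uses simulated draws for illustration (the values are made up; in practice they come from the fitted Bayesian CER's posterior):

```r
set.seed(42)
# Simulated posterior draws for a CER slope parameter (hypothetical values;
# in practice these come from the fitted model's posterior samples)
draws <- rnorm(4000, mean = 1.7, sd = 0.2)
# An equal-tailed 95% credible interval is just the 2.5% and 97.5% quantiles
# of the posterior draws
ci <- quantile(draws, probs = c(0.025, 0.975))
```

Unlike a frequentist confidence interval, this interval can be read directly as "the parameter lies in this range with 95% posterior probability."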
K-NEAREST-NEIGHBORS AND CLUSTERING ALGORITHMS – INPUT VARIABLES (1/3)
K-NEAREST-NEIGHBORS AND CLUSTERING ALGORITHMS – INPUT VARIABLES (2/3)
“Dual String – Warm” (backup maintaining continuous operations)
K-NEAREST-NEIGHBORS AND CLUSTERING ALGORITHMS – INPUT VARIABLES (3/3)
Numerical Variables
Number of Instruments
Number of Deployables
Nominal and Categorical Variables
Inheritance
Mission Size
Mission Type
Redundancy
Destination
How do you calculate the “distance” between missions with non-numerical data?
Is the “distance” between 2 instruments and 3 instruments the same as the distance between 3 instruments and 4?
HOW DO YOU NUMERICIZE CATEGORICAL DATA?
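One common first step (a sketch, with hypothetical category values) is one-hot (dummy) encoding, which base R's `model.matrix` provides:

```r
# Hypothetical redundancy values for three missions
d <- data.frame(redundancy = c("Single String",
                               "Dual String - Cold",
                               "Dual String - Warm"))
# One-hot (dummy) encoding: each category becomes its own 0/1 column;
# "- 1" drops the intercept so every level gets a column
onehot <- model.matrix(~ redundancy - 1, data = d)
```

One-hot columns make every pair of categories equally distant from each other, which is one motivation for the NLPCA projection described next.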
NONLINEAR PRINCIPAL COMPONENTS ANALYSIS – AUTO-ASSOCIATIVE NEURAL NETWORKS (ANN)
Auto-associative neural network
ANN parameters are optimized such that the difference between the output layer and the input layer is minimized.
Goal: the low-dimensional bottleneck layer must adequately retain the information contained in the input layer.
Result: A non-numeric input layer can be projected onto a numeric, low-dimensional space.
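As a point of comparison, here is the linear analogue of that bottleneck projection on synthetic data: standard PCA projects the encoded inputs onto a low-dimensional numeric space with a linear map, which NLPCA replaces with the encoder half of the auto-associative network.

```r
set.seed(1)
# Synthetic one-hot-style input: 10 missions x 5 binary columns (made up)
X <- matrix(rbinom(50, 1, 0.5), nrow = 10)
# Linear analogue of the bottleneck: keep the first 2 principal components.
# NLPCA replaces this linear projection with the (nonlinear) encoder half
# of the auto-associative neural network.
p <- prcomp(X, center = TRUE)
Z <- p$x[, 1:2]  # numeric, low-dimensional representation of each mission
```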
KNN ALGORITHM OVERVIEW
Your Project
KNN MODEL EXAMPLE OUTPUT
Effort (work-months)
Cumulative Effort Distribution
Probability of being one of the three nearest neighbors
Model Input:
Uncertainty in the NLPCA leads to uncertainty in the kNN result.
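The nearest-neighbor step itself can be sketched in a few lines of base R. All coordinates and efforts below are hypothetical placeholders, not ASCoT data:

```r
set.seed(1)
# Hypothetical 2-D NLPCA coordinates for 8 historical missions, with their
# (made-up) actual efforts in work-months
coords <- matrix(runif(16), ncol = 2)
effort <- c(120, 300, 95, 410, 220, 180, 260, 150)
new_project <- c(0.4, 0.6)  # the new project's NLPCA coordinates
# Euclidean distance from the new project to every historical mission
d <- sqrt(colSums((t(coords) - new_project)^2))
# Take the k = 3 nearest neighbors; one simple point estimate is their mean
nn <- order(d)[1:3]
estimate <- mean(effort[nn])
```

Repeating this over many NLPCA fits is what turns the fixed neighbor set into the neighbor probabilities and the cumulative effort distribution shown above.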
CLUSTERING ALGORITHM OVERVIEW
Probabilistic Linkage Matrices
Calculated using the k-Means algorithm in NLPCA space
(Cassini, Galileo, and the rovers and landers are removed).
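A minimal sketch of the clustering step, using base R's `kmeans` on synthetic stand-in coordinates (the data and k = 2 are illustrative only):

```r
set.seed(2)
# Hypothetical 2-D NLPCA coordinates for 20 historical missions
Z <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 4), ncol = 2))
# k-means with k = 2 clusters; nstart restarts guard against local optima
km <- kmeans(Z, centers = 2, nstart = 25)
# Assign a new project to the cluster with the nearest centroid
new_project <- c(0.2, -0.1)
d <- sqrt(colSums((t(km$centers) - new_project)^2))
cluster_of_new <- which.min(d)
```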
Effort Model Clusters
1. Very Large, Old, Outer Planetary: Cassini, Galileo
2. Rovers: MER, MPF, MSL
3. Landers: Insight, Phoenix
4. Large, Complex, Inner-Outer Planetary: Dawn, GRAIL, JUNO, Kepler, LADEE, MAVEN, Messenger, MRO, New Horizons, Parker Solar Probe
5. Large, Complex, Earth-Inner Planetary: Deep Impact, Genesis, GPM Core, LRO, Mars Observer, Mars Odyssey, OSIRIS-REx, SMAP, Stardust, STEREO, TIMED, Van Allen Probe
6. Smaller, Higher Inheritance: DS1, GLORY, NuStar, OCO-1, WISE
7. Large, Earth Observatories and Constellations: GRO, HST, MMS, SDO, Spitzer
SLOC Model Clusters
1. Very Large, Old, Outer Planetary: Cassini, Galileo
2. Rovers: MER, MPF, MSL
3. Landers: Insight, Phoenix
4. Large, Complex, Inner-Outer Planetary: JUNO, Mars Observer, MAVEN, Messenger, MRO, New Horizons, Parker Solar Probe
5. Large, Moderately Complex, Dual String (Cold): Deep Impact, Genesis, GOES-R, LDCM, Mars Odyssey, NPP, OSIRIS-REx, Stardust, Van Allen Probe
6. Smaller or Simple, Earth – Asteroid/Comet: DS1, EO1, GLORY, GPM Core, IRIS, NuStar, OCO-1, SMAP, TIMED, WISE
7. Small-Medium, Single-String Inner-Planetary or Dual String (Cold) Asteroid/Comet: Contour, Dawn, GRAIL, LADEE, LCROSS, LRO, STEREO
8. Large, Earth Observatories and Constellations: GLAST, GRO, HST, MMS, SDO, Spitzer
CLUSTERING ALGORITHM OVERVIEW
Cluster 2 centroid
Your Project
Cluster 4 centroid
Cluster 3 centroid
Cluster 1 centroid
Cluster 5 centroid
CLUSTERING MODEL EXAMPLE OUTPUT
Model Input:
Uncertainty in the NLPCA leads to uncertainty in the cluster result.
Cluster Number
Probability of falling into the cluster
Effort (work-months)
Cumulative Effort Distribution
Cluster 6 (Smaller, Higher Inheritance): DS1, GLORY, NuStar, OCO-1, WISE
Uncertainty in the Effort distribution is caused by uncertainty in the NLPCA as well as uncertainty in the cluster.
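That two-stage uncertainty can be propagated by simple Monte Carlo: first draw a cluster, then draw an effort from that cluster's distribution. All probabilities and effort distributions below are hypothetical placeholders:

```r
set.seed(3)
# Hypothetical cluster-membership probabilities (e.g. from repeated NLPCA fits)
p_cluster <- c(0.1, 0.6, 0.3)
# Hypothetical effort draws (work-months) for each candidate cluster
effort_draws <- list(rnorm(1000, 150, 20),
                     rnorm(1000, 300, 50),
                     rnorm(1000, 500, 80))
# Propagate both uncertainties: first draw a cluster, then an effort from it
n <- 5000
k <- sample(1:3, n, replace = TRUE, prob = p_cluster)
effort <- sapply(k, function(i) sample(effort_draws[[i]], 1))
# quantile(effort, probs = c(0.3, 0.5, 0.7)) summarizes the mixed distribution
```

The resulting mixture is wider than any single cluster's distribution, which is exactly the widening visible in the cumulative effort curve.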
THANKS!
We'd love to chat about collecting and cleaning data, statistics and machine learning, and software costing.
©2023. All rights reserved. Government sponsorship acknowledged. NASA HQ OCFO Strategic Investments Division provides the funding and management for the development of the ASCoT model and the ONSET framework. The cost information contained in this document is of a budgetary and planning nature and is intended for informational purposes only. It does not constitute a commitment on the part of JPL and/or Caltech.
BACKUP
BAYESIAN SIMPLE LINEAR REGRESSION USING THE R PACKAGE BRMS (BAYESIAN REGRESSION MODELS USING STAN) (1/2)
slope <- 1.9
intercept <- 0.4
sigma <- 1.3
N <- 20
xs <- runif(N, min=-3, max=3)
signal <- slope*xs + intercept
noise <- rnorm(N, mean=0, sd=sigma)
ys <- signal + noise
plot(xs, ys)
Set the parameters of the model.
Simulate the process.
Plot the data.
Note that even with known parameters, there is noise in the data. This noise is due to the inherent uncertainty in the process. This is called aleatoric uncertainty.
BAYESIAN SIMPLE LINEAR REGRESSION USING THE R PACKAGE BRMS (BAYESIAN REGRESSION MODELS USING STAN) (2/2)
library(brms)
d <- data.frame(x=xs, y=ys)
model <- brm(y~x, data=d)
plot(model)
plot(conditional_effects(
  model, method='predict'),
  points=TRUE)
post <- as_draws_df(model)
head(post)
b_Intercept b_x sigma lprior lp__
1 -0.1447024 1.711225 1.3742691 -3.865604 -35.77348
2 0.8070629 1.549148 1.1941961 -3.864858 -35.20632
3 0.6598251 1.584357 1.1770281 -3.852215 -34.26019
4 0.1113087 1.498301 1.2273181 -3.841217 -35.30354
5 0.3810465 1.884662 0.8651947 -3.803884 -35.22898
6 0.5099172 1.653497 1.4078682 -3.876983 -34.48557
Load the BRMS library.
Define and fit the model.
See the fitted parameters.
See how the model looks over the data.
Sample from the posterior.
There is uncertainty in the fitted parameters. This is called epistemic uncertainty and represents a lack of knowledge.