Predicting Success in Training of Navy Aviators

Neil C. Rowe and Arijit Das

Computer Science Department, U.S. Naval Postgraduate School

http://faculty.nps.edu/ncrowe

The challenge of Navy aviation training

Navy training of pilots and flight officers (“aviators”) is expensive:

The equipment needed is expensive.
Training is time-consuming since equipment is complex.
When someone leaves the training program, considerable money has been wasted.

Prediction of training performance is difficult:

Hundreds of factors have been tested, but no factor is dominant.
Curricula differ by aircraft type and trainee role. Different curricula use different tests, so it is hard to compare candidates.
Skill decay is significant in aviation training.

Data used in our study

143 Excel tables from U.S. Navy Fleet Forces Command, with many entries blank
18,596 training candidates, both pilots and flight officers
301 metrics on the candidates:

Standardized tests given early in the training
Averaged tests on coursework for particular curricula
Averaged flight tests for particular curricula
Background data like race, organization, and flight school
Counts of nonnull attribute values for each phase

Pilot names and personal information were excluded from the data.

Cleaning the data

Attributes whose values were predominantly numeric were converted into numbers.
Attributes whose values were predominantly nonnumeric were converted entirely into symbolic values.
Where reasonable, nonnumeric sortable values were converted to numbers. For instance, “Complete”, “Pass”, “Incomplete”, “Conditional Pass”, and “Fail” were converted to 1, 1, 0.5, 0.5, and 0.
Dates were converted to epoch-time integers (seconds since midnight on January 1, 1970) to make them easier to compare.
Null values were frequent in the data:

The empty string, a string consisting of a single space, “N/A”, “#N/A”, “NONE”, and “NULL” were all standardized as “null”.
Candidate ID codes were null in 4804 records from 2000 to 2010; we replaced them with consecutive negative numbers because the rest of each record held useful information.
Null values for numeric attributes generally meant missing data, so we excluded them from averages.
Nulls for nonnumeric attributes were generally important, such as a null for previous flight training meaning the candidate had none.
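
The cleaning rules above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names, the per-cell (rather than per-column) handling, the date format, and the "Fail" entry in the status mapping are all assumptions.

```python
from datetime import datetime, timezone

# Null markers listed on the slide; all are standardized to None here.
NULL_MARKERS = {"", "N/A", "#N/A", "NONE", "NULL"}

# Sortable status values mapped to numbers. "Fail" -> 0.0 is an assumption
# about which status received the lowest value.
STATUS_SCORES = {
    "Complete": 1.0,
    "Pass": 1.0,
    "Incomplete": 0.5,
    "Conditional Pass": 0.5,
    "Fail": 0.0,
}


def clean_cell(raw):
    """Return None for null markers, a float for numeric or status text, else the string."""
    if raw is None:
        return None
    text = raw.strip()  # a single space (or any run of blanks) becomes ""
    if text in NULL_MARKERS:
        return None
    if text in STATUS_SCORES:
        return STATUS_SCORES[text]
    try:
        return float(text)
    except ValueError:
        return text  # keep as a symbolic value


def date_to_epoch(text, fmt="%Y-%m-%d"):
    """Convert a date string to integer seconds since 1970-01-01 00:00 UTC.

    The date format is hypothetical; the source tables may use another one.
    """
    moment = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def mean_ignoring_nulls(values):
    """Average the values that are present; return None if all are null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

Skipping nulls inside `mean_ignoring_nulls`, rather than treating them as zero, matches the rule that a numeric null means missing data.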

Joining the data

Missing and redundant data created obstacles to prediction of candidate performance.
Fix 1: We averaged all scores for a candidate on the same test metric.
Fix 2: We averaged those averages for a candidate within a curriculum, dividing by the difficulty level on flight tests.
Fix 3: We did “outer joins” instead of the usual “inner joins”. That meant we entered nulls for attribute values that were missing for candidates.
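
A minimal sketch of Fix 1 and Fix 3, assuming each table has been loaded into plain Python structures. The function names and the data layout are hypothetical, the difficulty weighting of Fix 2 is omitted, and the two joined tables are assumed to have disjoint attribute names.

```python
from collections import defaultdict


def average_by_key(rows):
    """Average scores per (candidate, metric), skipping null scores.

    rows is an iterable of (candidate_id, metric, score) tuples.
    """
    buckets = defaultdict(list)
    for candidate, metric, score in rows:
        if score is not None:
            buckets[(candidate, metric)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def outer_join(left, right):
    """Full outer join of two tables keyed by candidate ID.

    Each table is a dict mapping candidate ID to a dict of attributes.
    A candidate missing from one table gets None for that table's attributes.
    """
    left_cols = sorted({c for attrs in left.values() for c in attrs})
    right_cols = sorted({c for attrs in right.values() for c in attrs})
    candidates = list(left) + [c for c in right if c not in left]
    joined = {}
    for candidate in candidates:
        row = {c: left.get(candidate, {}).get(c) for c in left_cols}
        row.update({c: right.get(candidate, {}).get(c) for c in right_cols})
        joined[candidate] = row
    return joined
```

An inner join would keep only candidates present in both tables, which discards most rows when data is as sparse as it is here.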

Respecting attribute order

Each attribute is associated with one of 11 phases of training:

PRE: Initial data before the candidate begins training
ASTB, Aviation Selection Test Battery: Initial testing
IFS, Initial Flight Screening: Initial flight school
API, Aviation Preflight Indoctrination: Classroom instruction
PRI, PRI1, PRI2, Primary Flight Training: In specialized curricula
INT, Intermediate Flight Training: In specialized curricula
ADV, ADVCORE, Advanced Flight Training: In specialized curricula
FRS, Fleet Replacement Training: Refresher training

Training follows the order:
PRE – ASTB – IFS – API – PRI – PRI1 – PRI2 – INT – ADVCORE – ADV – FRS
It only makes sense to predict attribute values from those of previous phases.
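
The ordering constraint can be sketched as a simple filter. The function name and the attribute-to-phase mapping are hypothetical, and "previous" is taken here to mean strictly earlier in the listed order.

```python
# Phase order as listed on the slide.
PHASE_ORDER = ["PRE", "ASTB", "IFS", "API", "PRI", "PRI1", "PRI2",
               "INT", "ADVCORE", "ADV", "FRS"]
PHASE_RANK = {phase: rank for rank, phase in enumerate(PHASE_ORDER)}


def allowed_predictors(target_phase, attribute_phases):
    """Return the attributes whose phase comes strictly before target_phase.

    attribute_phases maps an attribute name to its phase code.
    """
    cutoff = PHASE_RANK[target_phase]
    return [name for name, phase in attribute_phases.items()
            if PHASE_RANK[phase] < cutoff]
```

For a PRI target, for example, an API attribute such as API_NSS would be eligible while FRS_TW6_Grade would not.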

Methods of predicting attribute values

Attributes measuring candidate success

After discussion with the sponsor, we identified 38 possible measures of success of a candidate:

Counts of retesting or test failure
Indications of disenrollment, inactive status, or “attriting”
Counts of data values in later phases, where a zero count suggests the candidate was disenrolled
Average academic and flight-test grades

Significant binary correlations with candidate success

Success-related attribute	Phase	Possible nonnull values	Nonnull occurrences	Positive correlations	Negative correlations
NGCode	PRE	12 strings	1761	0	0
RetestStatus	ASTB	Never, 30Days, 90Days, 180Days, Never1992, Resume	8540	5	0
ExamineeStatus	ASTB	None	None	0	0
Number of ASTB1-5	ASTB	Floating point	14477	3	7
Number of ASTBE	ASTB	Floating point	4564	3	7
IFS_STATUS	IFS	Complete, Disenroll, Closing	13844	3	0
IFS_STATUS_NUM	IFS	Numbers 0.0 to 1.0	13834	5	0
IFS_DISENROLLMENT_DESCRIPTION	IFS	String	5787	5	13
IFS_USNA_PFP	IFS	String	634	0	17
IFS_ACAD_FAIL	IFS	Number 0.0 to 1.0	13844	0	9
IFS_FLT_FAIL	IFS	Number 0.0 to 1.0	13844	7	5
API_NSS	API	Integer	17401	17	2
API_Test_FAILS	API	Integer	17446	8	8
Count of nonnull values for the API phase	API	Floating point	18596	6	7
Pri (status in training)	PRI	G, UI, NG, AT, TG, UA	14461	4	2
Count of nonnull values for the PRI phase	PRI	Floating point	18596	16	1
PRI academic average	PRI	Floating point	13555	17	4
PRI flight average	PRI	Floating point	10664	19	3
Int (status in training)	INT	G, AT, UI, NG, MA, J	5530	1	1
Count of nonnull values for the INT phase	INT	Floating point	18596	26	9
INT academic average	INT	Floating point	8153	22	4
INT flight average	INT	Floating point	3301	7	11
Count of nonnull values for the ADV phase	ADV	Floating point	18596	29	10
ADV academic average	ADV	Floating point	9712	21	4
Adv (status in training)	ADV	G, AT, UI, NG, UU, TG, SQ, UIT	16005	1	1
ADV flight average	ADV	Floating point	4593	33	9
SYL_ST (syllabus status)	ADV	Complete, Active, Attrite	6292	5	2
STAT_RESN (reason for syllabus status)	ADV	20 strings	260	0	0
NSS_UNSATS	ADV	Number	3012	1	0
OFFICIAL_ NMU	ADV	Number	3012	1	0
NUM_RRU	ADV	Number	3012	1	0
IPC	ADV	Number	3012	0	0
FPC	ADV	Number	3012	3	0
NSS	ADV	Floating point	5221	1	4
Count of nonnull values for the FRS phase	FRS	Floating point	18596	37	13
FRS_TW6_Grade	FRS	Number	1274	29	8
FRS_TW6_Status	FRS	Number 0.0 to 1.0	7682	36	7

Observed correlations

The reliability of the correlations with success was hampered by the low rate of failure. For instance, IFS_STATUS recorded 13,460 candidates who completed and 374 who were “disenrolled”; SYL_ST had 5369 who completed and 260 who were “attrited”.
Correlations on later phases did not include candidates who left at earlier phases and who might have provided useful data.
There were some strong correlations of success with increasing dates, but these are likely spurious due to more complete data for recent candidates.
Though success correlated with the number of flight hours, it correlated negatively with “formal flight instruction hours”, which may indicate civilian training needed to be unlearned.
Female gender and minority race showed more failures early in training but fewer later in training.
Several ASTB test results correlated well with success in IFS, Primary, Intermediate, and FRS, but they were not as helpful in predicting success in Advanced training.
Several Primary, Intermediate, and Advanced training grades correlated positively with success in both Advanced training and FRS. However, some of the Advanced training grades correlated negatively with success, and these should be investigated further.

Finding good attribute subsets for regressions

To find good sets of variables for regressions more quickly, a greedy algorithm can assess one variable at a time and decide whether to include it. It can also take into account the number of data rows that have values for the variable, a key issue with our data.
Additive methods build a regression model by successively adding the variable that improves the fit the most. We preferred an additive approach because of the many missing data values.
We should only add attributes that:

Are shown to be correlated with the target variable of the regression;
Leave the number of data rows with nonnull data at least twice the number of attributes;
Improve the regression fit by at least 0.1% when included.

We applied the greedy algorithm to pick subsets of these input variables, did linear regressions, and measured the standard error.
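
The additive selection can be sketched as follows. This is a simplified illustration, not the study's code: all names are hypothetical, the correlation pre-filter is omitted, the fit measure is the root-mean-square residual, and the row-count rule is assumed to require at least twice as many complete rows as fitted coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_error(rows, predictors, target):
    """Least-squares fit with intercept on rows having no nulls; return RMS error or None."""
    used = [r for r in rows
            if all(r.get(k) is not None for k in predictors + [target])]
    if len(used) < 2 * (len(predictors) + 1):  # assumed row-count rule
        return None
    xs = [[r[k] for k in predictors] + [1.0] for r in used]
    ys = [r[target] for r in used]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    try:
        coef = solve(xtx, xty)
    except ValueError:
        return None  # collinear predictors
    squared = sum((sum(c * v for c, v in zip(coef, x)) - y) ** 2
                  for x, y in zip(xs, ys))
    return (squared / len(used)) ** 0.5


def greedy_select(rows, candidates, target, min_gain=0.001):
    """Repeatedly add the variable that lowers the error most, while the gain is >= 0.1%."""
    chosen = []
    best = fit_error(rows, [], target)  # intercept-only baseline
    while best is not None:
        trials = [(fit_error(rows, chosen + [c], target), c)
                  for c in candidates if c not in chosen]
        trials = [(err, c) for err, c in trials if err is not None]
        if not trials:
            break
        err, pick = min(trials)
        if best == 0 or (best - err) / best < min_gain:
            break
        chosen.append(pick)
        best = err
    return chosen, best
```

One caveat of this sketch: adding a sparse variable shrinks the set of complete rows, so errors before and after an addition are not measured on identical data.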

Best regression formulas found (1)

# candidates	Avg. error	Best formula
4144	0.0096	0.0242*PFAR_Z_ASTBE + 0.0048*API_NSS + 0.0171*ANIT_Z_ASTBE + 0.0098*ATTFactor_Z_ASTBE + 0.0063*Personality5_Z_ASTBE + -0.0074*Personality6_Z_ASTBE + -0.0107*OAR_Z_ASTBE + -0.0136*Number of ASTBE + 0.0057*Personality3_Z_ASTBE + -0.0102*API_Test_FAILS + 1.0659 = PRI_FLIGHT_AV
624	0.0038	-0.0123*ANIT_Z_ASTBE + 0.014*DLTFactor_Z_ASTBE + -0.0055*ATTFactor_Z_ASTBE + 0.0111*Number of ASTBE + -0.0008*API_NSS + -0.0032*VTTFactor_Z_ASTBE + 0.0069*API_Test_FAILS + 1.181 = INT_FLIGHT_AV
2880	0.0042	0.0017*API_NSS + -0.0069*IFS_TOTAL_FLIGHT_TIME + 0.0008*IFS_EOC + 0.0008*IFS_STG_1 + -0.0093*API_Test_FAILS + 0.9477 = ADV_FLIGHT_AV
2476	0.2081	-0.0088*Number of ASTBE + 0.0053*API_NSS + 0.0011*IFS_EOC + 0.0302*ANIT_Z_ASTBE + -0.0707*API_Test_FAILS + -0.029*IFS_TOTAL_FLIGHT_TIME + -0.0457*IFS_ACAD_FAIL + 0.0196*SkillFactor_Z_ASTBE + 5.0012 = PRI_COUNT
2694	0.9148	-0.0331*IFS_TOTAL_FLIGHT_TIME + 0.1961*PFAR_Z_ASTBE + 0.0939*ANIT_Z_ASTBE + 0.0479*Personality2_Z_ASTBE + 0.0432*Personality8_Z_ASTBE + 0.039*ATTFactor_Z_ASTBE + 1.6159 = INT_COUNT
13844	2.1403	-0.0626*IFS_TOTAL_FLIGHT_TIME + 0.4278*IFS_FLT_FAIL + 2.7704 = ADV_COUNT
12668	0.8013	-0.0937*IFS_TOTAL_FLIGHT_TIME + 0.0063*API_NSS + -0.1424*IFS_ACAD_FAIL + 2.0684 = FRS_COUNT
1254	588.4	-0.828*IFS_STG_2 + 0.7108*IFS_FAA + 7.4391*IFS_ACAD_FAIL + 0.9847*IFS_TOTAL_FLIGHT_TIME + -0.1748*IFS_STG_3 + 33.4863 = FRS_TW6_Grade
5386	0.0003	0.0*IFS_EOC + -0.0021*API_Test_FAILS + 0.0001*IFS_FAA + 0.0001*API_NSS + 0.9673 = FRS_TW6_Status
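
To show how a formula from the table above would be applied, here is a small sketch that evaluates the ADV_COUNT row. The coefficients are copied from the table; the function name and the candidate's input values are hypothetical, and the units of IFS_TOTAL_FLIGHT_TIME are not stated in the source.

```python
# Coefficients copied from the ADV_COUNT row of the table.
ADV_COUNT_MODEL = {
    "IFS_TOTAL_FLIGHT_TIME": -0.0626,
    "IFS_FLT_FAIL": 0.4278,
}
ADV_COUNT_INTERCEPT = 2.7704


def predict(coefficients, intercept, candidate):
    """Evaluate a linear formula; return None if any required input is null."""
    if any(candidate.get(name) is None for name in coefficients):
        return None
    return intercept + sum(weight * candidate[name]
                           for name, weight in coefficients.items())


# Hypothetical candidate values, chosen only for illustration.
example = {"IFS_TOTAL_FLIGHT_TIME": 10.0, "IFS_FLT_FAIL": 0.0}
print(round(predict(ADV_COUNT_MODEL, ADV_COUNT_INTERCEPT, example), 4))  # prints 2.1444
```

Returning None for a candidate with a missing input mirrors the way nulls are kept out of averages elsewhere in the study.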

Best regression formulas found (2)

# candidates	Avg. error	Best formula
39	2.328	-0.301*PRI_166A_CH-1_Academics_ RAW_SCORE_DV + 0.423*PRI_166A_CH-2_Academics_ RAW_SCORE_DV - 2.904*PRI_166A_CH-2_GRADE + 89.820 = ADV_ACADEMIC_AV
857	8.636	30.087*PRI_166B_GRADE + 0.166*PRI_166B_Academics_ RAW_SCORE_DV + 37.455 = ADV_ACADEMIC_AV
14	2.157	-19.007*PRI_TW-5_166A_GRADE + 0.032*PRI_TW-5_166A_Academics_ RAW_SCORE_DV + 118.275 = ADV_ACADEMIC_AV
63	2.610	49.237*NFO_TW-6_155C_Primary_GRADE + 0.040*NFO_TW-6_155C_Primary_Academics_ RAW_SCORE_DV + 38.292 = ADV_ACADEMIC_AV
443	3.257	8.488*NFO_TW-6_162A_Pri2_GRADE + 0.282*NFO_TW-6_162A_Pri1_ Academics_ RAW_SCORE_DV + 61.117 = ADV_ACADEMIC_AV
870	7.185	17.809*NFO_TW-6_162_Pri1_GRADE + 0.043*NFO_TW-6_162_Pri1_Academics_ RAW_SCORE_DV + 72.201 = ADV_ACADEMIC_AV
16	.00083	0.23217*PRI_166A_CH-1_GRADE + -0.00062*PRI_166A_CH-1_Academics_ RAW_SCORE_DV + 0.80358 = ADV_FLIGHT_AV
650	.00012	-0.07254*PRI_166B_GRADE + 1.24738 = ADV_FLIGHT_AV
64	.00000	0.14389*NFO_TW-6_155C_Primary_GRADE + 0.00017*NFO_TW-6_155C_Primary_Academics_ RAW_SCORE_DV + 0.84174 = ADV_FLIGHT_AV
871	.00012	0.29038*NFO_TW-6_162A_Pri1_GRADE + 0.0009*NFO_TW-6_162A_Pri1_Academics_ RAW_SCORE_DV + 0.60225 = ADV_FLIGHT_AV
626	.00007	0.46048*NFO_TW-6_162_Pri1_GRADE - 0.00003*NFO_TW-6_162_Pri2_Academics_ RAW_SCORE_DV + 0.52616 = ADV_FLIGHT_AV

A database design

Conclusions

We found quite a few factors helpful in predictions, some obvious and some not.
Some factors that measured as significant, such as previous flight training, gender, and race, are not ones the Navy can control practically or legally.
Overall, we conclude that the Navy is doing a good job of predicting the performance of candidates from their multistage testing.
Currently we are working with Global Technology Inc. in Atlanta to extend this work:

We will explore the approach of combining factors with a set-covering machine-learning algorithm to find the most useful set of factors.
We did not have enough data for neural networks, but they should definitely be tried.
Future work should try to obtain more complete data on the candidates, as we were unable to make many potentially useful comparisons.