1 of 15

Predicting Success in Training of Navy Aviators

Neil C. Rowe and Arijit Das

Computer Science Department, U.S. Naval Postgraduate School

http://faculty.nps.edu/ncrowe


2 of 15

The challenge of Navy aviation training

  • Navy training of pilots and flight officers (“aviators”) is expensive:
    • The equipment needed is expensive.
    • Training is time-consuming since equipment is complex.
    • When someone leaves the training program, considerable money has been wasted.
  • Prediction of training performance is difficult:
    • Hundreds of factors have been tested, but no factor is dominant.
    • Curricula are based on the type of aircraft and the trainee role. Different curricula use different tests, so it is hard to compare candidates.
    • Skill decay is significant in aviation training.


3 of 15

Data used in our study

  • 143 Excel tables from U.S. Navy Fleet Forces Command, with many entries blank
  • 18,596 training candidates, both pilots and flight officers
  • 301 metrics on the candidates:
      • Standardized tests given early in the training
      • Averaged tests on coursework for particular curricula
      • Averaged flight tests for particular curricula
      • Background data like race, organization, and flight school
      • Counts on the number of non-null attribute values for each phase
  • Pilot names and personal information were excluded from the data.


4 of 15

Cleaning the data

  • Data that was predominantly numeric was converted to numbers.
  • Data that was predominantly nonnumeric was converted to symbolic values.
  • Where reasonable, nonnumeric sortable values were converted to numbers. For instance, “Complete”, “Pass”, “Incomplete”, “Conditional Pass”, and “Fail” were converted to 1, 1, 0.5, 0.5, and 0.
  • Dates were converted to epoch-time integers (seconds since midnight on January 1, 1970) to make them easier to compare.
  • Null values were frequent in the data:
      • The empty string, a string consisting of a single space, “N/A”, “#N/A”, “NONE”, and “NULL” were all standardized as “null”.
      • 4804 null values for candidate ID codes occurred from 2000 to 2010; they were replaced with consecutive negative numbers because the rest of those records held significant information.
      • Null values for numeric attributes generally meant missing data, so we excluded them from averages.
      • Nulls for nonnumeric attributes were generally informative; for example, a null for previous flight training meant the candidate had none.
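
The cleaning rules above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names, the date format, and the label "Fail" for the value 0 are assumptions.

```python
from datetime import datetime, timezone

# Spellings of "missing" listed on the slide; all are mapped to None.
NULL_SPELLINGS = {"", " ", "N/A", "#N/A", "NONE", "NULL"}

# Sortable status strings mapped to numbers.
# Assumption: the label "Fail" for the value 0 is our guess at the intended name.
STATUS_SCORES = {
    "Complete": 1.0,
    "Pass": 1.0,
    "Incomplete": 0.5,
    "Conditional Pass": 0.5,
    "Fail": 0.0,
}


def normalize_null(raw):
    """Return None for any recognized null spelling, else the value unchanged."""
    if raw is None or raw in NULL_SPELLINGS:
        return None
    return raw


def status_to_number(raw):
    """Map a status string to its numeric score; unknown or null gives None."""
    value = normalize_null(raw)
    return None if value is None else STATUS_SCORES.get(value)


def date_to_epoch(raw, fmt="%m/%d/%Y"):
    """Convert a date string to seconds since 1970-01-01 00:00 UTC.

    The default format string is hypothetical; the source tables may differ.
    """
    value = normalize_null(raw)
    if value is None:
        return None
    moment = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    return int(moment.timestamp())


def fill_missing_ids(ids):
    """Replace null candidate IDs with consecutive negative integers."""
    next_id = -1
    filled = []
    for raw in ids:
        value = normalize_null(raw)
        if value is None:
            filled.append(next_id)
            next_id -= 1
        else:
            filled.append(value)
    return filled
```

Giving each ID-less record its own negative number keeps it as a distinct candidate, so its remaining data is not discarded or merged with another record.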


5 of 15

Joining the data

  • Missing and redundant data created obstacles to prediction of candidate performance.
  • Fix 1: We averaged all scores for a candidate on the same test metric.
  • Fix 2: We averaged those averages for a candidate within a curriculum, dividing by the difficulty level on flight tests.
  • Fix 3: We did “outer joins” instead of the usual “inner joins”. That meant we entered nulls for attribute values that were missing for candidates.
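
Fix 1 and Fix 3 can be sketched as follows, assuming each table is held as a dictionary keyed by candidate ID. The helper names and the metric name `TEST_A` in the usage note are hypothetical, and the difficulty weighting of Fix 2 is not shown.

```python
from collections import defaultdict


def mean_ignoring_nulls(values):
    """Average the non-null values; return None if every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def average_per_candidate(records):
    """Fix 1: collapse repeated (candidate, metric) scores into one average.

    records is an iterable of (candidate_id, metric, score) tuples.
    Returns {candidate_id: {metric: average}}.
    """
    grouped = defaultdict(list)
    for cid, metric, score in records:
        grouped[(cid, metric)].append(score)
    table = defaultdict(dict)
    for (cid, metric), scores in grouped.items():
        table[cid][metric] = mean_ignoring_nulls(scores)
    return dict(table)


def outer_join(left, right):
    """Fix 3: keep every candidate found in either table.

    Attributes missing for a candidate are filled with None (null).
    """
    all_rows = list(left.values()) + list(right.values())
    columns = list(dict.fromkeys(c for row in all_rows for c in row))
    joined = {}
    for cid in dict.fromkeys(list(left) + list(right)):
        merged = {**left.get(cid, {}), **right.get(cid, {})}
        joined[cid] = {c: merged.get(c) for c in columns}
    return joined
```

For example, joining `average_per_candidate([(1, "TEST_A", 50), (1, "TEST_A", 60)])` with `{3: {"API_NSS": 52}}` keeps both candidates 1 and 3, each with a null in the column it lacks. An inner join would have dropped both.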


6 of 15

Respecting attribute order

  • Each attribute is associated with one of 11 phases of training:
      • PRE: Initial data before the candidate begins training
      • ASTB, Aviation Selection Test Battery: Initial testing
      • IFS, Initial Flight Screening: Initial flight school
      • API, Aviation Preflight Indoctrination: Classroom instruction
      • PRI, PRI1, PRI2, Primary Flight Training: In specialized curricula
      • INT, Intermediate Flight Training: In specialized curricula
      • ADV, ADVCORE, Advanced Flight Training: In specialized curricula
      • FRS, Fleet Replacement Training: Refresher training
  • Training follows the order:
      • PRE – ASTB – IFS – API – PRI – PRI1 – PRI2 – INT – ADVCORE – ADV – FRS
  • It only makes sense to predict attribute values from those of previous phases.
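
The ordering constraint is simple to enforce in code. The sketch below assumes each attribute is tagged with its phase; the function name and the dictionary layout are illustrative only.

```python
# Phase order as given on the slide.
PHASE_ORDER = ["PRE", "ASTB", "IFS", "API", "PRI", "PRI1", "PRI2",
               "INT", "ADVCORE", "ADV", "FRS"]
PHASE_RANK = {phase: rank for rank, phase in enumerate(PHASE_ORDER)}


def eligible_predictors(target_phase, attribute_phases):
    """Return the attributes whose phase comes strictly before target_phase.

    attribute_phases maps attribute name -> phase abbreviation.
    """
    cutoff = PHASE_RANK[target_phase]
    return [name for name, phase in attribute_phases.items()
            if PHASE_RANK[phase] < cutoff]
```

With this filter, a target in the PRI phase can draw on API_NSS (API) and IFS_FLT_FAIL (IFS) but not on FRS_TW6_Grade (FRS).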


7 of 15

Methods of predicting attribute values

  •  


8 of 15

Attributes measuring candidate success

After discussion with the sponsor, we identified 38 possible measures of success of a candidate:

  • Counts of retesting or test failure
  • Indications of disenrollment, inactive status, or “attriting”
  • Counts on data for later phases, suggesting the candidate was disenrolled if zero
  • Average academic and flight-test grades


9 of 15

Significant binary correlations with candidate success

| Success-related attribute | Phase | Possible nonnull values | Nonnull occurs | Positive corrs. | Negative corrs. |
|---|---|---|---|---|---|
| NGCode | PRE | 12 strings | 1761 | 0 | 0 |
| RetestStatus | ASTB | Never, 30Days, 90Days, 180Days, Never1992, Resume | 8540 | 5 | 0 |
| ExamineeStatus | ASTB | None | None | 0 | 0 |
| Number of ASTB1-5 | ASTB | Floating point | 14477 | 3 | 7 |
| Number of ASTBE | ASTB | Floating point | 4564 | 3 | 7 |
| IFS_STATUS | IFS | Complete, Disenroll, Closing | 13844 | 3 | 0 |
| IFS_STATUS_NUM | IFS | Numbers 0.0 to 1.0 | 13834 | 5 | 0 |
| IFS_DISENROLLMENT_DESCRIPTION | IFS | String | 5787 | 5 | 13 |
| IFS_USNA_PFP | IFS | String | 634 | 0 | 17 |
| IFS_ACAD_FAIL | IFS | Number 0.0 to 1.0 | 13844 | 0 | 9 |
| IFS_FLT_FAIL | IFS | Number 0.0 to 1.0 | 13844 | 7 | 5 |
| API_NSS | API | Integer | 17401 | 17 | 2 |
| API_Test_FAILS | API | Integer | 17446 | 8 | 8 |
| Count of nonnull values for the API phase | API | Floating point | 18596 | 6 | 7 |
| Pri (status in training) | PRI | G, UI, NG, AT, TG, UA | 14461 | 4 | 2 |
| Count of nonnull values for the PRI phase | PRI | Floating point | 18596 | 16 | 1 |
| PRI academic average | PRI | Floating point | 13555 | 17 | 4 |
| PRI flight average | PRI | Floating point | 10664 | 19 | 3 |
| Int (status in training) | INT | G, AT, UI, NG, MA, J | 5530 | 1 | 1 |
| Count of nonnull values for the INT phase | INT | Floating point | 18596 | 26 | 9 |
| INT academic average | INT | Floating point | 8153 | 22 | 4 |
| INT flight average | INT | Floating point | 3301 | 7 | 11 |
| Count of nonnull values for the ADV phase | ADV | Floating point | 18596 | 29 | 10 |
| ADV academic average | ADV | Floating point | 9712 | 21 | 4 |
| Adv (status in training) | ADV | G, AT, UI, NG, UU, TG, SQ, UIT | 16005 | 1 | 1 |
| ADV flight average | ADV | Floating point | 4593 | 33 | 9 |
| SYL_ST (syllabus status) | ADV | Complete, Active, Attrite | 6292 | 5 | 2 |
| STAT_RESN (reason for syllabus status) | ADV | 20 strings | 260 | 0 | 0 |
| NSS_UNSATS | ADV | Number | 3012 | 1 | 0 |
| OFFICIAL_NMU | ADV | Number | 3012 | 1 | 0 |
| NUM_RRU | ADV | Number | 3012 | 1 | 0 |
| IPC | ADV | Number | 3012 | 0 | 0 |
| FPC | ADV | Number | 3012 | 3 | 0 |
| NSS | ADV | Floating point | 5221 | 1 | 4 |
| Count of nonnull values for the FRS phase | FRS | Floating point | 18596 | 37 | 13 |
| FRS_TW6_Grade | FRS | Number | 1274 | 29 | 8 |
| FRS_TW6_Status | FRS | Number 0.0 to 1.0 | 7682 | 36 | 7 |

10 of 15

Observed correlations

  • The reliability of the correlations with success was hampered by the low rate of failure. For instance, IFS_STATUS recorded 13,460 candidates who completed and 374 who were “disenrolled”; SYL_ST had 5369 who completed and 260 who were “attrited”.
  • Correlations on later phases did not include candidates who left at earlier phases and who might have provided useful data.
  • There were some strong correlations of success with increasing dates, but these are likely spurious due to more complete data for recent candidates.
  • Though success correlated with the number of flight hours, it correlated negatively with “formal flight instruction hours”, which may indicate that habits from civilian training had to be unlearned.
  • Female and minority candidates showed more failures early in training but fewer later in training.
  • Several ASTB test results correlated well with success in IFS, Primary, Intermediate, and FRS, but they were not as helpful in predicting success in Advanced training.
  • Several Primary, Intermediate, and Advanced training grades correlated positively with success in both Advanced training and FRS. However, some of the Advanced-training grades correlated negatively with success, and these should be investigated further.
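
Two of the points above lend themselves to a short sketch: how rare failure is, and how a correlation can be computed when many values are null. The slides do not say which correlation measure was used, so the Pearson coefficient below is an assumption, as are the function names.

```python
from math import sqrt


def failure_rate(completed, failed):
    """Fraction of candidates with a recorded outcome who failed."""
    return failed / (completed + failed)


def pearson_ignoring_nulls(xs, ys):
    """Pearson correlation over the pairs where both values are non-null.

    Returns (correlation, number_of_pairs_used); the correlation is None
    when fewer than two pairs remain or either variable is constant.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return None, n
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None, n
    return cov / sqrt(var_x * var_y), n


# Counts quoted above for IFS_STATUS and SYL_ST.
ifs_rate = failure_rate(completed=13460, failed=374)
syl_rate = failure_rate(completed=5369, failed=260)
```

With the counts quoted above, the failure rates come to about 2.7% for IFS_STATUS and 4.6% for SYL_ST, so any correlation with failure rests on a few hundred cases. Returning the number of pairs used alongside the coefficient makes it easy to see when a correlation is based on very little data.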


11 of 15

Finding good attribute subsets for regressions

  • To find good sets of variables for regressions more quickly, a greedy algorithm can assess one variable at a time and decide whether to include it. It can also take into account the number of data rows that have values for the variable, a key issue with our data.
  • Additive methods build a regression model by successively adding the variable that improves the fit the most. We preferred an additive approach because of the many missing data values.
  • We should add only attributes:
      • That are shown to be correlated with the target variable of the regression;
      • For which the number of data rows with nonnull data is at least twice the number of attributes;
      • Whose inclusion improves the regression fit by at least 0.1%.
  • We applied the greedy algorithm to pick subsets of these input variables, did linear regressions, and measured the standard error.
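
A minimal sketch of such a greedy selection is given below, under several assumptions: the candidate attributes have already been filtered for correlation with the target, the row-count rule is read as requiring at least twice as many usable rows as attributes, fit quality is measured by root-mean-square residual, and rows are dictionaries with None for nulls. The function names are ours, not the study's.

```python
def solve(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination; None if singular."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(rows, attrs, target):
    """Least-squares fit using only rows where target and all attrs are non-null.

    Returns (weights, rms_error, rows_used) with the intercept as the last
    weight, or None if there are too few rows or the system is singular.
    """
    usable = [r for r in rows if r.get(target) is not None
              and all(r.get(a) is not None for a in attrs)]
    if len(usable) < 2 * max(len(attrs), 1):  # assumed row-count rule
        return None
    design = [[r[a] for a in attrs] + [1.0] for r in usable]
    k = len(attrs) + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * r[target] for row, r in zip(design, usable))
           for i in range(k)]
    weights = solve(xtx, xty)
    if weights is None:
        return None
    residuals = [sum(w * x for w, x in zip(weights, row)) - r[target]
                 for row, r in zip(design, usable)]
    rms = (sum(e * e for e in residuals) / len(usable)) ** 0.5
    return weights, rms, len(usable)


def greedy_select(rows, candidates, target, min_gain=0.001):
    """Consider each candidate once; keep it if the error drops by >= min_gain."""
    chosen = []
    best = fit_linear(rows, chosen, target)  # intercept-only baseline
    if best is None:
        return chosen, None
    for attr in candidates:
        trial = fit_linear(rows, chosen + [attr], target)
        if trial is None:
            continue
        if best[1] > 0 and trial[1] <= best[1] * (1.0 - min_gain):
            chosen.append(attr)
            best = trial
    return chosen, best
```

Because each trial fit uses only the rows that are complete for the attributes under consideration, adding a sparsely populated attribute shrinks the usable data. The row-count check is what keeps such attributes from being accepted on the strength of a handful of rows.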


12 of 15

Best regression formulas found (1)

| # candidates | Avg. error | Best formula |
|---|---|---|
| 4144 | 0.0096 | 0.0242*PFAR_Z_ASTBE + 0.0048*API_NSS + 0.0171*ANIT_Z_ASTBE + 0.0098*ATTFactor_Z_ASTBE + 0.0063*Personality5_Z_ASTBE - 0.0074*Personality6_Z_ASTBE - 0.0107*OAR_Z_ASTBE - 0.0136*Number of ASTBE + 0.0057*Personality3_Z_ASTBE - 0.0102*API_Test_FAILS + 1.0659 = PRI_FLIGHT_AV |
| 624 | 0.0038 | -0.0123*ANIT_Z_ASTBE + 0.014*DLTFactor_Z_ASTBE - 0.0055*ATTFactor_Z_ASTBE + 0.0111*Number of ASTBE - 0.0008*API_NSS - 0.0032*VTTFactor_Z_ASTBE + 0.0069*API_Test_FAILS + 1.181 = INT_FLIGHT_AV |
| 2880 | 0.0042 | 0.0017*API_NSS - 0.0069*IFS_TOTAL_FLIGHT_TIME + 0.0008*IFS_EOC + 0.0008*IFS_STG_1 - 0.0093*API_Test_FAILS + 0.9477 = ADV_FLIGHT_AV |
| 2476 | 0.2081 | -0.0088*Number of ASTBE + 0.0053*API_NSS + 0.0011*IFS_EOC + 0.0302*ANIT_Z_ASTBE - 0.0707*API_Test_FAILS - 0.029*IFS_TOTAL_FLIGHT_TIME - 0.0457*IFS_ACAD_FAIL + 0.0196*SkillFactor_Z_ASTBE + 5.0012 = PRI_COUNT |
| 2694 | 0.9148 | -0.0331*IFS_TOTAL_FLIGHT_TIME + 0.1961*PFAR_Z_ASTBE + 0.0939*ANIT_Z_ASTBE + 0.0479*Personality2_Z_ASTBE + 0.0432*Personality8_Z_ASTBE + 0.039*ATTFactor_Z_ASTBE + 1.6159 = INT_COUNT |
| 13844 | 2.1403 | -0.0626*IFS_TOTAL_FLIGHT_TIME + 0.4278*IFS_FLT_FAIL + 2.7704 = ADV_COUNT |
| 12668 | 0.8013 | -0.0937*IFS_TOTAL_FLIGHT_TIME + 0.0063*API_NSS - 0.1424*IFS_ACAD_FAIL + 2.0684 = FRS_COUNT |
| 1254 | 588.4 | -0.828*IFS_STG_2 + 0.7108*IFS_FAA + 7.4391*IFS_ACAD_FAIL + 0.9847*IFS_TOTAL_FLIGHT_TIME - 0.1748*IFS_STG_3 + 33.4863 = FRS_TW6_Grade |
| 5386 | 0.0003 | 0.0*IFS_EOC - 0.0021*API_Test_FAILS + 0.0001*IFS_FAA + 0.0001*API_NSS + 0.9673 = FRS_TW6_Status |
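
Each formula in the table is a linear model that can be applied directly to a candidate's earlier scores. As an illustration, the ADV_COUNT formula can be evaluated as below. The dictionary layout and function name are assumptions, and the input values in the usage note are invented for the example.

```python
# Coefficients copied from the ADV_COUNT row of the table.
ADV_COUNT_MODEL = {
    "intercept": 2.7704,
    "weights": {
        "IFS_TOTAL_FLIGHT_TIME": -0.0626,
        "IFS_FLT_FAIL": 0.4278,
    },
}


def predict(model, candidate):
    """Evaluate a linear formula for one candidate.

    candidate maps attribute name -> value. Returns None if any attribute
    the formula needs is missing or null, since no prediction is possible.
    """
    total = model["intercept"]
    for name, weight in model["weights"].items():
        value = candidate.get(name)
        if value is None:
            return None
        total += weight * value
    return total
```

For a hypothetical candidate with IFS_TOTAL_FLIGHT_TIME of 10 and IFS_FLT_FAIL of 0, the prediction is 2.7704 - 0.626 = 2.1444. The average error of 2.1403 reported for this formula is about as large as that prediction, so it carries little information for an individual candidate.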

13 of 15

Best regression formulas found (2)

| # candidates | Avg. error | Best formula |
|---|---|---|
| 39 | 2.328 | -0.301*PRI_166A_CH-1_Academics_RAW_SCORE_DV + 0.423*PRI_166A_CH-2_Academics_RAW_SCORE_DV - 2.904*PRI_166A_CH-2_GRADE + 89.820 = ADV_ACADEMIC_AV |
| 857 | 8.636 | 30.087*PRI_166B_GRADE + 0.166*PRI_166B_Academics_RAW_SCORE_DV + 37.455 = ADV_ACADEMIC_AV |
| 14 | 2.157 | -19.007*PRI_TW-5_166A_GRADE + 0.032*PRI_TW-5_166A_Academics_RAW_SCORE_DV + 118.275 = ADV_ACADEMIC_AV |
| 63 | 2.610 | 49.237*NFO_TW-6_155C_Primary_GRADE + 0.040*NFO_TW-6_155C_Primary_Academics_RAW_SCORE_DV + 38.292 = ADV_ACADEMIC_AV |
| 443 | 3.257 | 8.488*NFO_TW-6_162A_Pri2_GRADE + 0.282*NFO_TW-6_162A_Pri1_Academics_RAW_SCORE_DV + 61.117 = ADV_ACADEMIC_AV |
| 870 | 7.185 | 17.809*NFO_TW-6_162_Pri1_GRADE + 0.043*NFO_TW-6_162_Pri1_Academics_RAW_SCORE_DV + 72.201 = ADV_ACADEMIC_AV |
| 16 | 0.00083 | 0.23217*PRI_166A_CH-1_GRADE - 0.00062*PRI_166A_CH-1_Academics_RAW_SCORE_DV + 0.80358 = ADV_FLIGHT_AV |
| 650 | 0.00012 | -0.07254*PRI_166B_GRADE + 1.24738 = ADV_FLIGHT_AV |
| 64 | 0.00000 | 0.14389*NFO_TW-6_155C_Primary_GRADE + 0.00017*NFO_TW-6_155C_Primary_Academics_RAW_SCORE_DV + 0.84174 = ADV_FLIGHT_AV |
| 871 | 0.00012 | 0.29038*NFO_TW-6_162A_Pri1_GRADE + 0.0009*NFO_TW-6_162A_Pri1_Academics_RAW_SCORE_DV + 0.60225 = ADV_FLIGHT_AV |
| 626 | 0.00007 | 0.46048*NFO_TW-6_162_Pri1_GRADE - 0.00003*NFO_TW-6_162_Pri2_Academics_RAW_SCORE_DV + 0.52616 = ADV_FLIGHT_AV |

14 of 15

A database design


15 of 15

Conclusions

  • We found quite a few factors helpful in predictions, some obvious and some not.
  • Some factors measured as significant, such as previous flight training, gender, and race, are not ones the Navy can control practically or legally.
  • Overall, we conclude that the Navy is doing a good job predicting the performance of candidates from their multistage testing.
  • Currently we are working with Global Technology Inc. in Atlanta to extend this work:
    • We will explore the approach of combining factors with a set-covering machine-learning algorithm to find the most useful set of factors.
    • We did not have enough data for neural networks, but they should definitely be tried.
    • Future work should try to obtain more complete data on the candidates, as we were unable to make many potentially useful comparisons.
