1 of 12

Exploring the Link between Cognitive Abilities and Data Science Skills using Alternative Raven’s Progressive Matrices

Farshid Farzan, Hasan Mashrique, Andrew M. Olney

University of Memphis

July 20, 2025

2 of 12

Why is learning data science problem solving hard?

  • Learning data science problem solving involves complex relational reasoning

  • Individuals differ in relational processing capacity*

  • Examples of data science relational reasoning
    • Dataset transformation (selecting columns/rows; filtering)
    • Interpreting correlations/interactions
    • Plotting (structure-sensitive mapping from variables to spatial relationships)
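
As an illustration of the first example, even a simple dataset transformation means coordinating several relations at once: which rows satisfy a condition, and which columns to keep for those rows. The following is a minimal sketch in plain Python. The toy dataset, its column names, and the helper functions `filter_rows` and `select_columns` are hypothetical and are not taken from the study materials.

```python
# Hypothetical toy dataset: each row is a dict mapping column name -> value.
rows = [
    {"species": "setosa", "petal_length": 1.4, "petal_width": 0.2},
    {"species": "versicolor", "petal_length": 4.7, "petal_width": 1.4},
    {"species": "virginica", "petal_length": 6.0, "petal_width": 2.5},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Keep only the named columns in every row."""
    return [{col: row[col] for col in columns} for row in rows]


# The learner must relate the filter condition (on petal_length) to the
# columns that should survive (species and petal_length) in one expression.
long_petals = select_columns(
    filter_rows(rows, lambda row: row["petal_length"] > 4.0),
    ["species", "petal_length"],
)
print(long_petals)
```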

*Halford, G. S., Wilson, W. H., & Phillips, S. (1998). Processing capacity defined by relational complexity: Implications for comparative, developmental, and cognitive psychology. Behavioral and Brain Sciences, 21(6), 803–864. https://doi.org/10.1017/S0140525X98001769

3 of 12

Raven’s Matrices predict learning programming

Prat, C. S., Madhyastha, T. M., Mottarella, M. J., et al. (2020). Relating natural language aptitude to individual differences in learning programming languages. Scientific Reports, 10, 3817. https://doi.org/10.1038/s41598-020-60661-8

Farghaly, A. A., & El-Kafrawy, P. M. (2021). Exploring the use of cognitive tests to predict programming performance: A systematic literature review. 2021 31st International Conference on Computer Theory and Applications (ICCTA), Alexandria, Egypt, 40–48. https://doi.org/10.1109/ICCTA54562.2021.9916610

4 of 12

Research questions

  • Do Alternative Raven’s Progressive Matrices (aRPM) predict data science problem solving?

  • What is the predictive value of aRPM after adjusting for experience in related fields?

  • Are aRPM predictions fair across demographic groups?

5 of 12

Method

  • Design
    • 2×2×2 factorial design (blocks vs. code, ± self-explanations, ± goal labels)
    • Collapsed across factors due to attrition
  • Participants (N=31)
    • Psychology undergraduate students from subject pool
  • Materials
    • Five Jupyter notebooks
      • Partial worked example followed by problem solving with increasing transfer
    • Alternative Raven’s Progressive Matrices (aRPM)
      • 18 problems (next slide)
    • Data science problem solving
      • 7 problems (next slide)
    • Experience questions
      • Years of experience in statistics, programming, and data science

6 of 12

Alternative Raven’s Progressive Matrices*

*https://github.com/expfactory/expfactory-experiments/tree/master/ravens

Data Science Problem Solving (DSPS)

7 of 12

Results

  • RQ1: aRPM predicts DSPS
    • t(29) = 4.18, p < .001, R² = .38
    • r(29) = .61

  • About 1 additional DSPS problem correct for every 3 additional aRPM items correct
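
The statistics reported for RQ1 can be checked against each other using the standard identity relating a Pearson correlation to its t statistic, t = r · sqrt(df / (1 − r²)), and its one-predictor counterpart R² = t² / (t² + df). The following is a minimal sketch that uses only the values on this slide. The function names are illustrative and are not part of the authors' analysis.

```python
import math


def t_from_r(r: float, df: int) -> float:
    """t statistic for a Pearson correlation r with df = N - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


def r2_from_t(t: float, df: int) -> float:
    """Variance explained by a one-predictor regression, from its t and df."""
    return t * t / (t * t + df)


# Values as reported on the slide (N = 31, so df = 29).
r_reported, t_reported, df = 0.61, 4.18, 29

print(round(t_from_r(r_reported, df), 2))   # t implied by the rounded r
print(round(r2_from_t(t_reported, df), 2))  # R² implied by the reported t
```

The implied t differs from the reported 4.18 only in the second decimal, which is expected because r is rounded to two places. The implied R² matches the reported .38.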

8 of 12

Results

  • RQ2: aRPM predicts DSPS beyond experience
    • t(26) = 4.63, p < .001, R² = .50

  • Adding experience increases R² by .12 (from .38 to .50)

  • 1 year of programming experience has the same effect as 4 aRPM questions correct
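
The degrees of freedom and the R² change on this slide follow from simple bookkeeping. The following is a minimal sketch, assuming the full model has four predictors (aRPM plus the three experience variables listed under Method), which is consistent with t(26) for N = 31. The slide does not report a test of the R² change. The `f_change` helper only shows how such a change is conventionally computed for nested models, and its output should not be read as a result from the study.

```python
def residual_df(n: int, k: int) -> int:
    """Residual degrees of freedom for a model with k predictors plus intercept."""
    return n - k - 1


def f_change(r2_reduced: float, r2_full: float, q: int, df_resid: int) -> float:
    """Conventional F statistic for adding q predictors to a nested model."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid)


n = 31                 # participants (from the Method slide)
k_full = 4             # ASSUMPTION: aRPM + statistics + programming + data science
r2_arpm_only = 0.38    # RQ1 model
r2_with_experience = 0.50  # RQ2 model

df_resid = residual_df(n, k_full)
delta_r2 = r2_with_experience - r2_arpm_only

print(df_resid)            # matches the 26 in t(26)
print(round(delta_r2, 2))  # matches the .12 increase
# Illustrative only: not a statistic reported by the authors.
print(round(f_change(r2_arpm_only, r2_with_experience, 3, df_resid), 2))
```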

9 of 12

Results

  • RQ3: aRPM may make unfair predictions across demographic groups

  • Underpredicts
    • Males
    • Non-White

  • Overpredicts
    • Females
    • White
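
One common way to check for under- or overprediction is to fit a single regression line to all participants and then compare the mean signed residual (actual minus predicted) within each group: a positive mean indicates the group is underpredicted, a negative mean that it is overpredicted. The following is a minimal sketch of that idea. The slides do not state how the authors assessed fairness, so this is not presented as their method, and the toy scores and group labels "A" and "B" are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_residual_by_group(x, y, groups):
    """Mean of (actual - predicted) per group under one pooled regression line."""
    slope, intercept = fit_line(x, y)
    residuals = defaultdict(list)
    for xi, yi, g in zip(x, y, groups):
        residuals[g].append(yi - (intercept + slope * xi))
    return {g: mean(res) for g, res in residuals.items()}


# Hypothetical toy data, invented for illustration only.
arpm = [4, 8, 12, 16, 4, 8, 12, 16]
dsps = [2, 3, 5, 6, 1, 2, 4, 5]
group = ["A"] * 4 + ["B"] * 4

bias = mean_residual_by_group(arpm, dsps, group)
for g, value in bias.items():
    direction = "underpredicted" if value > 0 else "overpredicted"
    print(g, round(value, 2), direction)
```

In this toy data, group A scores higher than the pooled line predicts at every aRPM level, so the pooled model underpredicts A and overpredicts B.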

10 of 12

Conclusion

  • Alternative Raven’s Progressive Matrices (aRPM, freely available) predict data science problem solving
    • Explains 38% of the variance (r = .61)
    • Remains predictive when experience is included in the model
    • May make unfair predictions across groups (more work needed)

  • aRPM
    • Is a useful covariate in education research for data science/programming
    • May be useful for screening learners who need extra time/support

11 of 12

Limitations

  • Analysis collapses across unbalanced factors
    • Treatment/differential attrition may have altered the relationship between aRPM and DSPS

  • Small sample size
    • 31 participants
      • 21 female, 10 male
      • 16 non-White, 15 White

12 of 12

Questions?

Farshid Farzan, Hasan Mashrique, Andrew M. Olney

University of Memphis