1 of 21

Student Academic Performance Prediction - Regularly and During a Pandemic

Ivan Chorbev, Vlatko Nikolovski, Dimitar Trajanov and Petre Lameski

Ss. Cyril and Methodius University in Skopje, North Macedonia

Faculty of Computer Science and Engineering

Symposium ITEM 2022

Innovation on Teaching Mathematics at HEI: Experiences on Classroom

Tenerife, March 15th – 18th, 2022

2 of 21

Introduction

  • Motivation
    • Educational Data Mining
    • Learning analytics
    • Student Datasets
      • Grades
      • Courses
      • Students Feedback
  • Objectives
    • Identify Risk
    • Educational Performance
      • Students
      • Institutions
    • Evaluate Teaching Staff and Curricula
    • Patterns and Predictions
  • Related Work
    • Students Dropout
    • Objectivity in Students Feedback
    • Teachers and Teaching Methods
    • Students Performance

3 of 21

Methodology

  • Dataset A: from Students Information System
    • Teachers, Students, Courses, Grades, Curricula
  • Dataset B: from Students Feedback
    • Anonymous, Courses, Grades, Comments
  • Courses – relationship between datasets A and B
    • NLP task to classify courses based on curricula similarities
  • Dataset A
    • EDM task to extract student's educational performance
  • Dataset B
    • Grading - EDM task to extract the grades given by students
    • Comments - NLP task to extract sentiment and objects of interest

4 of 21

Data Mining Process

5 of 21

Data Mining Process -> Course classification

  • Course classification based on curricula
    • CNN trained on DBpedia

6 of 21

Data Mining Process -> EDM results

  • Dataset A
    • Three consecutive study years: 2015/2016 (24,336 students), 2016/2017 (26,254 students) and 2017/2018 (28,880 students)
    • Extract Students Educational Performance

7 of 21

Data Mining Process -> LA results

  • Dataset B
    • Three study years: 2015/2016 (21,387 evaluations), 2016/2017 (20,981 evaluations) and 2017/2018 (21,144 evaluations)
    • Calculate Average Grade of Students Feedback
  • NLP task to extract sentiment
    • Negative, neutral, positive
  • NLP task to extract relevant keywords
    • Teaching tools and methods recognition
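
The "Calculate Average Grade of Students Feedback" step above can be illustrated with a minimal sketch. The record layout, course names and grades below are hypothetical and only serve to show the grouping; the slides do not specify how the evaluations are stored.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical evaluation records: (study_year, course, grade on the 5-10 scale).
evaluations = [
    ("2015/2016", "Calculus", 9),
    ("2015/2016", "Calculus", 8),
    ("2015/2016", "Databases", 6),
    ("2016/2017", "Calculus", 10),
]

def average_feedback_grade(records):
    """Return the mean evaluation grade per (study_year, course)."""
    grouped = defaultdict(list)
    for year, course, grade in records:
        grouped[(year, course)].append(grade)
    return {key: mean(grades) for key, grades in grouped.items()}

averages = average_feedback_grade(evaluations)
```

Grouping by both study year and course keeps the averages comparable with per-year results from Dataset A.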

8 of 21

Data Mining Process -> LA results

  • Study year 2015/2016

9 of 21

Data Mining Process -> LA results

  • Study year 2016/2017

10 of 21

Data Mining Process -> LA results

  • Study year 2017/2018

11 of 21

Results

  • Cross join EDM and LA models – Datasets A and B
    • One row of sample data
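
Joining the EDM and LA outputs can be sketched as follows. The slide says "cross join"; this sketch assumes that means matching rows on a shared key (study year and course) rather than a Cartesian product. All field names and values are hypothetical.

```python
# Hypothetical per-course outputs of the two models, keyed by (study_year, course).
edm_results = {
    ("2015/2016", "Calculus"): {"avg_grade": 7.4, "success_rate": 0.52},
    ("2015/2016", "Databases"): {"avg_grade": 8.1, "success_rate": 0.71},
}
la_results = {
    ("2015/2016", "Calculus"): {"avg_eval_grade": 8.5, "sentiment": "negative"},
    ("2016/2017", "Calculus"): {"avg_eval_grade": 9.0, "sentiment": "positive"},
}

def join_models(edm, la):
    """Combine EDM and LA rows that share the same (study_year, course) key."""
    rows = []
    for key in sorted(edm.keys() & la.keys()):
        year, course = key
        rows.append({"study_year": year, "course": course, **edm[key], **la[key]})
    return rows

joined = join_models(edm_results, la_results)
```

Only keys present in both datasets survive the join, so each resulting row carries both the measured performance and the feedback signal for the same course and year.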

12 of 21

Results

  • Final Prediction Model
    • A sentiment value of "negative", combined with an average evaluation grade of 8.89 (on a scale from 5 to 10), indicates potential problems that students have with the teaching methods. This is supported by the actual student performance: an overall average grade of 7.31 and a success rate of 43%.

13 of 21

Data Mining Process -> Data Description

  • Four consecutive semesters - two semesters before and two semesters after the pandemic
  • Demographic features
    • LivingProximity feature: the distance between the faculty and the student's place of residence, with value near if the student lives in the same community as the faculty's address, and far otherwise
    • LivingPlaceType feature: whether the student comes from an urban or a rural place
    • PreviousGradeEcts feature: the student's GPA from previous education
    • PreviousEducationLanguage feature: whether the student's previous education was in their native language
    • PreviousEducationType feature: the type of the student's previous education (high school or university)
    • HasChangedQuotaPrice feature: whether the student has been moved to a higher payment quota as a result of poor academic performance
    • HasChangedStudyProgramme feature: whether the student has changed to another study programme during their studies
  • Academic performance features
    • NumCoursesBP
    • NumCoursesAP
    • NumExamsTakenBP
    • NumExamsTakenAP
    • SumAverageBP
    • SumAverageAP

14 of 21

Data Mining Process -> Statistical analysis

  • Feature extraction
    • StudentRatioBefore
    • StudentRatioAfter
    • Age {Y, O}
    • ExamsNotTakenBefore - difference between the number of courses enrolled in and the number of exams taken in the period before the pandemic
    • ExamsNotTakenAfter - difference between the number of courses enrolled in and the number of exams taken in the period after the pandemic
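
The two ExamsNotTaken features follow directly from the definitions above. A minimal sketch, assuming the enrolled-course counts are the NumCoursesBP / NumCoursesAP features and the exam counts are NumExamsTakenBP / NumExamsTakenAP; the sample values are hypothetical.

```python
def exams_not_taken(num_courses, num_exams_taken):
    """Difference between courses enrolled in and exams taken in one period."""
    return num_courses - num_exams_taken

# Hypothetical student record using the feature names from the slides.
student = {
    "NumCoursesBP": 10, "NumExamsTakenBP": 7,
    "NumCoursesAP": 11, "NumExamsTakenAP": 11,
}

student["ExamsNotTakenBefore"] = exams_not_taken(
    student["NumCoursesBP"], student["NumExamsTakenBP"]
)
student["ExamsNotTakenAfter"] = exams_not_taken(
    student["NumCoursesAP"], student["NumExamsTakenAP"]
)
```

The StudentRatio and Age features are not defined in the slides, so they are left out of the sketch.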

15 of 21

Data Mining Process -> Dataset features

16 of 21

Data Mining Process -> DNN Model

  • DNN model for predicting academic performance
    • Multiple-input regression
    • Target feature: StudentRatioAfter
  • Model layers
    • Normalization input layer
    • Two hidden, nonlinear Dense layers using the ReLU nonlinearity
    • A linear single-output layer
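
The layer stack listed above can be illustrated with a forward pass in plain NumPy. This is a sketch under stated assumptions, not the authors' implementation: the framework, the hidden width of 64, the number of input features and the random (untrained) weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dense(n_in, n_out):
    """Random weights and zero biases for one dense layer (untrained)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def forward(x, mean, std, layers):
    """Normalize inputs, apply two ReLU hidden layers, then a linear output."""
    h = (x - mean) / std                  # normalization input layer
    (w1, b1), (w2, b2), (w3, b3) = layers
    h = np.maximum(0.0, h @ w1 + b1)      # hidden dense layer 1 with ReLU
    h = np.maximum(0.0, h @ w2 + b2)      # hidden dense layer 2 with ReLU
    return h @ w3 + b3                    # linear single-output layer

# Hypothetical input: 8 students with 5 numeric features each.
x = rng.normal(size=(8, 5))
mean, std = x.mean(axis=0), x.std(axis=0) + 1e-8  # per-feature statistics
layers = [init_dense(5, 64), init_dense(64, 64), init_dense(64, 1)]

# One predicted StudentRatioAfter value per student (meaningless until trained).
predictions = forward(x, mean, std, layers)
```

The output layer has no activation because the target, StudentRatioAfter, is a continuous value, which makes this a regression model rather than a classifier.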

17 of 21

Results

  • Joint distribution of StudentRatioBefore and StudentRatioAfter

18 of 21

Results

  • Model accuracy – 84%
    • Model accuracy (left), Prediction errors (right)

19 of 21

Results

  • Explainable AI
    • Feature importance

20 of 21

Conclusion

  • Prediction Model for Educational Performance
  • Risk Assessment
  • Evaluate Teaching tools and Methods
  • Improve Students Educational Performance
  • Improve Curricula

21 of 21

Thank you for your attention
