Can Students Understand AI Decisions Based on Variables Extracted via AutoML?
Jackie Tang, Nigel Bosch
1
IEEE International Conference on Systems, Man, and Cybernetics
(Human + Machine) Learning Lab
BACKGROUND
2
So there's a catch: Can students understand these AI decisions?
Challenge: Ensuring AI decisions are understandable
Student-facing dashboard
3
E-Learning
4
[Diagram: e-learning system — interaction data (clicks, keyboard input, page views) and exam scores (midterm, final) feed into modeling, prediction, and evaluation]
Educational Data Mining
5
[Pipeline diagram: data processing → feature extraction → ML algorithm → prediction]
Example
6
Feature type | Description | Example | Translation
Combined features | Created by combining two or more existing aggregated features; they capture the combined effect of multiple features | MIN(assessmentsmerged.PERCENTILE(score)) | The lowest assessment score a student has achieved, expressed as a percentile rank relative to their peers
Statistical aggregation features | Created by applying statistical aggregation functions to existing columns | SUM(studentVle.date) | Adds up all the dates on which a student clicked on any course material
Time series features | Created by applying various mathematical transformations and statistical functions to time series data, then calculating relevant properties of the resulting transformed series | forumng__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.2 | Variance of the differences in the number of forum clicks, excluding the lowest 20% and the highest 40% of values
Expert features | Created by experts who identify and construct meaningful variables capturing important aspects of the problem domain | Score_higher_than_mean | Number of scores a student received that are higher than the class average
Research Questions
7
Dataset
8
| OULAD | EPM
Institution | Open University, UK | Middle East Technical University, Turkey
Period | 2013-2014 | 2012
Setting | Distance learning courses | Face-to-face learning processes
Students | ~32,000 students | 115 students
Data collected | Student demographics, course info, VLE interactions | Student activities, resource usage, academic performance
Features used | VLE activity, assessments | Student activities, resource accesses, time spent
Scope | Large-scale online learning environment data | Detailed process-oriented data on learning behaviors
Feature Engineering
9
Expert features
AutoML features
Selection for survey use: Top 15 features via a decision tree model (sketched below)
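The slide does not spell out the selection criterion, so the following is a minimal sketch of one plausible reading: fit a scikit-learn decision tree on the engineered features and keep the 15 with the highest impurity-based importance. The data, labels, and feature names below are placeholders, not the study's data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Placeholder feature matrix and pass/no-pass labels (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                      # e.g., 200 students x 50 candidate features
y = rng.integers(0, 2, size=200)                    # binary pass / no-pass labels
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Fit a decision tree and rank features by impurity-based importance.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
top_idx = np.argsort(tree.feature_importances_)[::-1][:15]
top_15_features = [feature_names[i] for i in top_idx]
print(top_15_features)
```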
Feature Engineering
10
Data Preprocessing
Feature Generation
Feature Selection
Entity Setup
"students" and "vle"
Relationship Definition
based on the student_id
Primitive Definition
Aggregation primitive: "median"
Transform primitive: "sum"
Feature Generation
Featuretools applies the "sum" transform to the "click" column in the "vle" entity, then aggregates these sums with the "median" function for each student (see the sketch below).
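A minimal sketch of this Featuretools workflow on hypothetical toy data (table and column names are assumptions, and the Featuretools 1.x API is assumed). Note that in the released library "sum" is an aggregation primitive, so the cumulative-sum transform ("cum_sum") stands in here for the sum-style transform described above.

```python
import featuretools as ft
import pandas as pd

# Toy tables loosely mirroring the "students" and "vle" entities (hypothetical values).
students = pd.DataFrame({"student_id": [1, 2, 3]})
vle = pd.DataFrame({
    "log_id": range(6),
    "student_id": [1, 1, 2, 2, 3, 3],
    "click": [3, 5, 2, 8, 1, 4],
})

# Entity setup: register both dataframes in one EntitySet.
es = ft.EntitySet(id="oulad")
es = es.add_dataframe(dataframe_name="students", dataframe=students, index="student_id")
es = es.add_dataframe(dataframe_name="vle", dataframe=vle, index="log_id")

# Relationship definition based on student_id.
es = es.add_relationship("students", "student_id", "vle", "student_id")

# Primitive definition + feature generation via deep feature synthesis.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="students",
    agg_primitives=["median", "min"],   # aggregation primitives
    trans_primitives=["cum_sum"],       # transform primitive (stand-in for the "sum" transform)
    max_depth=2,
)
print(feature_defs)  # includes stacked features such as MEDIAN(vle.CUM_SUM(click))
```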
Segment Time Series
tsfresh identifies the data points at the 20th and 60th percentiles of the time series. It uses these points to divide the time series into segments.
Calculate Changes
Within each segment, tsfresh computes the changes between consecutive data points. Since isabs=False, it uses the raw changes, not their absolute values.
Compute Variance
tsfresh calculates the variance of all the changes computed in the previous step. Variance measures how far the changes are spread out from their average.
Example Featuretools feature: MIN(assessmentsmerged.PERCENTILE(score))
Example tsfresh feature: forumng__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.2
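Below is a minimal sketch of how the tsfresh feature above could be computed, using hypothetical daily forum-click counts; the series values and the long-format column names ("id", "time", "forumng") are placeholders. It shows both the direct feature calculator and the equivalent extract_features call.

```python
import numpy as np
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction.feature_calculators import change_quantiles

# Hypothetical daily forum-click counts for one student.
forum_clicks = np.array([2, 5, 1, 7, 3, 4, 0, 6, 2, 5], dtype=float)

# Direct call: variance ("var") of the raw (isabs=False) changes between consecutive
# points lying inside the 0.2-0.6 quantile corridor of the series.
value = change_quantiles(forum_clicks, ql=0.2, qh=0.6, isabs=False, f_agg="var")

# Same feature via extract_features, which yields the long column name
# forumng__change_quantiles__f_agg_"var"__isabs_False__qh_0.6__ql_0.2
long_df = pd.DataFrame({
    "id": 1,                              # one student
    "time": range(len(forum_clicks)),
    "forumng": forum_clicks,
})
features = extract_features(
    long_df,
    column_id="id",
    column_sort="time",
    default_fc_parameters={
        "change_quantiles": [{"ql": 0.2, "qh": 0.6, "isabs": False, "f_agg": "var"}]
    },
)
```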
Method
11
Participants:
intermediate AI/ML knowledge
Task Structure:
Machine Learning Models:
Dataset | Feature set | Metric | Result
EPM | tsfresh | R² | 0.480
EPM | Expert | R² | 0.443
OULAD | Featuretools | AUC | 0.812
OULAD | Expert | AUC | 0.789
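For reference, a minimal sketch of how the two metrics in the table are computed with scikit-learn; the labels and predictions below are placeholders, not the study's data.

```python
from sklearn.metrics import roc_auc_score, r2_score

# Placeholder predictions; real values come from models trained on each feature set.
oulad_true = [0, 1, 1, 0, 1]                   # pass / no-pass labels
oulad_scores = [0.2, 0.8, 0.6, 0.4, 0.9]       # predicted pass probabilities
epm_true = [55.0, 70.0, 62.0, 48.0]            # observed exam scores
epm_pred = [58.0, 66.0, 61.0, 50.0]            # predicted exam scores

auc = roc_auc_score(oulad_true, oulad_scores)  # OULAD: binary classification (AUC)
r2 = r2_score(epm_true, epm_pred)              # EPM: regression (R^2)
```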
Research procedure
[Flow diagram: Consenting/Introduction → Verification → Predictions task (repeated for each feature type) → Predictions task with ML prediction result → Rate interpretability. Within each prediction task, participants make their own prediction (Pass / No Pass), then the result is shown and the hidden value is revealed.]
Research Procedure
16
[Verification detail: the participant answers Y/N; the answer is compared with the real result (PASS) and marked as matched or unmatched]
Interpretability rating for each feature
Note: We previously evaluated the algorithm on a large dataset of student data; it performs this task reliably with 75-85% accuracy.
Results - RQ1: Expert vs. AutoML Interpretability
17
Results - RQ2: Data Type
18
Top 5 Interpretability by data type:
p < .001
Results - RQ2: Aggregation
19
Results - RQ2: Familiarity
20
Results - Key Takeaways
• Preference for score and timing data over interaction data
• Cumulative and proportional calculations most understandable
• Repeated exposure and lexical familiarity didn't significantly impact interpretability
21
Limitations and Future Work
Limitations:
Future Research:
22
“Questions?”
23
Contact:
Citation
24