Riiid! Answer Correctness Prediction
Karmen Kink, Lisa Korotkova,
Taido Purason, Villem Tõnisson
Goal
In this competition, your challenge is to create algorithms for "Knowledge Tracing," the modeling of student knowledge over time. The goal is to accurately predict how students will perform on future interactions.
Data collected from Santa, an AI tutoring service (~780 000 students in South Korea) that prepares students for the TOEIC test
Dataset
Training data columns
Dataset
Question metadata
Lecture metadata
Evaluation
Metric: AUC
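As a rough illustration of the metric, AUC can be read as the probability that a randomly chosen correct answer receives a higher predicted score than a randomly chosen incorrect one. The sketch below computes it with the rank-sum formulation using only the standard library. The function name `roc_auc` and the toy data are illustrative assumptions, not part of the competition code.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    y_true: 0/1 labels; y_score: predicted probabilities or scores.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    rank_sum_pos = 0.0
    n_pos = 0
    i = 0
    while i < n:
        # Find the run of entries sharing the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
                n_pos += 1
        i = j
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example (made-up values): 3 correct answers, 2 incorrect ones.
answered_correctly = [1, 0, 1, 1, 0]
predicted = [0.9, 0.3, 0.6, 0.4, 0.5]
auc = roc_auc(answered_correctly, predicted)
print(round(auc, 4))
```

In the toy data, 5 of the 6 (correct, incorrect) pairs are ordered properly, so the value is 5/6.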
Scores on the public leaderboard:
EDA
Experienced users do better
<50 questions answered
≥50 questions answered
≥500 questions answered
EDA
prior_question_elapsed_time distribution
Previous solutions
Added features: mean of target per user, mean of target per question, mean of whether the prior question included an explanation (per user), how many times on average the same user has seen a question set, how many and which lectures a user has watched, ...
New or rarely seen questions in the test set were assigned the global mean; questions known to be very easy or very hard were assigned correspondingly high or low values
Hyperparameters: objective=binary, boosting=gbdt, max_bin=800, lr=0.0175, num_leaves=80, early_stopping_rounds=12
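The target-mean features and the global-mean fallback described above might look roughly like the following sketch. It is not the original solution's code. The tuple layout `(user_id, content_id, answered_correctly)`, the helper names, and the `min_count` threshold are assumptions made for illustration, and the special handling of very easy or very hard questions is not covered.

```python
from collections import defaultdict


def fit_target_means(rows, min_count=2):
    """Compute the global mean, per-user means and per-question means of the target.

    rows: iterable of (user_id, content_id, answered_correctly) tuples.
    Questions seen fewer than min_count times are left out of the
    per-question table so that they fall back to the global mean later.
    """
    user_sum, user_cnt = defaultdict(int), defaultdict(int)
    q_sum, q_cnt = defaultdict(int), defaultdict(int)
    total = 0
    n = 0
    for user_id, content_id, correct in rows:
        user_sum[user_id] += correct
        user_cnt[user_id] += 1
        q_sum[content_id] += correct
        q_cnt[content_id] += 1
        total += correct
        n += 1
    global_mean = total / n
    user_mean = {u: user_sum[u] / user_cnt[u] for u in user_cnt}
    question_mean = {q: q_sum[q] / q_cnt[q] for q in q_cnt if q_cnt[q] >= min_count}
    return global_mean, user_mean, question_mean


def encode(user_id, content_id, stats):
    """Return (user feature, question feature), using the global mean for unknowns."""
    global_mean, user_mean, question_mean = stats
    return (
        user_mean.get(user_id, global_mean),
        question_mean.get(content_id, global_mean),
    )


# Made-up training rows: (user_id, content_id, answered_correctly)
train_rows = [(1, 10, 1), (1, 11, 0), (2, 10, 1), (2, 12, 1)]
stats = fit_target_means(train_rows)

known = encode(1, 10, stats)    # known user, frequently seen question
unknown = encode(3, 11, stats)  # new user, rarely seen question
print(known, unknown)
```

Because the means are accumulated row by row, the same loop can be fed from a file reader instead of an in-memory list, which may help with the memory limits of Kaggle kernels.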
Previous solutions
Added features very similar to those of the previous approach
Hyperparameters: optimizer=Adam, lr=0.01, loss=binary_crossentropy, metric=binary_accuracy, dropout=0.1
Difficulties
Submissions only from Kaggle kernels
Memory issues
Kaggle Time Series API
Competition module
Prediction in batches
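The batch-wise prediction flow can be sketched as below. The real competition module is only available inside Kaggle kernels, so `fake_iter_test` is a hypothetical stand-in that merely imitates an iterator handing out one batch at a time; the competition module's actual function names are not reproduced here. The prediction rule (per-user mean with a global fallback) is likewise just a placeholder model.

```python
def fake_iter_test(batches):
    """Hypothetical stand-in for the competition iterator.

    Yields one batch at a time; each batch is a list of (row_id, user_id).
    """
    for batch in batches:
        yield batch


def predict_in_batches(batch_iter, user_mean, global_mean):
    """Predict each batch as it arrives, before asking for the next one."""
    all_predictions = []
    for batch in batch_iter:
        batch_predictions = [
            (row_id, user_mean.get(user_id, global_mean))
            for row_id, user_id in batch
        ]
        # In the real setting, the predictions for this batch would be
        # submitted here, and only then would the next batch be released.
        all_predictions.append(batch_predictions)
    return all_predictions


# Made-up batches and a made-up table of per-user means.
test_batches = [[(0, 1), (1, 3)], [(2, 2)]]
preds = predict_in_batches(fake_iter_test(test_batches), {1: 0.5, 2: 1.0}, 0.75)
print(preds)
```

Since later batches are not visible while earlier ones are being predicted, any per-user state has to be updated incrementally inside the loop rather than precomputed over the whole test set.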
Ideas
Train / validation split
Validation should include completely new and older users
Freely & publicly available external data is allowed, including pre-trained models
Feature engineering
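One possible way to build a validation set that contains both completely new and older users, as suggested in the train / validation idea above, is to hold out some users entirely and, for the remaining users, hold out only their most recent interactions. The sketch below assumes rows are already in chronological order within each user. The function name, the `new_users` set, and the `last_k` parameter are illustrative assumptions.

```python
from collections import Counter


def split_train_valid(rows, new_users, last_k=1):
    """Split interactions into train and validation parts.

    rows: list of tuples whose first element is user_id, ordered
          chronologically within each user.
    new_users: set of user ids placed entirely in validation
               (they look like brand-new users to the model).
    last_k: number of most recent interactions of every other user
            that go to validation (they look like older users).
    """
    totals = Counter(row[0] for row in rows if row[0] not in new_users)
    seen = Counter()
    train, valid = [], []
    for row in rows:
        user = row[0]
        if user in new_users:
            valid.append(row)
            continue
        seen[user] += 1
        if seen[user] > totals[user] - last_k:
            valid.append(row)  # one of this user's last_k interactions
        else:
            train.append(row)
    return train, valid


# Made-up interactions: (user_id, timestamp)
interactions = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (3, 0), (3, 1)]
train, valid = split_train_valid(interactions, new_users={3}, last_k=1)
print(train)
print(valid)
```

Here user 3 never appears in training, while users 1 and 2 contribute their history to training and their latest interaction to validation.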