1 of 16

Riiid! Answer Correctness Prediction

Karmen Kink, Lisa Korotkova,

Taido Purason, Villem Tõnisson

2 of 16

Goal

In this competition, your challenge is to create algorithms for "Knowledge Tracing," the modeling of student knowledge over time. The goal is to accurately predict how students will perform on future interactions.

Data collected from Santa, an AI tutoring service (~780 000 students in South Korea) that prepares students for the TOEIC test

(link to paper)

3 of 16

Dataset

Training data columns

  • row_id
  • timestamp: the time in milliseconds between this user interaction and the first event completion from that user.
  • user_id
  • content_id: ID code for the user interaction
  • content_type_id: 0 if the event was a question being posed to the user, 1 if the event was the user watching a lecture.
  • task_container_id: ID code for the batch of questions or lectures.
  • user_answer: the user's answer to the question, if any.
  • answered_correctly: if the user responded correctly (-1 for lecture events).
  • prior_question_elapsed_time: The average time in milliseconds it took a user to answer each question in the previous question bundle
  • prior_question_had_explanation: Whether or not the user saw an explanation and the correct response(s) after answering the previous question bundle
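With over 100 million training rows, loading this table with compact per-column dtypes saves a lot of memory. A minimal sketch, assuming the standard Kaggle train.csv layout with the column names listed above (the inline two-row sample stands in for the real file):

```python
import io
import pandas as pd

# Compact dtypes for the training columns described above.
dtypes = {
    "row_id": "int64",
    "timestamp": "int64",
    "user_id": "int32",
    "content_id": "int16",
    "content_type_id": "int8",
    "task_container_id": "int16",
    "user_answer": "int8",
    "answered_correctly": "int8",
    "prior_question_elapsed_time": "float32",  # NaN on a user's first bundle
}

# Tiny inline sample standing in for the real ~100M-row train.csv.
sample = io.StringIO(
    "row_id,timestamp,user_id,content_id,content_type_id,task_container_id,"
    "user_answer,answered_correctly,prior_question_elapsed_time,prior_question_had_explanation\n"
    "0,0,115,5692,0,1,3,1,,\n"
    "1,56943,115,5716,0,2,2,1,37000.0,False\n"
)
train = pd.read_csv(sample, dtype=dtypes)

# The explanation flag mixes True/False with missing values, so cast it to
# pandas' nullable boolean dtype after reading.
train["prior_question_had_explanation"] = train["prior_question_had_explanation"].astype("boolean")
```

The nullable `boolean` dtype keeps the missing first-bundle values as `pd.NA` instead of silently upcasting the whole column to object or float.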

4 of 16

Dataset

Question metadata

  • question_id: foreign key for the train/test content_id column, when the content type is question (0).
  • bundle_id: id for a bundle of questions (they are served together).
  • correct_answer: the answer to the question.
  • part: id of the relevant section of the TOEIC test.
  • tags: tag code(s) for the question.

Lecture metadata

  • lecture_id: foreign key for the train/test content_id column, when the content type is lecture (1).
  • part: id of the relevant section of the test
  • tag: tag code for the lecture.
  • type_of: brief description of the core purpose of the lecture
    • concept, solving question, intention, starter
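The metadata joins onto the interaction log through content_id: question rows (content_type_id == 0) match question_id, lecture rows match lecture_id. A sketch with hypothetical mini-frames mirroring the schemas above:

```python
import pandas as pd

# Toy interaction log and question metadata (values are made up).
train = pd.DataFrame({
    "user_id":            [115, 115, 2746],
    "content_id":         [5692, 89, 5692],
    "content_type_id":    [0, 1, 0],    # 0 = question, 1 = lecture
    "answered_correctly": [1, -1, 0],   # -1 marks lecture events
})
questions = pd.DataFrame({
    "question_id":    [5692],
    "bundle_id":      [5692],
    "correct_answer": [3],
    "part":           [5],
    "tags":           ["151 168"],
})

# Keep only question rows, then attach their metadata.
q_rows = train[train["content_type_id"] == 0]
merged = q_rows.merge(questions, left_on="content_id",
                      right_on="question_id", how="left")
```

A left join keeps question rows even if a content_id were missing from the metadata table, which makes gaps easy to spot.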

5 of 16

Dataset

  • Train set size: over 101 million rows
  • Test set size: 2.5 million rows
  • The train/test data is complete, in the sense that there are no missing interactions in the union of train and test data.

  • The test data follows chronologically after the train data. The test iterations give interactions of users chronologically.
  • The hidden test set contains new users but not new questions.
  • You can only submit from Kaggle Notebooks
  • You must use their custom riiideducation Python module to submit the prediction.

6 of 16

Evaluation

Metric: AUC

Scores on the public leaderboard:

  • Bronze: 0.756 - 0.760
  • Silver: 0.760 - 0.777
  • Gold: 0.778 - ...
  • Best score: 0.790
  • Best public notebook score: 0.756 (LGBM)
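AUC is the probability that a randomly chosen correct answer is scored above a randomly chosen incorrect one, so it can be computed from ranks alone. A minimal stdlib sketch via the rank-sum (Mann-Whitney U) formulation, using midranks for tied scores:

```python
def auc(labels, scores):
    """ROC AUC from binary labels (0/1) and real-valued scores."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging ranks within groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        mid = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    # U statistic of the positives, normalized to [0, 1].
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Because only the ordering of scores matters, any monotone transformation of the predicted probabilities leaves the leaderboard score unchanged.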

7 of 16

EDA

8 of 16

EDA

  • Latest user timestamp distribution
  • Percentage of questions answered correctly by user vs. number of questions answered
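The second plot can be reproduced with a single groupby: drop lecture rows, then aggregate each user's count and mean of answered_correctly. A sketch on a hypothetical interaction log:

```python
import pandas as pd

# Toy interaction log; answered_correctly == -1 marks lecture events.
df = pd.DataFrame({
    "user_id":            [1, 1, 1, 2, 2, 3],
    "answered_correctly": [1, 0, 1, -1, 1, 0],
})

# Exclude lectures, then compute per-user volume and accuracy.
qs = df[df["answered_correctly"] != -1]
per_user = qs.groupby("user_id")["answered_correctly"].agg(
    n_answered="count", pct_correct="mean"
)
```

Scattering pct_correct against n_answered (log scale on the count axis helps, given the long tail) gives the plot described above.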

9 of 16

EDA

Experienced users do better

(histograms of per-user accuracy for users with <50, ≥50, and ≥500 questions answered)

10 of 16

EDA

  • Questions have tags
  • Some are easier, some are harder

11 of 16

EDA

prior_question_elapsed_time distribution

12 of 16

Previous solutions

  • LGBM

Added features:

  • mean of target per user
  • mean of target per question
  • mean of whether the prior question included an explanation (per user)
  • how many times a question set has been seen by the same user on average
  • how many and which lectures a user has watched
  • ...

New or rarely seen questions in the test set were assigned the global mean; questions known to be very easy or very hard were assigned correspondingly high or low values.

Hyperparameters: objective=binary, boosting=gbdt, max_bin=800, lr=0.0175, num_leaves=80, early_stopping_rounds=12
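The per-user and per-question target means are plain target encodings with a global-mean fallback for unseen keys. A sketch of that part of the pipeline (column names follow the dataset; the toy values are made up), producing features that would then feed the LGBM model with the hyperparameters above:

```python
import pandas as pd

# Toy training interactions.
train = pd.DataFrame({
    "user_id":            [1, 1, 2, 2, 2],
    "content_id":         [10, 11, 10, 12, 11],
    "answered_correctly": [1, 0, 1, 1, 0],
})

# Encoding tables: global, per-user, and per-question accuracy.
global_mean = train["answered_correctly"].mean()
user_mean = train.groupby("user_id")["answered_correctly"].mean()
question_mean = train.groupby("content_id")["answered_correctly"].mean()

# Apply to test rows; unseen users/questions fall back to the global mean.
test = pd.DataFrame({"user_id": [1, 3], "content_id": [10, 99]})
test["user_acc"] = test["user_id"].map(user_mean).fillna(global_mean)
test["question_acc"] = test["content_id"].map(question_mean).fillna(global_mean)
```

In the real pipeline these encodings have to be computed only from past interactions (or out of fold) to avoid leaking each row's own label into its feature.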

13 of 16

Previous solutions

  • CNN

Added features were very similar to those of the previous approach

Hyperparameters: optimizer=Adam, lr=0.01, loss=binary_crossentropy, metric=binary_accuracy, dropout=0.1

14 of 16

Previous solutions

  • FTRL (Follow The Regularized Leader)
  • Implementation from the datatable library
  • Only 6 features (user_id, question_id, prior_question_elapsed_time, bundle_id, part, tags)
  • 90M train rows, the rest validation
  • 20 seconds of training
  • AUC 0.74 public, 0.72 validation
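The solution above used datatable's built-in Ftrl model, but the underlying per-coordinate update (FTRL-proximal, McMahan et al.) is short enough to sketch from scratch. Class name, feature hashing, and hyperparameters below are my own choices, not the competition code:

```python
import math
import zlib

class FTRL:
    """Minimal FTRL-proximal logistic regression over hashed binary features."""

    def __init__(self, dim=2**20, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.dim, self.alpha, self.beta, self.l1, self.l2 = dim, alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated (adjusted) gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def _indices(self, features):
        # Deterministically hash string features into the weight space.
        return [zlib.crc32(f.encode()) % self.dim for f in features]

    def _weight(self, i):
        # Lazy closed-form weight: zero inside the L1 band, shrunk otherwise.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, features):
        wx = sum(self._weight(i) for i in self._indices(features))
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35.0), -35.0)))

    def update(self, features, y):
        p = self.predict(features)
        g = p - y  # logistic-loss gradient for a binary (0/1) feature
        for i in self._indices(features):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
```

Each row's categorical values (user_id, question_id, part, ...) would be fed in as strings like `"user_id=115"`; the lazy weight computation is what lets this train in seconds over millions of rows.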

15 of 16

Difficulties

Submissions only from Kaggle kernels

  • CPU Notebook ≤ 9 hours run-time
  • GPU Notebook ≤ 9 hours run-time
  • TPU Notebook ≤ 3 hours run-time

Memory issues

Kaggle Time Series API

Competition module

Prediction in batches
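Schematically, the submission loop pulls chronological test batches from the competition environment and submits predictions for each before receiving the next. The sketch below mocks the environment so the flow is runnable; the real riiideducation module exposes a similar iterate-and-predict interface, but the class and method names here are stand-ins:

```python
import pandas as pd

class MockEnv:
    """Stand-in for the competition environment: serves batches, collects predictions."""

    def __init__(self, batches):
        self._batches = batches
        self.submitted = []

    def iter_test(self):
        # Each item is (test batch, prediction template to fill in).
        for test_df, sample_pred in self._batches:
            yield test_df, sample_pred

    def predict(self, pred_df):
        self.submitted.append(pred_df)

def run_submission(env, score_fn):
    # For each chronological batch, fill answered_correctly and submit.
    for test_df, sample_pred in env.iter_test():
        sample_pred = sample_pred.copy()
        sample_pred["answered_correctly"] = [score_fn(row) for _, row in test_df.iterrows()]
        env.predict(sample_pred)

batches = [
    (pd.DataFrame({"row_id": [0, 1], "user_id": [7, 8]}),
     pd.DataFrame({"row_id": [0, 1], "answered_correctly": [0.5, 0.5]})),
    (pd.DataFrame({"row_id": [2], "user_id": [7]}),
     pd.DataFrame({"row_id": [2], "answered_correctly": [0.5]})),
]
env = MockEnv(batches)
run_submission(env, score_fn=lambda row: 0.65)  # constant-baseline predictor
```

Because batches arrive one at a time, any per-user running statistics (counts, accuracies) have to be updated incrementally inside this loop rather than recomputed from the full history.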

16 of 16

Ideas

Train / validation split

Validation should include both completely new users and users already seen in training
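One way to sketch that split: hold out every row of a few randomly chosen users (the "new" users) plus the chronological tail of each remaining user's history (the "already seen" users), so validation mimics the hidden test set. The fractions and frame below are arbitrary illustrations, not a tuned scheme:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy interaction log: three users with per-user chronological timestamps.
df = pd.DataFrame({
    "user_id":   [1] * 6 + [2] * 4 + [3] * 3,
    "timestamp": list(range(6)) + list(range(4)) + list(range(3)),
})

# Hold out one user entirely -> simulates unseen users in the test set.
users = df["user_id"].unique()
new_users = set(rng.choice(users, size=1, replace=False))

# Hold out the last 2 interactions of every user -> simulates the
# chronological continuation of known users.
df = df.sort_values(["user_id", "timestamp"])
tail = df.groupby("user_id").cumcount(ascending=False) < 2

is_val = df["user_id"].isin(new_users) | tail
valid, train = df[is_val], df[~is_val]
```

Keeping the tail split chronological matters: a random row split would let training see interactions that happen after the validation rows, inflating the validation AUC.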

Freely & publicly available external data is allowed, including pre-trained models

Feature engineering