1 of 28

Engagement Detection in e-learning Environments

Onur Copur

Matricola: 1891194

Advisor: Prof. Simone Scardapane Co-Advisor: Dr. Jürgen Slowack

2 of 28

Problem Definition

3 of 28

Introduction

  • With the Covid-19 outbreak, online working and learning environments have become essential in our lives.
  • For this reason, automatic analysis of non-verbal communication has become crucial in online environments.
  • An innovative project by Barco is the Virtual Classroom, which aims to improve the learning experience and increase efficiency in corporate meetings.

4 of 28

Engagement Detection

  • Predicting the engagement level of students.
  • Engagement-level feedback is important because it:
    • makes students aware of their performance in classes;
    • helps instructors detect confusing or unclear parts of the teaching material.
  • We propose an end-to-end deep-learning-based system that detects the engagement level of the subject in an e-learning environment.

5 of 28

Previous work

&

State of the Art

6 of 28

Classification Task

  • Deep Facial Spatiotemporal Network (DFSTN)
    • SE-ResNet-50 (SENet), LSTM, attention mechanism
  • Abedi et al.'s ResNet and TCN hybrid network
    • ResNet, TCN
  • Deep Engagement Recognition Network (DERN)
    • OpenFace features, temporal convolution, bidirectional LSTM, attention mechanism

7 of 28

Regression Task

  • Huynh et al.
    • OpenFace, ResNet-50, LSTM
  • Wu et al.
    • OpenFace, OpenPose, feature aggregation, Bi-LSTM/GRU

8 of 28

Datasets

9 of 28

DAiSEE Dataset (Classification Task)

  • 70 students, 5,478 clips, each 10 seconds long.
  • Clip labels: Engagement, Boredom, Confusion and Frustration.
  • Each label has a score from 0 to 3.
  • Challenges:
    • Labels are not reliable (crowd-sourced annotations, and engagement is subjective).
    • Imbalanced samples.

10 of 28

[Sample clips for each label, ordered low to high: Boredom, Confusion, Frustration, Engagement]

11 of 28

Survey

  • To test the reliability of the labels, we created a survey with 15 participants.
  • Selected 60 random samples from the training dataset.
  • The numbers of samples for engagement labels 0, 1, 2 and 3 are 16, 14, 17 and 13, respectively.
  • Asked the participants to label the samples.
  • Participants tended to avoid extreme labels.
  • Human accuracy is 35% (very low).
  • Regression may therefore be more suitable than classification.

12 of 28

EmotiW Dataset (Regression Task)

  • 78 students, 195 videos, each 5 minutes long.
  • Only Engagement labels.
  • Label scores range from 0 to 1.
  • Challenges:
    • Very few samples.

13 of 28

Model Design

14 of 28

Model Architecture

[Architecture diagram] Input Video → Video segments → per segment: OpenFace Feature Matrix [m×n] → Aggregated Feature Matrix [a×b] → Bi-LSTM/Bi-GRU → FCN → Engagement Level
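The pipeline above can be sketched as a small PyTorch module. This is a minimal illustration, not the thesis implementation: the input dimensions, hidden size and the use of the last time step's output are assumptions, while the 128 → 32 → 4 head mirrors the MLP sizes given on the DAiSEE experiments slide.

```python
import torch
import torch.nn as nn

class EngagementNet(nn.Module):
    """Sketch of the Bi-LSTM + FCN stage of the pipeline.

    feat_dim (aggregated features per segment), hidden size and the
    choice of the last time step are illustrative assumptions."""

    def __init__(self, feat_dim=58, hidden=256, num_layers=2, dropout=0.2):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=num_layers,
                           batch_first=True, bidirectional=True,
                           dropout=dropout)
        # Fully connected head: 128 -> 32 -> 4, as on the DAiSEE slide.
        self.fcn = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 4),           # 4 engagement levels (0-3)
        )

    def forward(self, x):               # x: (batch, segments, features)
        out, _ = self.rnn(x)            # (batch, segments, 2 * hidden)
        return self.fcn(out[:, -1])     # last time step -> (batch, 4)

model = EngagementNet()
logits = model(torch.randn(8, 10, 58))  # 8 clips, 10 segments each
```

For the regression task, the same backbone would end in a single output neuron instead of four class logits.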

15 of 28

OpenFace Features

  • Eye-gaze-related features:
    • gaze_0_x, gaze_0_y, gaze_0_z: the eye-gaze direction vector in world coordinates for the left eye.
    • gaze_1_x, gaze_1_y, gaze_1_z: the same for the right eye.
  • Head pose and rotation features:
    • pose_Tx, pose_Ty, pose_Tz: the location of the head with respect to the camera, in millimeters.
    • pose_Rx, pose_Ry, pose_Rz: the rotation of the head in radians around the x, y and z axes.
  • Facial Action Unit intensities:
    • AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r.
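The 29 columns listed above (6 gaze + 6 pose + 17 AU intensities) match the standard OpenFace output naming, so the feature set can be assembled programmatically, e.g. for selecting columns from the OpenFace CSV output:

```python
# Build the 29-column feature list described in the slide.
GAZE = [f"gaze_{eye}_{axis}" for eye in (0, 1) for axis in "xyz"]
POSE = [f"pose_{kind}{axis}" for kind in ("T", "R") for axis in ("x", "y", "z")]
AUS = [f"AU{n:02d}_r" for n in
       (1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45)]

FEATURES = GAZE + POSE + AUS  # 6 + 6 + 17 = 29 columns per frame
```

With a DataFrame `df` of raw OpenFace output, `df[FEATURES]` would then yield the m×n per-frame feature matrix from the architecture diagram.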

16 of 28

Feature Aggregation & Bi-LSTM
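The aggregation step collapses the per-frame matrix [m×n] into a shorter per-segment matrix [a×b]. The slides do not specify the aggregation functions, so this sketch assumes a common choice from the related work: per-segment mean and standard deviation.

```python
import numpy as np

def aggregate(features, n_segments):
    """Collapse a per-frame feature matrix [m x n] into an aggregated
    matrix [a x b], a = n_segments, b = 2 * n.

    Using mean and standard deviation per segment is an assumption;
    the slides only state that segment features are aggregated."""
    segments = np.array_split(features, n_segments, axis=0)
    rows = [np.concatenate([s.mean(axis=0), s.std(axis=0)]) for s in segments]
    return np.stack(rows)

frames = np.random.rand(300, 29)        # e.g. a 10 s clip at 30 fps
agg = aggregate(frames, n_segments=10)  # -> (10, 58) for the Bi-LSTM
```

This both shortens the sequence the Bi-LSTM has to model and smooths out frame-level noise in the OpenFace features.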

17 of 28

Experiments & Results

18 of 28

DAiSEE Dataset Experiments

  • LSTM parameters:
    • number of hidden units: 256
    • number of layers: 2
  • MLP parameters:
    • num neurons 1st layer: 128
    • num neurons 2nd layer: 32
    • num neurons 3rd layer: 4
  • Training Parameters:
    • Batch size: 64
    • Learning rate: 0.0005
    • Number of epochs: 30
    • Dropout probability: 0.2
  • Training Procedure:
    • Train with Boredom labels.
    • Fine-tune with Engagement labels.
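The two-stage procedure above (train on Boredom labels, then fine-tune the same weights on Engagement labels) can be sketched as follows. The model and data here are stand-ins; only the learning rate, batch size and epoch count come from the slide.

```python
import torch
import torch.nn as nn

# Stand-in network; the real model is the Bi-LSTM + FCN architecture.
model = nn.Sequential(nn.Linear(58, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()

def run_stage(model, inputs, labels, lr=5e-4, epochs=30):
    """One training stage; fine-tuning reuses the already-trained weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        opt.step()
    return loss.item()

x = torch.randn(64, 58)               # one dummy batch of 64 samples
boredom = torch.randint(0, 4, (64,))  # stage 1: Boredom labels
engagement = torch.randint(0, 4, (64,))
run_stage(model, x, boredom)                      # pre-train
final_loss = run_stage(model, x, engagement)      # fine-tune
```

Pre-training on the (correlated) Boredom labels gives the network a useful initialization before it sees the noisier Engagement labels.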

19 of 28

Test data performance: 47%

Survey data performance: 28%

20 of 28

EmotiW Dataset Experiments

  • LSTM parameters:
    • number of hidden units: 512
    • number of layers: 2
  • MLP parameters:
    • num neurons 1st layer: 128
    • num neurons 2nd layer: 32
    • num neurons 3rd layer: 4
  • Training Parameters:
    • Batch size: 8
    • Learning rate: 0.0005
    • Number of epochs: 350
    • Dropout probability: 0.2

21 of 28

Feature Importance

  • The figure shows the sum of gradients along the path from a zero baseline to a sample labeled 0 from the test set (integrated gradients).
  • The table lists the 5 most important features for each engagement label group.
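The attribution method described above — accumulating gradients along the straight path from a zero baseline to the input — is integrated gradients. A minimal sketch, using a toy linear model where the attributions reduce exactly to w·x for a zero baseline:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients: average the gradient along the
    straight path from `baseline` to `x`, scaled by (x - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy model f(x) = w . x: its gradient is the constant w, so for a
# zero baseline the per-feature attributions are exactly w * x.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
attr = integrated_gradients(lambda z: w, x, np.zeros_like(x))
```

For the real network, `grad_fn` would return the gradient of the predicted engagement score with respect to the aggregated OpenFace features.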

22 of 28

Real-Life Performance (Very High Engagement)

23 of 28

Real-Life Performance (High Engagement)

24 of 28

Real-Life Performance (Low Engagement)

25 of 28

Real-Life Performance (Very Low Engagement)

26 of 28

Conclusion

&

Future Work

27 of 28

  • We proposed an end-to-end deep-learning-based system that detects the engagement level of the subject.
  • The model is able to distinguish between different levels of engagement.
  • Some possible directions to extend and improve this work:
    • A training procedure that uses both datasets.
    • Learnable aggregation functions.
    • A self-supervised method to avoid the label-reliability problem.

28 of 28

Thank you for Listening