1 of 28

Engagement Detection in e-learning Environments

Onur Copur

Matricola: 1891194

Advisor: Prof. Simone Scardapane Co-Advisor: Dr. Jürgen Slowack

2 of 28

Problem Definition

3 of 28

Introduction

  • With the Covid-19 outbreak, online working and learning environments have become essential in our lives.
  • For this reason, automatic analysis of non-verbal communication has become crucial in online environments.
  • An innovative project by Barco is the Virtual Classroom, which aims to improve the learning experience and increase efficiency in corporate meetings.

4 of 28

Engagement Detection

  • Predicting the engagement level of students.
  • Engagement-level feedback is important because it:
    • makes students aware of their performance in classes;
    • helps instructors detect confusing or unclear parts of the teaching material.
  • We propose an end-to-end deep-learning-based system that detects the engagement level of the subject in an e-learning environment.

5 of 28

Previous work

&

State of the Art

6 of 28

Classification Task

  • Deep Facial Spatiotemporal Network (DFSTN)
    • SE-ResNet-50 (SENet), LSTM, attention mechanism
  • Abedi et al.'s ResNet and TCN hybrid network
    • ResNet, TCN
  • Deep Engagement Recognition Network (DERN)
    • OpenFace features, temporal convolution, bidirectional LSTM, attention mechanism

7 of 28

Regression Task

  • Huynh et al.
    • OpenFace, ResNet-50, LSTM
  • Wu et al.
    • OpenFace, OpenPose, feature aggregation, Bi-LSTM/GRU

8 of 28

Datasets

9 of 28

DAiSEE Dataset (Classification Task)

  • 70 students, 5,478 clips, each 10 seconds long.
  • Clip labels: Engagement, Boredom, Confusion and Frustration.
  • Each label has a score from 0 to 3.
  • Challenges:
    • Labels are not reliable (crowd-sourced annotations, and engagement is subjective).
    • Imbalanced samples.

10 of 28

[Sample clips for each label, ordered low to high: Boredom, Confusion, Frustration, Engagement]

11 of 28

Survey

  • To test the reliability of the labels, we created a survey with 15 participants.
  • Selected 60 random samples from the training dataset.
  • The numbers of samples for engagement labels 0, 1, 2 and 3 are 16, 14, 17 and 13, respectively.
  • Asked the participants to label the samples.
  • Participants tended to avoid extreme labels.
  • Human accuracy is 35% (very low).
  • Regression may therefore be more suitable than classification.

12 of 28

EmotiW Dataset (Regression Task)

  • 78 students, 195 videos, each 5 minutes long.
  • Only Engagement labels.
  • Label scores range from 0 to 1.
  • Challenges:
    • Very few samples.

13 of 28

Model Design

14 of 28

Model Architecture

[Architecture diagram] Input Video → Video segments → per segment: OpenFace Feature Matrix [m×n] → Aggregated Feature Matrix [a×b] → Bi-LSTM/Bi-GRU → FCN → Engagement Level
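The pipeline above can be sketched as a small PyTorch module. This is a minimal illustration, not the thesis implementation: the input dimensions, hidden size and the use of the last time step's output are assumptions, while the 128 → 32 → 4 head mirrors the MLP sizes given on the DAiSEE experiments slide.

```python
import torch
import torch.nn as nn

class EngagementNet(nn.Module):
    """Sketch of the Bi-LSTM + FCN stage of the pipeline.

    feat_dim (aggregated features per segment), hidden size and the
    choice of the last time step are illustrative assumptions."""

    def __init__(self, feat_dim=58, hidden=256, num_layers=2, dropout=0.2):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=num_layers,
                           batch_first=True, bidirectional=True,
                           dropout=dropout)
        # Fully connected head: 128 -> 32 -> 4, as on the DAiSEE slide.
        self.fcn = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, 32), nn.ReLU(),
            nn.Linear(32, 4),           # 4 engagement levels (0-3)
        )

    def forward(self, x):               # x: (batch, segments, features)
        out, _ = self.rnn(x)            # (batch, segments, 2 * hidden)
        return self.fcn(out[:, -1])     # last time step -> (batch, 4)

model = EngagementNet()
logits = model(torch.randn(8, 10, 58))  # 8 clips, 10 segments each
```

For the regression task, the same backbone would end in a single output neuron instead of four class logits.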

15 of 28

OpenFace Features

  • Eye-gaze-related features:
    • gaze_0_x, gaze_0_y, gaze_0_z: the eye-gaze direction vector in world coordinates for the left eye.
    • gaze_1_x, gaze_1_y, gaze_1_z: the same for the right eye.
  • Head pose and rotation features:
    • pose_Tx, pose_Ty, pose_Tz: the location of the head with respect to the camera, in millimeters.
    • pose_Rx, pose_Ry, pose_Rz: the rotation of the head in radians around the x, y and z axes.
  • Facial Action Unit intensities:
    • AU01_r, AU02_r, AU04_r, AU05_r, AU06_r, AU07_r, AU09_r, AU10_r, AU12_r, AU14_r, AU15_r, AU17_r, AU20_r, AU23_r, AU25_r, AU26_r, AU45_r.
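The 29 columns listed above (6 gaze + 6 pose + 17 AU intensities) match the standard OpenFace output naming, so the feature set can be assembled programmatically, e.g. for selecting columns from the OpenFace CSV output:

```python
# Build the 29-column feature list described in the slide.
GAZE = [f"gaze_{eye}_{axis}" for eye in (0, 1) for axis in "xyz"]
POSE = [f"pose_{kind}{axis}" for kind in ("T", "R") for axis in ("x", "y", "z")]
AUS = [f"AU{n:02d}_r" for n in
       (1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 20, 23, 25, 26, 45)]

FEATURES = GAZE + POSE + AUS  # 6 + 6 + 17 = 29 columns per frame
```

With a DataFrame `df` of raw OpenFace output, `df[FEATURES]` would then yield the m×n per-frame feature matrix from the architecture diagram.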

16 of 28

Feature Aggregation & Bi-LSTM
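The aggregation step collapses the per-frame matrix [m×n] into a shorter per-segment matrix [a×b]. The slides do not specify the aggregation functions, so this sketch assumes a common choice from the related work: per-segment mean and standard deviation.

```python
import numpy as np

def aggregate(features, n_segments):
    """Collapse a per-frame feature matrix [m x n] into an aggregated
    matrix [a x b], a = n_segments, b = 2 * n.

    Using mean and standard deviation per segment is an assumption;
    the slides only state that segment features are aggregated."""
    segments = np.array_split(features, n_segments, axis=0)
    rows = [np.concatenate([s.mean(axis=0), s.std(axis=0)]) for s in segments]
    return np.stack(rows)

frames = np.random.rand(300, 29)        # e.g. a 10 s clip at 30 fps
agg = aggregate(frames, n_segments=10)  # -> (10, 58) for the Bi-LSTM
```

This both shortens the sequence the Bi-LSTM has to model and smooths out frame-level noise in the OpenFace features.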

17 of 28

Experiments & Results

18 of 28

DAiSEE Dataset Experiments

  • LSTM parameters:
    • number of hidden units: 256
    • number of layers: 2
  • MLP parameters:
    • num neurons 1st layer: 128
    • num neurons 2nd layer: 32
    • num neurons 3rd layer: 4
  • Training Parameters:
    • Batch size: 64
    • Learning rate: 0.0005
    • Number of epochs: 30
    • Dropout probability: 0.2
  • Training Procedure:
    • Train with Boredom labels.
    • Fine-tune with Engagement labels.
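The two-stage procedure above (train on Boredom labels, then fine-tune the same weights on Engagement labels) can be sketched as follows. The model and data here are stand-ins; only the learning rate, batch size and epoch count come from the slide.

```python
import torch
import torch.nn as nn

# Stand-in network; the real model is the Bi-LSTM + FCN architecture.
model = nn.Sequential(nn.Linear(58, 32), nn.ReLU(), nn.Linear(32, 4))
criterion = nn.CrossEntropyLoss()

def run_stage(model, inputs, labels, lr=5e-4, epochs=30):
    """One training stage; fine-tuning reuses the already-trained weights."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        opt.step()
    return loss.item()

x = torch.randn(64, 58)               # one dummy batch of 64 samples
boredom = torch.randint(0, 4, (64,))  # stage 1: Boredom labels
engagement = torch.randint(0, 4, (64,))
run_stage(model, x, boredom)                      # pre-train
final_loss = run_stage(model, x, engagement)      # fine-tune
```

Pre-training on the (correlated) Boredom labels gives the network a useful initialization before it sees the noisier Engagement labels.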

19 of 28

Test data performance: 47%

Survey data performance: 28%

20 of 28

EmotiW Dataset Experiments

  • LSTM parameters:
    • number of hidden units: 512
    • number of layers: 2
  • MLP parameters:
    • num neurons 1st layer: 128
    • num neurons 2nd layer: 32
    • num neurons 3rd layer: 4
  • Training Parameters:
    • Batch size: 8
    • Learning rate: 0.0005
    • Number of epochs: 350
    • Dropout probability: 0.2

21 of 28

Feature Importance

  • The figure shows the sum of gradients along the path from a zero baseline to a sample labeled 0 from the test set (integrated gradients).
  • The table lists the 5 most important features for each engagement label group.
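The attribution method described above — accumulating gradients along the straight path from a zero baseline to the input — is integrated gradients. A minimal sketch, using a toy linear model where the attributions reduce exactly to w·x for a zero baseline:

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients: average the gradient along the
    straight path from `baseline` to `x`, scaled by (x - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    grads = np.stack([grad_fn(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Toy model f(x) = w . x: its gradient is the constant w, so for a
# zero baseline the per-feature attributions are exactly w * x.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])
attr = integrated_gradients(lambda z: w, x, np.zeros_like(x))
```

For the real network, `grad_fn` would return the gradient of the predicted engagement score with respect to the aggregated OpenFace features.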

22 of 28

Real-Life Performance (Very High Engagement)

23 of 28

Real-Life Performance (High Engagement)

24 of 28

Real-Life Performance (Low Engagement)

25 of 28

Real-Life Performance (Very Low Engagement)

26 of 28

Conclusion

&

Future Work

27 of 28

  • We proposed an end-to-end deep-learning-based system that detects the engagement level of the subject.
  • The model is able to distinguish between different levels of engagement.
  • Some possible directions to extend and improve this work:
    • A training procedure that uses both datasets.
    • Learnable aggregation functions.
    • A self-supervised method to avoid the label-reliability problem.

28 of 28

Thank you for Listening