1 of 159

Using Human Activity Recognition in Physical Rehabilitation Exercises in Real-time

Presented By : Moamen Zaher

Assoc. Prof. Ayman Ezzat Atia

Computer Science Department

Faculty of Computers & Artificial Intelligence

Helwan University

Dr. Amr Ghoneim

Computer Science Department

Faculty of Computers & Artificial Intelligence

Helwan University

Dr. Laila M. Abdelhamid

Information System Department

Faculty of Computers & Artificial Intelligence

Helwan University

This presentation is presented to the Department of Information Systems to obtain a Master's degree in Software Engineering

2 of 159

Agenda

  1. Introduction
  2. Background and Literature Review
  3. Methodology
  4. Approaches and Results
  5. Discussion
  6. Conclusion
  7. Q&A
  8. Acknowledgments
  9. References

3 of 159

Introduction

4 of 159

Introduction 1/2

Rehabilitation is a long process so we need to shorten hospital stays.

Rehabilitation is a set of interventions designed to optimize functioning and reduce disability in individuals with health conditions in interaction with their environment.

Incorrect execution of the exercises can aggravate the injury and increase recovery time.

5 of 159

Introduction 2/2

  • To promote Home-based Rehabilitation

Different skeleton parts, angles and trajectories for different body joints are required

Data must be acquired using sensors

An AI model must be developed to classify correct & wrong execution of exercises

6 of 159

Motivation

  • 2.4 billion people currently require rehabilitation.
  • 50% of people do not receive the rehabilitation they need.
  • Rehabilitation services are underfunded and undervalued, particularly in countries without strong health systems.
  • The number of skilled rehabilitation practitioners in low- and middle-income countries is <10 per 1 million.
  • The number of people over 60 years of age is predicted to double by 2050. Hence, more people are living with chronic diseases.

Map of leading health conditions requiring rehabilitation in each country, 2019

7 of 159

Problem Statement

  • Lack of prioritization, funding, policies and plans for rehabilitation at a national level.
  • Lack of available rehabilitation services outside urban areas, and long waiting times.
  • High out-of-pocket expenses and non-existent or inadequate means of funding.
  • Lack of trained rehabilitation professionals, with less than 10 skilled practitioners per 1 million population in many low- and middle-income settings.
  • Lack of resources, including assistive technology, equipment and consumables.
  • The need for more research and data on rehabilitation.
  • Ineffective and under-utilized referral pathways to rehabilitation.

8 of 159

Research Objective

Train more professionals

Increase productivity

9 of 159

Research Outcomes

Allow patients to practice exercises at home without needing to go to physiotherapy clinics.

Give the patient real-time feedback on whether the exercise was performed correctly.

Cut down the cost of rehabilitation.

Allow doctors to monitor patients' progress.

10 of 159

Background and Literature Review

11 of 159

Background

  • Human Activity Recognition (HAR)
    • Is the process of automatically determining human actions and behaviors from sensor data.
    • It has applications in areas like health monitoring, surveillance, sports analysis, and human-machine interaction.
    • HAR systems use sensors like cameras, ambient sensors, or wearables to observe and classify activities, which can range from simple repetitive actions to complex multi-step tasks.

Arshad, M. H., Bilal, M., & Gani, A. (2022). Human activity recognition: Review, taxonomy and open challenges. Sensors, 22(17), 6463. [2]

12 of 159

Related Work

Debnath, B., O’brien, M., Yamaguchi, M., & Behera, A. (2022). A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems, 28(1), 209-239. [3]

Figure: taxonomy of computer vision-based approaches in physiotherapy [3]. Node labels, in slide order: Rehabilitation; Virtual; Skeleton-based; Non-skeleton based; Automated Assessment; Direct; Pure Vision-based; Multi-modal; Assessment; Comparison; Kinematics-based Modeling; Statistical Model; Stochastic Methods; Categorization; Rule-based; Statistical and Stochastic Algorithms-based; Scoring; Author Proposed; Clinical.

13 of 159

Digital Rehabilitation

Direct Rehabilitation

Virtual Rehabilitation

14 of 159

Data Acquisition

  • 4 types of data modalities [4]
  • We proceed with skeleton data [5]

Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778. [4]

Yue, R., Tian, Z., & Du, S. (2022). Action recognition based on RGB and skeleton data sets: A survey. Neurocomputing, 512, 287-306. [5]

15 of 159

Datasets

| Dataset | Impairment | Details | Sensor/Data |
| --- | --- | --- | --- |
| SPHERE-Walking2015 | Sit to stand | 109 sequences, 10 individuals, restricted knee, hip, freezing | Kinect / Kinect SDK, OpenNI SDK skeleton |
| Parkinson's pose estimation | PD, LID, UPDRS assessment tasks | 526 sequences, PD and LID patients, 4 UPDRS assessment tasks | RGB camera / CPM skeleton |
| AHA-3D | Senior lower body fitness | 11 young, 10 elderly subjects, 4 exercises | Kinect / RGB, depth, skeleton |
| UI-PRMD | Physical Rehabilitation Movement | 10 rehabilitation exercises, 10 healthy individuals | Vicon optical tracker and a Kinect camera |
| KIMORE | Stroke, PD, back pain exercises | 44 healthy, 34 patient subjects, 5 exercises, 5 repetitions | Kinect / RGB, depth, skeleton |
| UTD-MHAD | 27 different actions | 8 subjects (4 females and 4 males); each subject repeated each action 4 times | One Kinect camera and one wearable inertial sensor |

16 of 159

Related Work 1/2 :

  • This research aimed to classify different types of exercises by feeding spike train features into a deep learning model.
  • UI-PRMD dataset
  • The authors adopted ResNet as their CNN model for classification.
  • The data were split into segments of 100 frames.
  • The classification accuracy reached 77%.

Rashid, F. A. N., Suriani, N. S., Mohd, M. N., Tomari, M. R., Zakaria, W. N. W., & Nazari, A. (2020). Deep convolutional network approach in spike train analysis of physiotherapy movements. In Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia (pp. 159-170). Springer Singapore. [6]

17 of 159

Related Work 2/2 :

  • The authors designed a new deep learning model that integrates criss-cross attention and edge convolution to extract discriminative features from the skeleton sequence for action recognition.
  • UTD-MHAD and MSR-Action3D datasets
  • CNN
  • The proposed method achieved average accuracies of 99.53% and 95.64%, respectively.

Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778. [4]

18 of 159

Methodology

19 of 159

Overview of the Methodology

20 of 159

UI-PRMD

  • University of Idaho - Physical Rehabilitation Movements dataset.
  • 10 exercises, 10 individuals repeated 10 times.
  • 20 classes and 2000 records.
  • Vicon optical tracker, and a Kinect camera
  • The data include the motion measurement for 22 joints
  • Text files

A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018. [7]

21 of 159

KIMORE

  • KInematic Assessment of MOvement and Clinical Scores for Remote Monitoring of Physical REhabilitation
  • RGB, and Kinect camera
  • The data include the motion measurement for 25 joints
  • CSV files

Capecci, M., Ceravolo, M. G., Ferracuti, F., Iarlori, S., Monteriu, A., Romeo, L., & Verdini, F. (2019). The KIMORE dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7), 1436-1448. [8]

22 of 159

KIMORE

  • 78 Subjects, 385 records
    • 44 healthy
      • 17 expert
      • 27 not experts
    • 34 patients
  • 4 levels
    • Healthy
    • Parkinson
    • Stroke
    • Back Pain

23 of 159

Collected Datasets

  • Collected Dataset
    • 3 exercises, 1 individual repeated 7 times
      • Mini squat – Sit to stand – Straight leg raise
    • RGB camera
    • Video Files
    • Extracted 33 body joints

24 of 159

Research Directions

Approach 1: Comparative Study of Machine Learning Algorithms.

Approach 2: Case Study: A framework for assessing physical rehabilitation exercises.

Approach 3: Comparative Study between CNN and RNN algorithms on multiple datasets.

Approach 4: Transfer Learning and Model Fusion.

25 of 159

Approaches and Results

26 of 159

Paper 1: Rehabilitation Monitoring and Assessment: A Comparative Analysis of Feature Engineering and Machine Learning Algorithms on the UI-PRMD and KIMORE Benchmark Datasets

Under review in the Journal of Information and Telecommunication (Q2), Taylor & Francis.

27 of 159

Research Pipeline

28 of 159

UI-PRMD Dataset Representation

29 of 159

Data Processing

  • Kinect Camera extracts 22 body joints.
  • All 22 body joints were stored in a vector V.
  • These joints were then processed using statistical techniques of:
    • maximum, minimum, mean, standard deviation, and median.
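
The statistical step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each recording is held as an array of shape (frames, joints, 3), and the data here are synthetic. With 22 joints, 3 coordinates, and 5 statistics, each recording collapses to 22 × 3 × 5 = 330 values.

```python
import numpy as np

# Assumed layout (not stated on the slide): one recording is an array of
# shape (frames, joints, 3) with the x, y, z coordinates of every joint.
STATS = {
    "max": np.max,
    "min": np.min,
    "mean": np.mean,
    "std": np.std,
    "median": np.median,
}


def extract_features(recording: np.ndarray) -> np.ndarray:
    """Collapse the time axis with each statistic and flatten to one vector."""
    parts = [fn(recording, axis=0).ravel() for fn in STATS.values()]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
recording = rng.normal(size=(120, 22, 3))  # synthetic stand-in for one exercise
features = extract_features(recording)
print(features.shape)
```

Because every recording maps to a vector of the same length regardless of its duration, recordings with different frame counts can be stacked into one feature matrix.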

30 of 159

Feature Ranking and Selection

  • Feature ranking can be implemented with various families of methods, such as filter methods and wrapper methods.
  • This research employed six distinct filter methods: ReliefF, FCBF, chi-squared (X2), Gini Decrease, Information Gain, and Information Gain Ratio.
  • We select the top 20 features.
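
A filter-based top-20 selection of this kind might look like the sketch below. It uses scikit-learn's `chi2` and `mutual_info_classif` scorers, with mutual information standing in for Information Gain; scikit-learn does not ship ReliefF or FCBF, so those are not shown. The data are synthetic and the sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 330 statistical features.
X, y = make_classification(n_samples=300, n_features=330, n_informative=25,
                           n_classes=4, random_state=0)
X = MinMaxScaler().fit_transform(X)  # chi2 requires non-negative inputs


def top_k(score_func, k=20):
    """Return the indices of the k highest-scoring features."""
    selector = SelectKBest(score_func=score_func, k=k).fit(X, y)
    return set(np.flatnonzero(selector.get_support()))


chi2_top = top_k(chi2)
info_gain_top = top_k(mutual_info_classif)
print(len(chi2_top), len(info_gain_top), len(chi2_top & info_gain_top))
```

The last printed number is the overlap between the two rankings, which gives a quick sense of how much the choice of filter changes the selected subset.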

31 of 159

Feature Ranking and Selection

32 of 159

Action Classification

Non-Ensembled

  • Logistic Regression (LR)
  • K Nearest Neighbors (KNN)
  • Naive Bayes (NB)
  • Ridge
  • Quadratic Discriminant Analysis (QDA)
  • Linear Discriminant Analysis (LDA)
  • Decision Tree
  • Support Vector Machine (SVM)

Ensembled Models

  • Random Forest (RF)
  • Ada Boost (ADA)
  • Gradient Boosting Classifier (GBC)
  • Extra Trees Classifier (ET)
  • Light Gradient Boosting Machine (LightGBM)

33 of 159

Experiments

  • The initial experiment compared state-of-the-art feature selection techniques in order to minimize the feature count and choose the most effective feature ranking method.
  • The second experiment compared a variety of classification algorithms on the same dataset to ascertain the optimal combination of feature ranking techniques and machine learning classification algorithms.
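
The two experiments amount to a grid over ranking methods and classifiers. A minimal sketch of such a grid is shown below, assuming scikit-learn, synthetic data, and only two rankers and three classifiers for brevity; the ANOVA F-score ranker is a stand-in and is not one of the six methods used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=200, n_features=60, n_informative=10,
                           n_classes=3, random_state=0)

rankers = {"chi2": chi2, "anova_f": f_classif}  # illustrative subset
classifiers = {
    "extra_trees": lambda: ExtraTreesClassifier(n_estimators=50, random_state=0),
    "knn": lambda: KNeighborsClassifier(),
    "logreg": lambda: LogisticRegression(max_iter=1000),
}

results = {}
for ranker_name, score_func in rankers.items():
    for clf_name, make_clf in classifiers.items():
        # Selection sits inside the pipeline so it is refit on each fold.
        pipe = make_pipeline(MinMaxScaler(), SelectKBest(score_func, k=20), make_clf())
        results[(ranker_name, clf_name)] = cross_val_score(pipe, X, y, cv=5).mean()

best = max(results, key=results.get)
print(best, round(results[best], 3))
```

Keeping the selector inside the pipeline avoids ranking features on data that is later used for validation.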

34 of 159

Results : Ranking Methods

35 of 159

Results : Algorithms

36 of 159

Results

37 of 159

UI-PRMD Results

| Classifier | X2 | FCBF | ReliefF | Gini Decrease | Information Gain | Information Gain Ratio | Classifier Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ada Boost Classifier | 94.04% | 93.37% | 86.20% | 93.66% | 93.60% | 91.66% | 92.10% |
| Decision Tree | 99.00% | 99.07% | 91.12% | 98.81% | 98.81% | 98.81% | 98.81% |
| Extra Tree | 98.37% | 99.64% | 99.94% | 97.81% | 97.56% | 98.25% | 98.60% |
| Gradient Boosting | 98.87% | 98.93% | 98.50% | 98.81% | 98.81% | 98.81% | 98.80% |
| KNN | 92.44% | 94.01% | 99.88% | 91.63% | 91.63% | 93.07% | 93.80% |
| Light Gradient Boosting Machine | 98.94% | 99.36% | 98.75% | 99.06% | 99.06% | 99% | 99% |
| Linear Discriminant Analysis | 98.87% | 68.05% | 97% | 97.62% | 97.62% | 97.56% | 92.80% |
| Logistic Regression | 90.75% | 94.86% | 79.12% | 87.94% | 87.94% | 88.31% | 88.20% |
| Naïve Bayes | 96.94% | 96.85% | 95.25% | 96.38% | 96.38% | 86.75% | 94.80% |
| Quadratic Discriminant Analysis | 66.18% | 87.76% | 18.44% | 67.31% | 68.25% | 40.56% | 58.10% |
| Random Forest | 98.31% | 99.57% | 99.62% | 82.24% | 98.06% | 98.06% | 96% |
| Ridge | 68.26% | 77.71% | 53.69% | 59.81% | 59.81% | 59.94% | 63.20% |
| Support Vector Machine | 86.83% | 95.25% | 69.61% | 85.25% | 85.25% | 83.82% | 84.30% |
| Average Accuracy of Each Feature Ranking Technique | 91.37% | 92.65% | 83.62% | 88.95% | 90.21% | 87.28% | |

38 of 159

KIMORE Results

| Classifier | X2 | FCBF | ReliefF | Gini Decrease | Information Gain | Information Gain Ratio | Mean Classifier Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ada Boost Classifier | 54.95% | 56.67% | 53.38% | 55.28% | 56.11% | 59.86% | 56.04% |
| Decision Tree | 67.45% | 74.35% | 68.56% | 66.53% | 72.82% | 73.19% | 70.48% |
| Extra Tree | 76.81% | 81.85% | 74.63% | 72.78% | 75.14% | 76.81% | 76.34% |
| Gradient Boosting | 74.95% | 74.95% | 68.80% | 69.54% | 74.03% | 73.98% | 72.71% |
| KNN | 75.00% | 71.30% | 71.85% | 68.52% | 71.90% | 73.15% | 71.95% |
| Light Gradient Boosting Machine | 73.89% | 71.44% | 73.89% | 72.96% | 76.90% | 77.64% | 75.45% |
| Linear Discriminant Analysis | 58.89% | 58.89% | 61.94% | 61.53% | 57.78% | 60.83% | 59.98% |
| Logistic Regression | 59.12% | 59.12% | 53.80% | 63.19% | 58.52% | 56.48% | 58.37% |
| Naïve Bayes | 57.92% | 57.92% | 54.44% | 58.84% | 59.26% | 54.68% | 57.18% |
| Quadratic Discriminant Analysis | 18.19% | 18.19% | 32.45% | 74.91% | 75.56% | 18.19% | 39.58% |
| Random Forest | 76.53% | 76.53% | 72.22% | 70.46% | 75.65% | 78.43% | 74.97% |
| Ridge | 56.94% | 56.94% | 55.42% | 56.30% | 58.75% | 57.92% | 57.05% |
| Support Vector Machine | 57.55% | 66.20% | 58.50% | 58.06% | 62.04% | 58.43% | 60.13% |

39 of 159

Discussion

  • The tests revealed that the ET, KNN, and RF algorithms are most effective when paired with ReliefF.
  • The LightGBM, DT, Gradient Boosting, SVM, LR, Quadratic Discriminant Analysis, and Ridge algorithms were found to be most successful when combined with FCBF.
  • Although ReliefF achieved the highest top-1 accuracy on the UI-PRMD dataset when paired with Extra Tree, it performed poorly when combined with several other models.

40 of 159

Conclusion

  • Machine learning requires additional stages of feature extraction and selection.
  • Non-ensemble models showed high sensitivity depending on the feature ranking technique employed, underscoring the impact of feature selection on classification outcomes.
  • Ensemble models demonstrated robust performance across multiple datasets, exhibiting low sensitivity to feature ranking methods.
  • The best overall combination is ReliefF with Extra Tree, scoring 99.94%.
  • LightGBM has the most consistent results across all ranking techniques, with an average of 99%.
  • FCBF works best with ensemble models.
  • FCBF scored the highest average accuracy across all models (92.65%), followed by X2 (91.37%).

41 of 159

Paper 2: A Framework for Assessing Physical Rehabilitation Exercises

Zaher, M., Samir, A., Ghoneim, A., Abdelhamid, L., & Atia, A. (2023, July). A Framework for Assessing Physical Rehabilitation Exercises. In 2023 Intelligent Methods, Systems, and Applications (IMSA) (pp. 526-532). IEEE.

42 of 159

43 of 159

Research Pipeline

44 of 159

Datasets

  • UI-PRMD [7]
    • 10 exercises, 10 individuals repeated 10 times
    • Vicon optical tracker, and a Kinect camera
    • The data include the motion measurement for 22 joints
    • Text files

  • Collected Dataset
    • 3 exercises, 1 individual repeated 7 times
      • Mini squat – Sit to stand – Straight leg raise
    • RGB camera
    • Video Files
    • Extracted 33 body joints

45 of 159

Preprocessing

  • The Kinect camera captures 22 body joints, which are stored in a vector V.
    • At each time point t, each joint Jn is represented by its three-dimensional coordinates xt, yt, and zt.
  • In our collected dataset, face joints were deemed irrelevant for classifying the exercises and were discarded.

46 of 159

Feature Engineering

  • Feature extraction algorithms are applied to process these joints
    • Five statistical techniques are used (standard deviation, maximum, median, mean, and minimum), producing 330 features in total (22 joints × 3 coordinates × 5 statistics).
  • The FCBF algorithm was employed to rank and select the most significant features.
  • The model selected the top 20 features and discarded the rest.
  • It was found that the feature importance score either plateaued or rapidly decreased after the 20th feature.

47 of 159

UI-PRMD Dataset Features

  • WaistX_std
  • WaistZ_std
  • WaistY_std
  • RightUpperLegZ_std
  • RightUpperLegZ_mean
  • LeftCollarX_median
  • RightLowerLegX_std
  • RightCollarZ_max
  • LeftFootX_std
  • RightFootX_std
  • RightLegToesZ_std
  • RightCollarZ_min
  • RightFootZ_min
  • RightCollarZ_median
  • RightUpperLegZ_median
  • LeftForearmY_std
  • NeckY_median
  • RightForearmY_min
  • RightUpperArmY_std
  • NeckZ_min

48 of 159

Experiments

  • Exp. 1
    • Was conducted to evaluate the performance of the Extra Trees classifier for action recognition on the UI-PRMD dataset, after applying FCBF feature selection.

  • Exp. 2.
    • Was conducted using our proprietary dataset in a real clinic setting to showcase the system's practicality and effectiveness in real time using only RGB Camera.

49 of 159

Used Algorithms 1 of 2

  • Extra Tree
    • A tree-based ensemble technique used in machine learning.

    • Extra Trees builds decision trees using the entire dataset.

    • Extra Trees demonstrates significantly faster performance [9] [10].

    • Instead of the greedy search for the best split used in Random Forest, Extra Trees selects split values for features at random.
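
The contrast with Random Forest can be seen in scikit-learn's defaults. The sketch below uses synthetic data and arbitrary estimator counts; it only illustrates the two points on this slide and is not the configuration used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)

extra_trees = ExtraTreesClassifier(n_estimators=100, random_state=0)
random_forest = RandomForestClassifier(n_estimators=100, random_state=0)

# By default Extra Trees fits every tree on the whole training set,
# while Random Forest fits each tree on a bootstrap sample.
print(extra_trees.bootstrap, random_forest.bootstrap)

extra_trees.fit(X, y)
random_forest.fit(X, y)
et_train_acc = extra_trees.score(X, y)
rf_train_acc = random_forest.score(X, y)
print(et_train_acc, rf_train_acc)
```

In scikit-learn the random choice of split thresholds is built into the tree type that Extra Trees uses, so no extra parameter is needed to enable it.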

50 of 159

Used Algorithms 2 of 2

  • One Dollar
    • Also known as the $1 gesture recognition algorithm.
    • This algorithm converts a gesture into a sequence of points and then calculates the minimum distance between the gesture and a set of predefined templates.
    • The One Dollar algorithm typically uses two-dimensional (x, y) points.
      • However, in this study, the joint coordinates were extracted in 3D.
      • To overcome this, the X and Y coordinates were summed to create a new X value, while the Z coordinate was assigned to Y. Thus, each tuple was represented in (x+y, z) format.
    • Due to the limited number of videos in the dataset, the One Dollar algorithm was chosen for this research.
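
The template-matching idea can be sketched as below. This is a simplified variant written for illustration, not the authors' implementation: it applies the (x+y, z) projection described above, resamples each path to a fixed number of points, and normalizes position and size, but it omits the rotation search of the full $1 recognizer. The exercise names and trajectories are synthetic.

```python
import numpy as np


def project(points_3d):
    """Map (x, y, z) samples to the 2D (x + y, z) form described above."""
    p = np.asarray(points_3d, dtype=float)
    return np.column_stack([p[:, 0] + p[:, 1], p[:, 2]])


def resample(path, n=32):
    """Resample a 2D path to n points spaced evenly along its arc length."""
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, dist[-1], n)
    return np.column_stack([np.interp(targets, dist, path[:, i]) for i in range(2)])


def normalize(path):
    """Translate to the centroid and scale by the larger bounding-box side."""
    path = path - path.mean(axis=0)
    size = np.ptp(path, axis=0).max()
    return path / size if size > 0 else path


def prepare(points_3d, n=32):
    return normalize(resample(project(points_3d), n))


def classify(points_3d, templates):
    """Return the label whose closest template has the smallest mean point distance."""
    candidate = prepare(points_3d)
    scores = {
        label: min(np.linalg.norm(candidate - prepare(t), axis=1).mean() for t in paths)
        for label, paths in templates.items()
    }
    return min(scores, key=scores.get)


# Synthetic single-joint trajectories standing in for recorded exercises.
t = np.linspace(0.0, 1.0, 50)
templates = {
    "leg_raise": [np.column_stack([t, np.zeros_like(t), t ** 2])],
    "squat": [np.column_stack([np.zeros_like(t), np.sin(np.pi * t), t])],
}
rng = np.random.default_rng(0)
query = templates["leg_raise"][0] + rng.normal(scale=0.005, size=(50, 3))
predicted = classify(query, templates)
print(predicted)
```

Adding more templates per label only extends the lists in `templates`, which is why the approach remains usable when very few videos are available.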

51 of 159

Experiments 1 of 2

  1. Feature Extraction for 22 body joints.
  2. Ranked and Selected only the top 20 features.
  3. The data were split 70/30 for training and testing.
  4. Applied a cross-validation function with 30 folds.
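
One way to arrange these four steps is sketched below. Several points are assumptions: the data are synthetic, the ANOVA F-score stands in for FCBF (which scikit-learn does not provide), and the 30-fold cross-validation is applied to the training portion only, which the slide does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the 330 extracted features.
X, y = make_classification(n_samples=600, n_features=330, n_informative=30,
                           n_classes=4, random_state=0)

# Step 3: 70/30 split, stratified so each class keeps its share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Step 2 lives inside the pipeline so the top-20 ranking is refit per fold.
model = make_pipeline(
    SelectKBest(f_classif, k=20),
    ExtraTreesClassifier(n_estimators=50, random_state=0),
)

# Step 4: 30-fold cross-validation on the training portion.
cv = StratifiedKFold(n_splits=30, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv)

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(len(cv_scores), round(cv_scores.mean(), 3), round(test_accuracy, 3))
```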

52 of 159

Experiments 2 of 2

  1. Feature Extraction for 33 body joints.
  2. Face landmarks were excluded.
  3. Coordinates were converted to the (x+y, z) format.
  4. 1, 2, and 3 videos were used as training templates, and 4 videos for testing.

53 of 159

Results

| Number of Templates | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| One | 72% | 74.5% | 72% | 70.4% |
| Two | 88% | 93% | 88% | 87.9% |
| Three | 90% | 93.8% | 90% | 90.1% |

Evaluation of the $1 algorithm based on different evaluation metrics for different numbers of templates used

| Algorithm | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| Extra Tree | 99.64% | 99.74% | 99.64% | 99.62% |
| One Dollar | 90% | 93.8% | 90% | 90.1% |

Evaluation of different algorithms based on different evaluation metrics

54 of 159

Conclusion

  • Machine learning requires additional stages of feature extraction and selection.
  • The selection of these techniques greatly affects the performance of the same model.
  • The performance of the One Dollar algorithm is greatly affected by the number of templates employed for training, as higher numbers of templates for each exercise lead to better outcomes.
  • The Extra Tree classifier outperformed the One Dollar algorithm in all four evaluation metrics.
  • Despite having limited training data, the One Dollar algorithm still generated acceptable results.

55 of 159

Confusion Matrix for $1 with 3 templates

56 of 159

Confusion Matrix for $1 with 3 templates

57 of 159

Paper 3: Unlocking the Potential of RNN and CNN Models for Accurate Rehabilitation Exercise Classification on Multi-Datasets.

Published in Multimedia Tools and Applications (Q1), Springer.

Zaher, M., Ghoneim, A. S., Abdelhamid, L., & Atia, A. (2024). Unlocking the potential of RNN and CNN models for accurate rehabilitation exercise classification on multi-datasets. Multimedia Tools and Applications, 1-41.

58 of 159

59 of 159

Research Pipeline

60 of 159

Datasets

UI-PRMD

KIMORE

61 of 159

Preprocessing

  • Camera records the data of 22 body joints, storing this information in a vector denoted as V.
  • At each time instance t, the representation of each joint data Jn consists of three-dimensional coordinates: Xt, Yt, and Zt.
  • Feature extraction techniques are then applied to process each joint data.
  • These techniques include mean, median, minimum, maximum, and standard deviation.

62 of 159

For Disease Classification

  • SMOTE was applied because the classes in the data were imbalanced.
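
The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in plain NumPy as below. This is a simplified illustration rather than the library implementation the authors may have used, and the class sizes and feature values are synthetic.

```python
import numpy as np


def smote_like(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    # Pairwise distances inside the minority class; a row is not its own neighbour.
    d = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(X_minority) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])


rng = np.random.default_rng(1)
majority = rng.normal(0.0, 1.0, size=(40, 6))   # synthetic majority class
minority = rng.normal(2.0, 1.0, size=(12, 6))   # synthetic minority class
synthetic = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = np.vstack([minority, synthetic])
print(len(majority), len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows never appear in the test data.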

63 of 159

Hyper-parameters tuning

  • Deep learning poses a significant challenge in terms of model optimization.
  • Common techniques are random search and grid search.
  • Grid search is more convenient when the number of hyper-parameters is limited, whereas random search excels when dealing with a larger number of hyper-parameters (Bergstra & Bengio, 2012).
  • A random search of 100 trials was conducted to determine the values of these parameters.

Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of machine learning research. 2012 Feb 1;13(2).

64 of 159

65 of 159

Used Algorithms

  • LSTM
  • Bi-LSTM
  • CNN-LSTM
  • CNN

66 of 159

Random Search

  • We explored four distinct combinations:
    1. Tuning on the KIMORE dataset and training on the UI-PRMD dataset.
    2. Tuning on the UI-PRMD dataset and training on the KIMORE dataset.
    3. Initial tuning on the KIMORE dataset followed by a subsequent round of Hyper-parameter optimization on the UI-PRMD dataset.
    4. Initial tuning on the UI-PRMD dataset followed by a subsequent round of Hyper-parameter optimization on the KIMORE dataset.
  • The search space covered the number of LSTM layers, the number of LSTM units, the dropout rate, the learning rate (ranging from 0.0001 to 0.01), the type of regularizer (l1, l2, or none), and its associated strength.

67 of 159

Parameters : LSTM

| Parameter | Range of Values | Best Value |
| --- | --- | --- |
| LSTM Layers | 1 to 7 | 1 |
| LSTM Units | 32 to 1024 | 320 |
| Dropout Rate | 0.2 to 0.5 | 0.26 |
| Learning Rate | 0.0001 to 0.01 | 0.0005 |
| LSTM Regularizer | l1, l2, or none | l2 |
| Dense Regularizer | l1, l2, or none | None |
| Dense Layers | 1 to 5 | 2 |
| Dense Units | 64 to 1024 | 940 |
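
A random search over the LSTM ranges listed above can be sketched with the standard library alone. The objective here is a placeholder for "train the model and return the validation loss"; the log-uniform sampling of the learning rate and the use of one regularizer choice per layer type are assumptions, since the slides give only the ranges.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration from the ranges listed for the LSTM model."""
    return {
        "lstm_layers": rng.randint(1, 7),
        "lstm_units": rng.randint(32, 1024),
        "dropout": rng.uniform(0.2, 0.5),
        # Log-uniform sampling is an assumption; only the range is given.
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "lstm_regularizer": rng.choice(["l1", "l2", None]),
        "dense_regularizer": rng.choice(["l1", "l2", None]),
        "dense_layers": rng.randint(1, 5),
        "dense_units": rng.randint(64, 1024),
    }


def random_search(objective, n_trials=100, seed=0):
    """Evaluate n_trials random configurations and return the best one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    return min(trials, key=objective), trials


def toy_objective(cfg):
    """Hypothetical stand-in for a real training run's validation loss."""
    return abs(math.log10(cfg["learning_rate"]) + 3.3) + abs(cfg["dropout"] - 0.26)


best, trials = random_search(toy_objective)
print(best)
```

Each trial is independent, so in a real run the 100 configurations could be trained in parallel or stopped early when validation loss stalls.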

68 of 159

Parameters : Bi-LSTM

| Parameter | Range of Values | Best Value |
| --- | --- | --- |
| BiLSTM Layers | 1 to 5 | 2 |
| BiLSTM Units | 32 to 1024 | 271 |
| Dropout Rate | 0.2 to 0.5 | 0.3 |
| Learning Rate | 0.0001 to 0.01 | 0.001014 |
| Regularizer | l1, l2, or none | None |
| Dense Regularizer | l1, l2, or none | None |
| Dense Layers | 1 to 5 | 4 |
| Dense Units | 64 to 1024 | 927 |

69 of 159

Parameters : CNN-LSTM

| Parameter | Range of Values | Best Value |
| --- | --- | --- |
| Filters | 32 to 1024 | 128 |
| Kernel Size | 3 to 10 | 8 |
| LSTM Units | 64 to 1024 | 256 |
| Dropout Rate | 0.2 to 0.5 | 0.2 |
| Learning Rate | 0.0001 to 0.01 | 0.00077 |
| Conv Regularizer | l1, l2, or none | l2 |
| LSTM Regularizer | l1, l2, or none | None |
| Dense Regularizer | l1, l2, or none | None |
| Dense Layers | 1 to 5 | 2 |
| Dense Units | 64 to 1024 | 927 |

70 of 159

Parameters : CNN

| Parameter | Range of Values | Best Value |
| --- | --- | --- |
| Convolutional Layers | 1 to 6 | 2 |
| Conv Units | 32 to 512 | 48 |
| Dropout Rate | 0.2 to 0.5 | 0.2 |
| Learning Rate | 0.0001 to 0.01 | 0.0025284 |
| Conv Regularizer | l1, l2, or none | None |
| Dense Regularizer | l1, l2, or none | None |
| Dense Layers | 1 to 5 | 1 |
| Dense Units | 64 to 1024 | 544 |

71 of 159

Evaluation Metrics

  • Loss
  • Accuracy
  • Precision
  • Recall
  • F1-Score
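
These five metrics could be computed from a model's softmax outputs roughly as follows. The labels and probabilities are made up for illustration, and macro averaging is an assumption, since the slides do not state which averaging was used.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up ground truth and softmax outputs for six samples, three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.3, 0.2],
])
y_pred = probs.argmax(axis=1)

# Categorical cross-entropy: mean negative log-probability of the true class.
loss = -np.log(probs[np.arange(len(y_true)), y_true]).mean()
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)

print(round(loss, 4), round(accuracy, 4), round(precision, 4),
      round(recall, 4), round(f1, 4))
```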

72 of 159

Training

  • All experiments were run on the same machine with 15 GB of GPU memory.
  • Early stopping and Reduce Learning Rate on Plateau were used.

| Parameter | Value |
| --- | --- |
| Epochs | 450 |
| Train-test split | 80/20 |
| Train-val split | 80/20 |
| Folds | 5 |
| Batch Size | 32 |
| Hidden Layer Activation | ReLU |
| Output Layer Activation | Softmax |
| Optimizer | Adam |
| Loss | Categorical cross-entropy |
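
The interplay of early stopping and learning-rate reduction on a plateau can be illustrated with a small framework-free controller. This is a simplified sketch, not the deep learning library's callbacks: the patience values, reduction factor, and loss sequence are hypothetical, since the slides do not report them.

```python
class PlateauController:
    """Tracks validation loss; lowers the learning rate, then stops training."""

    def __init__(self, lr, factor=0.5, lr_patience=3, stop_patience=6, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def update(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait % self.lr_patience == 0:
                self.lr = max(self.lr * self.factor, self.min_lr)
        return self.wait >= self.stop_patience


# Hypothetical validation losses that improve for three epochs, then stall.
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77]
controller = PlateauController(lr=0.0005)
stopped_epoch = None
for epoch, loss in enumerate(val_losses):
    if controller.update(loss):
        stopped_epoch = epoch
        break
print(stopped_epoch, controller.lr, controller.best)
```

With a scheme like this the epoch count in the table acts as an upper bound, and training usually ends earlier once the validation loss stops improving.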

73 of 159

74 of 159

Experiments

  1. The first experiment was conducted to find the best exercise classification algorithm across both datasets.
  2. The second experiment was conducted to classify patients' diseases while they performed the same five exercises in the KIMORE dataset.

75 of 159

Exercise Classification

UIPRMD

KIMORE

76 of 159

State-of-the-art - KIMORE

  • CNN scored:
    • 93.08% accuracy (top)
    • 93.07% precision (top)
    • 93.96% recall (top)
    • 91.79% f1-score (top)
    • 0.2860 loss (least)
    • 2.3 minutes training time (least)
  • 0.75 percentage-point accuracy increase over the previous best result (92.33%).

| Method | Accuracy |
| --- | --- |
| Ensemble-based Graph Convolutional Network (EGCN) | 80.10% |
| 3D Convolution Neural Network (3D-CNN) | 90.57% |
| Many-to-Many model with density map output | 92.33% |
| Our Tuned-CNN | 93.08% |

77 of 159

State-of-the-art - UIPRMD

  • CNN scored:
    • 99.70% accuracy (top)
    • 99.70% precision (top)
    • 99.75% recall (top)
    • 99.70% f1-score (top)
    • 0.0122 loss (least)
    • 2.15 minutes training time (least)
  • 0.1 percentage-point accuracy increase over the ML pipeline (FCBF - Extra Tree), which uses only 20 features.

| Method | Accuracy |
| --- | --- |
| Ensemble-based Graph Convolutional Network (EGCN) | 86.90% |
| Graph Convolutional Siamese Network | 99.20% |
| FCBF - Extra Tree | 99.60% |
| Our Tuned-CNN | 99.70% |

78 of 159

Disease Classification

79 of 159

Disease Classification Results

  • CNN scored:
    • 89.87% accuracy (top)
    • 88.91% precision (top)
    • 90.63% recall (top)
    • 89.49% f1-score (top)
    • 0.48286 loss (least)
    • 7.32 minutes training time

80 of 159

Conclusion

  • The CNN model exhibited outstanding accuracy, attaining scores of 93.08% and 99.7% on the KIMORE and UI-PRMD datasets, respectively.
  • This surpasses the state-of-the-art on both datasets by 0.75 and 0.1 percentage points, respectively.
  • Moreover, the model demonstrated notable proficiency in disease classification, enabling the detection of correct and incorrect exercise techniques and achieving a disease diagnosis accuracy of 89.87%.
  • Deep learning models require extensive data and training iterations.
  • The proposed CNN model required the fewest training iterations while achieving the best performance.

81 of 159

Paper 4: Fusing CNN and Attention Mechanisms: Advancements in Real-Time Human Activity Recognition for Rehabilitation Exercises Classification

Submitted to Computers in Biology and Medicine (Q1), Elsevier

82 of 159

Research Pipeline

83 of 159

Datasets

UI-PRMD

KIMORE

84 of 159

Preprocessing

  • Camera records the data of 22 body joints, storing this information in a vector denoted as V.
  • At each time instance t, the representation of each joint data Jn consists of three-dimensional coordinates: Xt, Yt, and Zt.
  • Feature extraction techniques are then applied to process each joint data.
  • These techniques include mean, median, minimum, maximum, and standard deviation.

85 of 159

Scalogram Generation

Mel-frequency cepstral coefficients

Continuous Wavelet Transform

86 of 159

Image Representation: CWT

  • CWT_XYZ(a, b) denotes the Continuous Wavelet Transform
  • applied to the body joint coordinates XYZ(t),
  • at scale a and shift b.
  • The mother wavelet, ψ(t), is a function localized in both the time and frequency domains.
  • Integration over all time is indicated by ∫.
  • The factor 1/√|a| normalizes the transform by the scale's absolute value.
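
The equation these bullets describe did not survive on the slide. Written out in its standard textbook form, using the symbols named above, it would read as follows; the complex conjugate on the mother wavelet is part of the usual definition and is an assumption about what the slide showed.

```latex
\mathrm{CWT}_{XYZ}(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} XYZ(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t
```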

87 of 159

Image Representation
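
A scalogram of this kind could be produced from one joint coordinate along the lines of the sketch below. It discretizes the continuous wavelet transform directly in NumPy; the Mexican-hat (Ricker) mother wavelet, the scale range, and the test signal are all assumptions, since the slides do not say which wavelet or scales were used.

```python
import numpy as np


def ricker(t):
    """Mexican-hat mother wavelet (real-valued), up to a constant factor."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def cwt(signal, scales, dt=1.0):
    """Discretized CWT: rows correspond to scales a, columns to shifts b."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal)) * dt
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        # kernel[b, t] = psi((t - b) / a) for every shift b and time t
        kernel = ricker((t[None, :] - t[:, None]) / a)
        out[i] = kernel @ signal * dt / np.sqrt(abs(a))
    return out


# Synthetic joint coordinate: a slow periodic movement, 200 frames long.
frames = np.arange(200)
coordinate = np.sin(2 * np.pi * frames / 20.0)

scales = np.arange(1, 33)
scalogram = np.abs(cwt(coordinate, scales))

# Energy per scale, ignoring the edges where the wavelet is truncated.
energy = (scalogram[:, 50:150] ** 2).mean(axis=1)
dominant_scale = int(scales[energy.argmax()])
print(scalogram.shape, dominant_scale)
```

The resulting 2D array can be rendered as an image and resized to the network's input shape before being passed to a CNN or ViT.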

88 of 159

Limitations of Machine Learning

  • Machine learning requires additional stages of feature extraction and selection.
  • The selection of these techniques greatly affects the performance of the same model.

89 of 159

Limitations of Deep Learning

  • Despite the remarkable achievements of deep learning (DL) in Human Activity Recognition (HAR)
    • A primary concern is the substantial volume of labeled data required to effectively train deep neural networks.
    • This dependency on extensive datasets can pose challenges in environments where data collection is difficult or privacy concerns are paramount such as healthcare.
    • Furthermore, the computational complexity and resource demands of training deep learning models are significantly high.
  • Transfer learning (TL) emerges as a powerful solution to the deep learning (DL) challenge of requiring vast datasets for training.

90 of 159

Transfer Learning

  • Transfer learning reuses models that were pre-trained on large datasets to solve a specific problem.
  • These models are available for use 'out-of-the-box' for similar tasks, often covering domains like image recognition and natural language processing.

  • Popular Pre-trained Models
    • Image Processing: ResNet, Inception, and MobileNet.

91 of 159

Fusion

  • By fusing models, one can leverage the strengths and mitigate the weaknesses of the individual models, thereby achieving better performance than any single model could on its own.
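
One common way to fuse models is to concatenate the feature vectors produced by each branch and pass the result through a shared classification head. The sketch below shows only the shapes involved, with untrained random weights in NumPy. Fusion by concatenation and the branch feature sizes are assumptions; the head sizes (352 and 176 units) and the 20 output classes follow figures given elsewhere in this deck for the second top-layer architecture and the UI-PRMD dataset.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, units, activation):
    """One fully connected layer with freshly drawn (untrained) weights."""
    w = rng.normal(scale=0.05, size=(x.shape[1], units))
    z = x @ w
    if activation == "relu":
        return np.maximum(z, 0.0)
    z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


batch = 4
cnn_features = rng.normal(size=(batch, 2048))  # hypothetical CNN branch output
vit_features = rng.normal(size=(batch, 768))   # hypothetical ViT branch output

# Feature-level fusion: place the two descriptions of each image side by side.
fused = np.concatenate([cnn_features, vit_features], axis=1)

hidden = dense(dense(fused, 352, "relu"), 176, "relu")
probabilities = dense(hidden, 20, "softmax")
print(fused.shape, probabilities.shape)
```

A third branch would simply add one more array to the concatenation, at the cost of a larger fused vector and a larger overall model.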

92 of 159

Algorithms

  • CNN-Based
  • Attention-Based
  • Fused

CNN

Attention

93 of 159

Fused Algorithms Models

2 Models

  1. CNN – ViT
  2. VGG16 – ViT
  3. DenseNet121 – ViT
  4. DenseNet201 – ViT
  5. ResNet50 – ViT
  6. ResNet101 – ViT
  7. MobileNetV2 – ViT
  8. MobileNetV3Large – ViT
  9. MobileNetV3Small – ViT

3 Models

  1. ResNet50 – MobileNetV3Small – ViT
  2. Res101-Dense201-ViT
  3. DenseNet201-MobileNetV3Small-ViT

94 of 159

Model Architectures

95 of 159

Top Layer Architecture

  • Random Search was applied to determine NN architecture.
  • 100 trials were executed on MobileNetV3Small
    • Due to its efficient training time
  • Among all 100 trials, 2 architectures were favorable

96 of 159

Two Architectures

| Parameter | First Architecture | Second Architecture |
| --- | --- | --- |
| Number of Dense Layers | 2 | 2 |
| 1st Dense Layer's Units | 512 | 352 |
| 2nd Dense Layer's Units | 265 | 176 |
| Dropout Rate | 0.2 | 0.2 |

97 of 159

Evaluation Metrics

Optimization Metrics

  • Loss
  • Accuracy (%)
  • Precision (%)
  • Recall (%)
  • F1-Score (%)

Satisfactory Metrics

  • Training Time (S)
  • Testing Time (S)
  • Number of Epochs
  • Model Size (MB)

98 of 159

Training

  • All experiments were run on the same machine with 15 GB of GPU memory and 12 GB of RAM.
  • Early stopping and Reduce Learning Rate on Plateau were used.

| Parameter | Value |
| --- | --- |
| Epochs | 150 |
| Train-test split | 80/20 |
| Train-val split | 80/20 |
| Folds | 5 |
| Batch Size | 64 |
| Input Shape | 224x224x3 |
| Hidden Layer Activation | ReLU |
| Output Layer Activation | Softmax |
| Optimizer | Adam |
| Loss | Categorical cross-entropy |

99 of 159

Experiments

The first experiment aimed to identify the optimal top-layer architecture for achieving the most accurate exercise classification results on the UI-PRMD dataset.

Simultaneously, the second experiment focused on implementing numerous hybrid CNN-ViT architectures, validating the results through cross-validation and incorporating an additional dataset.

100 of 159

For Disease Classification

  • SMOTE was applied because the classes in the data were imbalanced.

101 of 159

Exp 1 Results (CNN-Based Only)

Comparison of the best two architectures for the Fully Connected Network architecture on the UI-PRMD dataset, focusing on CNN-based models. The 2D bar chart visually represents the accuracy distinctions between the two architectures.

102 of 159

Exp 1 Results (Attention-Based)

Comparison of the best two architectures for the Fully Connected Network architecture on the UI-PRMD dataset, focusing on Attention-based models. The 2D bar chart visually represents the accuracy distinctions between the two architectures.

103 of 159

All Model Results on UIPRMD

Comparison of results across various algorithms using the second architecture on the UIPRMD dataset. The image depicts the performance of 20 different CNN-Based Models alongside 13 attention-based and fused models.

104 of 159

All Model Results on UIPRMD

105 of 159

Cross Validation on UI-PRMD

106 of 159

Comparison with state-of-the-art on UI-PRMD

| Method | Accuracy | F1-Score | Results |
|---|---|---|---|
| GCN | 92.64% | - | Training |
| ST-GCN | 98.90% | - | Training |
| 2S-AGCN | 99.10% | - | Training |
| Graph Convolutional Siamese Network | 99.20% | - | Training |
| Spike Train | 77% | - | Testing |
| Graph Transformer | 85% | - | Testing |
| EGCN | 86.90% | - | Testing |
| Res50-MobileV3Small-ViT | 89.30% | 89.07% | Testing |
| DenseNet121 | 89.33% | 89.06% | Testing |
| Res50-Dense201-ViT | 89.80% | 89.59% | Testing |
| DenseNet201-MobileNetV3Small-ViT | 89.80% | 89.64% | Testing |
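The two metrics reported in the comparison can be computed as sketched below. The source does not state how the F1-score was averaged across classes, so macro averaging is an assumption here, and the labels are toy values rather than results from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for a hypothetical 3-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Accuracy and F1 can diverge when classes are imbalanced, which is one reason to report both.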

107 of 159

108 of 159

All Model Results on KIMORE

Comparison of results across various algorithms using the second architecture on the KIMORE dataset. The image depicts the performance of 20 different CNN-Based Models alongside 13 attention-based and fused models.

109 of 159

All Model Results on KIMORE

110 of 159

Cross Validation on KIMORE

111 of 159

Comparison with state-of-the-art on KIMORE

| Algorithm | Accuracy |
|---|---|
| EGCN | 80.10% |
| 3D-CNN | 90.57% |
| Many-to-Many model with density map output | 92.33% |
| Res50-Dense201-ViT | 93.78% |
| Res50-MobileV3Small-ViT | 94.04% |
| Dense201-MobileV3Small-ViT | 94.30% |
| ViT | 95.08% |
| MobileNetV3Small-ViT | 95.33% |

112 of 159

113 of 159

Comparison of inference time
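One common way to obtain a per-sample inference time is to average wall-clock time over repeated predictions after a short warm-up. The sketch below assumes this procedure; the source does not describe how its timings were measured, and `predict` stands in for any model's prediction function.

```python
import time

def mean_inference_time(predict, samples, warmup=2, repeats=5):
    """Average seconds per prediction, excluding a few warm-up calls."""
    for s in samples[:warmup]:
        predict(s)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(repeats):
        for s in samples:
            predict(s)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(samples))
```

Averaging over several repeats reduces the influence of one-off delays such as cache misses or lazy initialization.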

114 of 159

Comparison of model size in MB

115 of 159

Discussion

  • The Continuous Wavelet Transform (CWT) outperformed the alternative image representation techniques that were evaluated.
  • Choosing a deep learning model for exercise classification requires weighing model complexity against computational efficiency.
    • DenseNet201 and ResNet101 exhibit reliable efficacy across a variety of datasets.
    • CNN-based frameworks require substantially fewer training epochs than attention-based architectures.
  • The 3-model architecture outperforms both single and 2-model architectures.
  • The increased model complexity makes deployment on resource-constrained devices or in real-time applications challenging.
    • Cloud-based deployment can address this by serving the model through an API built with Flask.

116 of 159

Discussion

  • Attention-based models, especially the Vision Transformer, perform impressively on some datasets but struggle on others, a pattern also observed in CNN-based models.
  • DenseNet201 and ResNet101 exhibit reliable efficacy across a variety of datasets. Notably, these CNN-based frameworks require substantially fewer training epochs than attention-based architectures.
  • Although the fused models were not the top performers on either dataset, their consistent placement among the top models on both datasets underscores their capacity for generalization.

117 of 159

Conclusion

  • The study compared 20 CNN-based models, 1 attention-based model, and 13 fused models across 2 network architectures using 5-fold cross-validation (more than 290 trained models).
  • CNN-based models are a favorable choice when file size is a critical consideration. Models such as ResNet101 and DenseNet201, which offer smaller file sizes, are preferable in such contexts.
  • When testing time is the priority, MobileNetV3Small is a viable option, although its performance can vary significantly with the dataset in use. Fused models strike a balance between performance and testing time, at the expense of a larger model size.

118 of 159

Discussion

119 of 159

Discussion

  • The $1 algorithm is suitable when only a small amount of data is available (as in our collected dataset). Its performance improves as the number of templates increases.
  • The Extra Trees ensemble model outperformed all other classical machine learning models. Moreover, FCBF stood out as the most suitable feature ranking technique.
  • CWT outperformed other signal processing techniques such as MFCC and Mel.
  • After converting the time-series data into 2D images, the fused attention-CNN model outperformed the single-model approaches, with the tri-model architecture outperforming both the uni-model and dual-model architectures.
  • The 1D CNN achieved state-of-the-art results on both benchmark datasets.
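The conversion of a time series into a 2D image with the CWT, mentioned in the list above, can be sketched as follows. The sketch assumes a Ricker (Mexican hat) mother wavelet and a toy sine signal; the source does not state which wavelet or scales were used, so both are illustrative choices.

```python
import math

def ricker(t, scale):
    """Ricker (Mexican hat) wavelet at time offset t for a given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)

def cwt(signal, scales):
    """Return a len(scales) x len(signal) matrix of wavelet coefficients."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(min(5 * s, n))  # the wavelet is close to 0 beyond 5 scale units
        kernel = [ricker(k, s) for k in range(-half, half + 1)]
        row = []
        for t in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = t + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * w
            row.append(acc)
        rows.append(row)
    return rows

# Toy signal: a sine wave with a period of 16 samples.
signal = [math.sin(2 * math.pi * t / 16) for t in range(64)]
scalogram = cwt(signal, scales=[1, 2, 4, 8])
```

Each row of `scalogram` corresponds to one scale and each column to one time step, so the matrix can be rendered as an image and fed to a 2D model.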

120 of 159

Discussion

  • While the ML models performed solidly across both datasets, they required additional processing for feature extraction and selection. Moreover, they used only 20 features of the original dataset to achieve this performance.
  • Attention models require more computational resources than 2D CNN-based models. However, the single-model architecture did not yield consistent results across multiple datasets.
  • Fused models showed consistent results across multiple datasets at the cost of more computational power, with an acceptable inference time.
  • The 1D-CNN model showed solid performance and can be used on low-power devices.

121 of 159

SE Perspective

  • Research stakeholders include experts from the Faculty of Physical Therapy.
    • Defining requirements
    • Providing feedback

  • Project can be productized.

122 of 159

Collaboration With Physical Therapy

September 2022

October 2023

123 of 159

Patient Dashboard

124 of 159

Conclusion

125 of 159

Conclusion

  • This research investigated increasing the productivity of physical therapists by enabling them to monitor more patients at the same time through various HAR techniques.
  • The study investigated using Kinect and, as a more affordable option, RGB cameras.
  • A case study was also conducted by collecting a real-world dataset in the university clinics.
  • Various learning techniques were tested to achieve state-of-the-art results on multiple benchmark datasets.

126 of 159

Future Work

  • Improve the framework to detect multiple human poses per frame from RGB cameras without compromising processing speed (frames per second).
  • Extend the current work to include Augmented Reality (AR) and Virtual Reality (VR) technologies to promote more effective home-based rehabilitation.
  • Develop techniques to compress the models for efficient operation on mobile devices.
  • Create an automated recommendation engine capable of personalizing treatment plans and exercise regimens for individual patients.
  • Conduct further research into disease classification and diagnosis, necessitating the acquisition of larger datasets for improved accuracy.

127 of 159

128 of 159

UI-PRMD

129 of 159

UI-PRMD

130 of 159

Collected Dataset

131 of 159

Collected Dataset Classes

| Exercise | Incorrect Template |
|---|---|
| Mini Squat | Uncontrolled Knee Position; Excessive Trunk Flexion |
| Sit-to-Stand | Uncontrolled Knee Position; Excessive Trunk Flexion |
| Straight Leg Raising | Knee Flexion; Ankle Plantar Flexion + Knee Flexion; Ankle Plantar Flexion + Knee Slight Flexion |

132 of 159

Collected Dataset : Extracting Joints

133 of 159

Body-joint extractor

  • OpenPose and BlazePose
    • BlazePose can process more frames per second.

    • MediaPipe is a lightweight framework developed by Google; its pose estimation solution is built on the BlazePose model.
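Once a body-joint extractor returns keypoint coordinates, joint angles such as the knee angle can be derived from three neighbouring joints. The sketch below assumes 2D (x, y) keypoints; the function name and the example coordinates are hypothetical and do not come from any particular pose library.

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.degrees(math.atan2(cross, dot)))

# Hypothetical hip, knee, and ankle positions in image coordinates.
hip, knee, ankle = (0.0, 2.0), (0.0, 1.0), (0.0, 0.0)
knee_angle = joint_angle(hip, knee, ankle)  # a fully extended leg
```

Using `atan2` on the cross and dot products avoids the numerical issues that `acos` has for nearly straight or nearly folded joints.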

134 of 159

One Dollar
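Template matching in the spirit of the $1 recognizer can be sketched as below: each stroke is resampled to a fixed number of points, translated and scaled, then compared with stored templates by mean point-to-point distance. This is a simplified sketch; it omits the rotation-alignment steps of the full recognizer and uses uniform scaling, and the template names and coordinates are hypothetical.

```python
import math

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=32):
    """Resample a stroke to n points spaced evenly along its path."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def normalize(pts, n=32):
    """Resample, move the centroid to the origin, and scale to unit size."""
    pts = resample(pts, n)
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    cx, cy = sum(xs) / n, sum(ys) / n
    return [((x - cx) / size, (y - cy) / size) for x, y in pts]

def recognize(stroke, templates):
    """Return the name of the template with the smallest mean point distance."""
    cand = normalize(stroke)
    def score(item):
        ref = normalize(item[1])
        return sum(math.dist(a, b) for a, b in zip(cand, ref)) / len(cand)
    return min(templates.items(), key=score)[0]

# Hypothetical templates: one stroke per class.
templates = {"line": [(0, 0), (10, 0)], "vee": [(0, 0), (5, 10), (10, 0)]}
label = recognize([(1, 1), (4, 7), (9, 1)], templates)
```

Storing several templates per class gives the matcher more reference shapes to compare against, since every additional template is simply one more candidate in the nearest-template search.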

135 of 159

Extra Tree

  • A variant of the Random Forest algorithm.
  • The main difference lies in how the trees are built.
    • In Random Forests, each node is split using the best split found among a random subset of features.
    • In Extra Trees, a random split threshold is drawn for each candidate feature, and the best of these random splits is used at each node. This additional randomization can make Extra Trees more robust against overfitting and can further reduce variance.
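The difference in how a single node is split can be sketched as follows. This is an illustrative simplification: both functions look at every feature (a Random Forest would restrict the search to a random feature subset), Gini impurity is assumed as the split criterion, and the toy data are hypothetical.

```python
import random

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(X, y, feature, threshold):
    """Weighted Gini impurity of the two children produced by a split."""
    left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
    right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def extra_trees_split(X, y, rng):
    """Draw ONE random threshold per feature, then keep the best of those."""
    candidates = []
    for f in range(len(X[0])):
        values = [row[f] for row in X]
        threshold = rng.uniform(min(values), max(values))
        candidates.append((split_impurity(X, y, f, threshold), f, threshold))
    return min(candidates)

def exhaustive_split(X, y):
    """Optimized search: try every observed value of every feature."""
    return min((split_impurity(X, y, f, row[f]), f, row[f])
               for f in range(len(X[0])) for row in X)

# Toy data: feature 0 separates the two classes perfectly at threshold 2.
X = [[1, 5], [2, 3], [3, 8], [4, 1]]
y = [0, 0, 1, 1]
best = exhaustive_split(X, y)
rand = extra_trees_split(X, y, random.Random(0))
```

The random split is never purer than the exhaustively optimized one, but it is cheaper to compute and decorrelates the trees in the ensemble.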

136 of 159

Extra Tree

  • Extra Trees add an extra layer of randomness during tree construction, which can lead to better generalization and improved performance in some cases.

137 of 159

LightGBM

  • Gradient boosting is an ensemble learning technique where multiple weak learners (usually decision trees) are trained sequentially.

  • LightGBM employs a leaf-wise (best-first) tree growth strategy instead of level-wise (depth-wise) growth.
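The sequential training of weak learners can be sketched with depth-1 trees (stumps) fitted to the residuals of the current ensemble. This is a generic gradient boosting sketch for squared error on one feature, not LightGBM's implementation, and it does not model leaf-wise growth; the data and hyperparameters are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-threshold regressor (a depth-1 tree) under squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    return best[1:]

def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals left by the previous rounds."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        threshold, lmean, rmean = fit_stump(x, residuals)
        stumps.append((threshold, lmean, rmean))
        preds = [p + learning_rate * (lmean if xi <= threshold else rmean)
                 for xi, p in zip(x, preds)]
    return base, stumps, preds

def mse(y, preds):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

# Toy regression data.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0]
_, _, preds_1 = boost(x, y, n_rounds=1)
_, _, preds_20 = boost(x, y, n_rounds=20)
```

Each added stump corrects part of the error left by its predecessors, so the training error shrinks as rounds accumulate.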

138 of 159

FCBF

Fast Correlation-Based Filter:

  • A feature selection algorithm used to identify the most relevant and informative features from a dataset.
  • FCBF aims to reduce the dimensionality of the data by selecting a subset of features that are highly correlated with the target variable while having low intercorrelations among themselves.
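The selection rule described above can be sketched with symmetrical uncertainty (SU) as the correlation measure: features are ranked by their SU with the target, and a feature is dropped when it is at least as correlated with an already selected feature as with the target. The sketch assumes discrete feature values; the feature names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), in the range [0, 1]."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0
    mutual_info = h_a + h_b - entropy(list(zip(a, b)))
    return 2.0 * mutual_info / (h_a + h_b)

def fcbf(features, target, threshold=0.0):
    """features: dict of name -> list of discrete values. Returns selected names."""
    relevance = {name: symmetrical_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    while ranked:
        best = ranked.pop(0)
        selected.append(best)
        # keep only features less correlated with `best` than with the target
        ranked = [n for n in ranked
                  if symmetrical_uncertainty(features[best], features[n]) < relevance[n]]
    return selected

# Toy data: "b" duplicates "a", "c" is unrelated to the target, "d" adds new information.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
    "d": [0, 0, 1, 0, 1, 1, 0, 1],
}
selected = fcbf(features, target)
```

In this toy case the redundant copy and the irrelevant feature are filtered out, leaving a smaller set with low intercorrelation.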

139 of 159

CNN

140 of 159

1D CNN
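The basic operation of a 1D convolutional layer on a time series can be sketched as a sliding dot product followed by a non-linearity. This is a minimal single-filter sketch with hand-picked kernel weights; a trained 1D CNN learns many such kernels, and the signal below is a toy example.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid 1D cross-correlation, as computed by a 1D convolutional layer."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]

def relu(values):
    return [max(0.0, v) for v in values]

# Toy signal with one rising edge; the kernel responds to increases.
signal = [0, 0, 1, 1, 1, 0]
feature_map = relu(conv1d(signal, kernel=[-1, 1]))
pooled = max(feature_map)  # global max pooling over time
```

The feature map is non-zero only where the signal rises, showing how a single filter acts as a local pattern detector along the time axis.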

141 of 159

Benefits of Transfer Learning

  • Time Efficiency
  • Data Efficiency
  • Improved Performance

142 of 159

Modifying and Fine-Tuning Pre-trained Models

  • Customizing the Model for Specific Tasks
    • Replacing Trainable Layers
  • Fine-Tuning Strategies
    • Layer Freezing.
    • Selective Training

143 of 159

144 of 159

ViT

145 of 159

Residual Models

146 of 159

Dense Models

147 of 159

Mob

148 of 159

Acknowledgments

149 of 159

Acknowledgments

I am deeply grateful for the support and guidance provided by my supervisors:

  • Dr. Ayman Ezzat
  • Dr. Amr Ghoneim
  • Dr. Laila Abdelhamid

A special thank you to:

  • Raghda Essam and Noha Ahmed
  • Family
  • Friends

150 of 159

References

  1. World Health Organization, “Rehabilitation.” Available online, 2023. Accessed on June 15, 2023.
  2. Arshad, M. H., Bilal, M., & Gani, A. (2022). Human activity recognition: Review, taxonomy and open challenges. Sensors, 22(17), 6463.
  3. Debnath, B., O’Brien, M., Yamaguchi, M., & Behera, A. (2022). A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems, 28(1), 209-239.
  4. Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778.
  5. Yue, R., Tian, Z., & Du, S. (2022). Action recognition based on RGB and skeleton data sets: A survey. Neurocomputing, 512, 287-306.
  6. Rashid, F. A. N., Suriani, N. S., Mohd, M. N., Tomari, M. R., Zakaria, W. N. W., & Nazari, A. (2020). Deep convolutional network approach in spike train analysis of physiotherapy movements. In Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia (pp. 159-170). Springer Singapore.
  7. A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018.
  8. Capecci, M., Ceravolo, M. G., Ferracuti, F., Iarlori, S., Monteriu, A., Romeo, L., & Verdini, F. (2019). The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7), 1436-1448.
  9. M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
  10. R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd international conference on Machine learning, pp. 161–168, ACM, June 2006.

16/07/2023

151 of 159

Proposed Framework

System Overview

152 of 159

Proposed System : Overview

153 of 159

Proposed System : Technical

154 of 159

155 of 159

Proposed System : Track 1 - Classical

156 of 159

Proposed System : Track 2 – Deep Learning

157 of 159

Research plan

Phase 1

    • Survey the existing machine learning and deep learning algorithms.

Phase 2

    • Assess the strengths and weaknesses of the existing models.

Phase 3

    • Apply machine learning algorithms to support data cleaning of imbalanced data.

Phase 4

    • Identify the machine learning or deep learning algorithm that achieves the highest accuracy in detecting right/wrong techniques.

Phase 5

    • Apply and assess the proposed algorithm on a real dataset.

158 of 159

Reference

  1. World Health Organization, “Rehabilitation.” Available online, 2023. Accessed on June 15, 2023.
  2. B. Debnath, M. O’Brien, M. Yamaguchi, et al., “A review of computer vision-based approaches for physical rehabilitation and assessment,” Multimedia Systems, vol. 28, no. 2, pp. 209–239, 2022.
  3. F. A. Rashid, N. S. Suriani, M. N. Mohd, M. R. Tomari, W. N. W. Zakaria, and A. Nazari, “Deep convolutional network approach in spike train analysis of physiotherapy movements,” in Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia, pp. 159–170, Springer Singapore, 2020.
  4. N. Tasnim and J.-H. Baek, “Dynamic edge convolutional neural network for skeleton-based human action recognition,” Sensors, vol. 23, no. 2, p. 778, 2023.
  5. A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018.
  6. M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
  7. R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd international conference on Machine learning, pp. 161–168, ACM, June 2006.


159 of 159

Reference

  1. “Rehabilitation.” World Health Organization. https://www.who.int/health-topics/rehabilitation#tab=tab_1 (accessed Sep. 11, 2022).
  2. “Rehabilitation Facts.” World Health Organization. https://www.who.int/news-room/fact-sheets/detail/rehabilitation (accessed Sep. 11, 2022).
  3. Gu Y, Pandit S, Saraee E, Nordahl T, Ellis T, Betke M. Home-based physical therapy with an interactive computer vision system. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops 2019 (pp. 0-0).
  4. Rivas JJ, del Carmen Lara M, Castrejon L, Hernandez-Franco J, Orihuela-Espina F, Palafox L, Williams A, Bianchi-Berthouze N, Sucar LE. Multi-label and multimodal classifier for affective states recognition in virtual rehabilitation. IEEE Transactions on Affective Computing. 2021 Feb 1;13(3):1183-94.
  5. Debnath B, O’Brien M, Yamaguchi M, Behera A. A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems. 2021 Jun 19:1-31.
  6. Chambers C, Seethapathi N, Saluja R, Loeb H, Pierce SR, Bogen DK, Prosser L, Johnson MJ, Kording KP. Computer vision to automatically assess infant neuromotor risk. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2020 Oct 6;28(11):2431-42.
  7. Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1–9.
  8. Sucar LE, Luis R, Leder R, Hernández J, Sánchez I. Gesture therapy: A vision-based system for upper extremity stroke rehabilitation. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology 2010 Aug 31 (pp. 3690-3693). IEEE.
  9. Lin, T.-Y., Hsieh, C.-H., Lee, J.-D.: A kinect-based system for physical rehabilitation: Utilizing tai chi exercises to improve movement disorders in patients with balance ability. In: Modelling Symposium (AMS), 2013 7th Asia, pp. 149–153. IEEE (2013).