1 of 159

Using Human Activity Recognition in Physical Rehabilitation Exercises in Real-time

Presented By : Moamen Zaher

Assoc. Prof. Ayman Ezzat Atia

Computer Science Department

Faculty of Computers & Artificial Intelligence

Helwan University

Dr. Amr Ghoneim

Computer Science Department

Faculty of Computers & Artificial Intelligence

Helwan University

Dr. Laila M. Abdelhamid

Information System Department

Faculty of Computers & Artificial Intelligence

Helwan University

This presentation is presented to the Department of Information Systems to obtain a Master's degree in Software Engineering

2 of 159

Agenda

Introduction
Background and Literature Review
Methodology
Approaches and Results
Discussion
Conclusion
Q&A
Acknowledgments
References

3 of 159

Introduction

4 of 159

Introduction 1/2

Rehabilitation is a long process, so hospital stays need to be shortened.

Rehabilitation is a set of interventions designed to optimize functioning and reduce disability in individuals with health conditions in interaction with their environment.

WHO (World Health Organization) / Health Topics / Rehabilitation ^[1]

Incorrect execution of the exercises can hinder healing and increase recovery time.

5 of 159

Introduction 2/2

To promote Home-based Rehabilitation

WHO (World Health Organization) / Health Topics / Rehabilitation

Different skeleton parts, angles and trajectories for different body joints are required

Data must be acquired using sensors

An AI model must be developed to classify correct & wrong execution of exercises

6 of 159

Motivation

2.4 billion people currently require rehabilitation.
50% of people do not receive the rehabilitation they require.
Rehabilitation services are underfunded and undervalued, particularly in countries without strong health systems.
The number of skilled rehabilitation practitioners in low- and middle-income countries is <10 per 1 million population.
The number of people over 60 years of age is predicted to double by 2050. Hence, more people will be living with chronic diseases.

WHO (World Health Organization) / Health Topics / Rehabilitation

Map of leading health conditions requiring rehabilitation in each country, 2019

7 of 159

Problem Statement

Lack of prioritization, funding, policies and plans for rehabilitation at a national level.
Lack of available rehabilitation services outside urban areas, and long waiting times.
High out-of-pocket expenses and non-existent or inadequate means of funding.
Lack of trained rehabilitation professionals, with less than 10 skilled practitioners per 1 million population in many low- and middle-income settings.
Lack of resources, including assistive technology, equipment and consumables.
The need for more research and data on rehabilitation.
Ineffective and under-utilized referral pathways to rehabilitation.

WHO (World Health Organization) / Rehabilitation, 10 November 2021

8 of 159

Research Objective

❌ Train more professionals

✅ Increase productivity

9 of 159

Research Outcomes

Allow patients to practice exercises at home without needing to go to physiotherapy clinics.

Real-time feedback to the patient on whether the exercise was performed correctly.

Cut down the cost of rehabilitation.

Allow doctors to monitor patients' progress.

10 of 159

Background and Literature Review

11 of 159

Background

Human Activity Recognition (HAR)

Is the process of automatically determining human actions and behaviors from sensor data.
It has applications in areas like health monitoring, surveillance, sports analysis, and human-machine interaction.
HAR systems use sensors like cameras, ambient sensors, or wearables to observe and classify activities, which can range from simple repetitive actions to complex multi-step tasks.

Arshad, M. H., Bilal, M., & Gani, A. (2022). Human activity recognition: Review, taxonomy and open challenges. Sensors, 22(17), 6463. ^[2]

12 of 159

Related Work

Debnath, B., O'Brien, M., Yamaguchi, M., & Behera, A. (2022). A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems, 28(1), 209-239. ^[3]

Computer Vision-Based approaches in physiotherapy

Rehabilitation

Virtual

Skeleton-based

Non-skeleton based

Automated Assessment

Direct

Pure Vision-based

Multi-modal

Assessment

Comparison

Kinematics-based Modeling

Statistical Model

Stochastic Methods

Categorization

Rule-based

Statistical and Stochastic Algorithms-based

Scoring

Author Proposed

Clinical

13 of 159

Digital Rehabilitation

Direct Rehabilitation

Virtual Rehabilitation

14 of 159

Data Acquisition

4 types of data modalities ^[4]
We proceed with skeleton data ^[5]

Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778. ^[4]

Yue, R., Tian, Z., & Du, S. (2022). Action recognition based on RGB and skeleton data sets: A survey. Neurocomputing, 512, 287-306. ^[5]

15 of 159

Datasets

Dataset	Impairment	Details	Sensor/Data
SPHERE-Walking2015	Sit to stand	109 sequences, 10 individuals, restricted knee, hip, freezing	Kinect/ Kinect SDK, OpenNI SDK skeleton
Parkinson's pose estimation	PD, LID, UPDRS assessment tasks	526 sequences, PD, LID patients, 4 UPDRS assessment tasks	RGB Camera/CPM skeleton
AHA-3D	Senior lower body fitness	11 young, 10 elderly subjects, 4 exercises	Kinect/ RGB, depth, skeleton
UI-PRMD	Physical Rehabilitation Movement	10 rehabilitation exercises 10 healthy individuals	Vicon optical tracker, and a Kinect camera
KIMORE	Stroke, PD, back pain exercises	44 healthy, 34 patient subjects, 5 exercises 5 repetitions	Kinect/ RGB, depth, skeleton
UTD-MHAD	27 different actions	8 subjects (4 females and 4 males). Each subject repeated each action 4 times.	one Kinect camera and one wearable inertial sensor

16 of 159

Related Work 1/2

This research aimed to classify different types of exercises by feeding spike train features into a deep learning model.
UI-PRMD dataset
The paper adopted ResNet as its CNN model for classification.
The data was partitioned into 100 frames.
The classification accuracy reached 77%.

Rashid, F. A. N., Suriani, N. S., Mohd, M. N., Tomari, M. R., Zakaria, W. N. W., & Nazari, A. (2020). Deep convolutional network approach in spike train analysis of physiotherapy movements. In Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia (pp. 159-170). Springer Singapore. ^[6]

17 of 159

Related Work 2/2

This research designed a new deep learning model that integrates criss-cross attention and edge convolution to extract discriminative features from the skeleton sequence for action recognition.
UTD-MHAD and MSR-Action3D datasets
CNN
The proposed method achieved average accuracies of 99.53% and 95.64%, respectively.

Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778. ^[4]

18 of 159

Methodology

19 of 159

Overview of the Methodology

20 of 159

UI-PRMD

University of Idaho - Physical Rehabilitation Movements dataset.
10 exercises, each repeated 10 times by 10 individuals.
20 classes and 2000 records.
Vicon optical tracker and a Kinect camera
The data include motion measurements for 22 joints
Text files

A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018. ^[7]

21 of 159

KIMORE

KInematic Assessment of MOvement and Clinical Scores for Remote Monitoring of Physical REhabilitation
RGB, and Kinect camera
The data include the motion measurement for 25 joints
CSV files

Capecci, M., Ceravolo, M. G., Ferracuti, F., Iarlori, S., Monteriu, A., Romeo, L., & Verdini, F. (2019). The KIMORE dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7), 1436-1448. ^[8]

22 of 159

KIMORE

78 Subjects, 385 records

44 healthy

17 experts
27 non-experts

34 patients

4 levels

Healthy
Parkinson
Stroke
Back Pain

23 of 159

Collected Datasets

Collected Dataset

3 exercises, each repeated 7 times by 1 individual

Mini squat – Sit to stand – Straight leg raise

RGB camera
Video Files
Extracted 33 body joints

24 of 159

Research Directions

Approach 1: Comparative Study of Machine Learning Algorithms.

Approach 2: Case Study: A framework for assessing physical rehabilitation exercises.

Approach 3: Comparative Study between CNN and RNN algorithms on multiple datasets.

Approach 4: Transfer Learning and Model Fusion.

✅

⏳

25 of 159

Approaches and Results

26 of 159

Paper 1: Rehabilitation Monitoring and Assessment: A Comparative Analysis of Feature Engineering and Machine Learning Algorithms on the UI-PRMD and KIMORE Benchmark Datasets

Under Review in Journal of Information and Telecommunication (Q2), Taylor & Francis.

27 of 159

Research Pipeline

28 of 159

UI-PRMD Dataset Representation

29 of 159

Data Processing

Kinect Camera extracts 22 body joints.
All 22 body joints were stored in a vector V.
These joints were then processed using statistical techniques of:

maximum, minimum, mean, standard deviation, and median.
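The statistical processing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `extract_features`, the toy recording, and the use of the population standard deviation are assumptions, and the feature naming (for example `WaistX_std`) simply follows the joint/axis/statistic pattern used elsewhere in this deck. With 22 joints, 3 axes and 5 statistics the same loop yields 330 features per recording.

```python
import statistics

# The five statistics named on the slide. Population standard deviation is an
# assumption; the source does not say whether sample or population was used.
STATS = {
    "max": max,
    "min": min,
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "median": statistics.median,
}

def extract_features(recording, joint_names):
    """Collapse a recording into one fixed-length feature dictionary.

    recording: list of frames; each frame holds one (x, y, z) tuple per joint.
    """
    features = {}
    for j, name in enumerate(joint_names):
        for a, axis in enumerate("XYZ"):
            series = [frame[j][a] for frame in recording]
            for stat_name, fn in STATS.items():
                features[f"{name}{axis}_{stat_name}"] = fn(series)
    return features

# Toy recording (illustrative values only): 4 frames, 2 joints.
joint_names = ["Waist", "Neck"]
recording = [
    [(0.0, 1.0, 2.0), (0.5, 1.5, 2.5)],
    [(1.0, 1.0, 2.0), (0.5, 2.5, 2.5)],
    [(2.0, 1.0, 4.0), (0.5, 3.5, 2.5)],
    [(3.0, 1.0, 4.0), (0.5, 4.5, 2.5)],
]
features = extract_features(recording, joint_names)
```

Each recording, whatever its length in frames, becomes one row of the same width, which is what allows classical classifiers to be trained on it.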

30 of 159

Feature Ranking and Selection

Feature ranking is implemented using various methods, such as filter methods and wrapper methods.
This research employed six distinct filter methods (ReliefF, FCBF, chi-squared (X²), Gini Decrease, Information Gain, and Information Gain Ratio).
We select the top 20 features.
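As a rough illustration of how a filter method ranks features, the sketch below scores each feature by information gain and keeps the top k. It is not the tooling used in the study: the equal-width binning with 4 bins, the function names, and the toy values are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels, bins=4):
    """Entropy reduction obtained by splitting on an equal-width binned feature."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant features fall into a single bin
    groups = {}
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), bins - 1)
        groups.setdefault(b, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_features(columns, labels, k=20):
    """Rank feature columns by information gain and return the k best names."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (illustrative only): one informative, one uninformative, one constant feature.
labels = ["correct", "correct", "correct", "incorrect", "incorrect", "incorrect"]
columns = {
    "WaistX_std": [0.1, 0.2, 0.15, 0.9, 0.8, 0.95],
    "NeckY_median": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],
    "LeftFootX_std": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
selected = top_k_features(columns, labels, k=1)
```

A filter method like this scores each feature independently of the classifier, which is why the same ranking can be reused across all the models compared later.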

31 of 159

Feature Ranking and Selection

32 of 159

Action Classification

Non-Ensembled

Logistic Regression (LR)
K Nearest Neighbors (KNN)
Naive Bayes (NB)
Ridge
Quadratic Discriminant Analysis (QDA)
Linear Discriminant Analysis (LDA)
Decision Tree
Support Vector Machine (SVM)

Ensembled Models

Random Forest (RF)
Ada Boost (ADA)
Gradient Boosting Classifier (GBC)
Extra Trees Classifier (ET)
Light Gradient Boosting Machine (LightGBM)

33 of 159

Experiments

The initial experiment was conducted to compare state-of-the-art feature selection techniques in order to minimize the feature count and to choose the most effective feature ranking method.

The second experiment compared a variety of classification algorithms on the same dataset to ascertain the optimal combination of feature ranking techniques and machine learning classification algorithms.

34 of 159

Results : Ranking Methods

35 of 159

Results : Algorithms

36 of 159

Results

37 of 159

UI-PRMD Results

Classifier	X²	FCBF	ReliefF	Gini Decrease	Information Gain	Information Gain Ratio	Classifier Average
Ada Boost Classifier	94.04%	93.37%	86.20%	93.66%	93.60%	91.66%	92.10%
Decision Tree	99.00%	99.07%	91.12%	98.81%	98.81%	98.81%	98.81%
Extra Tree	98.37%	99.64%	99.94%	97.81%	97.56%	98.25%	98.60%
Gradient Boosting	98.87%	98.93%	98.50%	98.81%	98.81%	98.81%	98.80%
KNN	92.44%	94.01%	99.88%	91.63%	91.63%	93.07%	93.80%
Light Gradient Boosting Machine	98.94%	99.36%	98.75%	99.06%	99.06%	99%	99%
Linear Discriminant Analysis	98.87%	68.05%	97%	97.62%	97.62%	97.56%	92.80%
Logistic Regression	90.75%	94.86%	79.12%	87.94%	87.94%	88.31%	88.20%
Naïve Bayes	96.94%	96.85%	95.25%	96.38%	96.38%	86.75%	94.80%
Quadratic Discriminant Analysis	66.18%	87.76%	18.44%	67.31%	68.25%	40.56%	58.10%
Random Forest	98.31%	99.57%	99.62%	82.24%	98.06%	98.06%	96%
Ridge	68.26%	77.71%	53.69%	59.81%	59.81%	59.94%	63.20%
Support Vector Machine	86.83%	95.25%	69.61%	85.25%	85.25%	83.82%	84.30%
Average Accuracy of Each Feature Ranking Technique	91.37%	92.65%	83.62%	88.95%	90.21%	87.28%

38 of 159

KIMORE Results

Classifier	X²	FCBF	ReliefF	Gini Decrease	Information Gain	Information Gain Ratio	Mean Classifier Accuracy
Ada Boost Classifier	54.95%	56.67%	53.38%	55.28%	56.11%	59.86%	56.04%
Decision Tree	67.45%	74.35%	68.56%	66.53%	72.82%	73.19%	70.48%
Extra Tree	76.81%	81.85%	74.63%	72.78%	75.14%	76.81%	76.34%
Gradient Boosting	74.95%	74.95%	68.80%	69.54%	74.03%	73.98%	72.71%
KNN	75.00%	71.30%	71.85%	68.52%	71.90%	73.15%	71.95%
Light Gradient Boosting Machine	73.89%	71.44%	73.89%	72.96%	76.90%	77.64%	75.45%
Linear Discriminant Analysis	58.89%	58.89%	61.94%	61.53%	57.78%	60.83%	59.98%
Logistic Regression	59.12%	59.12%	53.80%	63.19%	58.52%	56.48%	58.37%
Naïve Bayes	57.92%	57.92%	54.44%	58.84%	59.26%	54.68%	57.18%
Quadratic Discriminant Analysis	18.19%	18.19%	32.45%	74.91%	75.56%	18.19%	39.58%
Random Forest	76.53%	76.53%	72.22%	70.46%	75.65%	78.43%	74.97%
Ridge	56.94%	56.94%	55.42%	56.30%	58.75%	57.92%	57.05%
Support Vector Machine	57.55%	66.20%	58.50%	58.06%	62.04%	58.43%	60.13%

39 of 159

Discussion

The tests revealed that the ET, KNN, and RF algorithms are most effective when paired with ReliefF.
The LightGBM, DT, Gradient Boosting, SVM, LR, QDA, and Ridge algorithms were found to be most successful when combined with FCBF.
Although ReliefF achieved the highest top-1 accuracy on the UI-PRMD dataset when paired with Extra Tree, it performed poorly when combined with several other models.

40 of 159

Conclusion

Machine learning requires additional stages of feature extraction and selection.
Non-ensemble models showed high sensitivity depending on the feature ranking technique employed, underscoring the impact of feature selection on classification outcomes.
Ensemble models demonstrated robust performance across multiple datasets, exhibiting low sensitivity to feature ranking methods.
The best overall combination is ReliefF with Extra Tree, scoring 99.94%.
LightGBM has the most consistent results across all ranking techniques, with an average of 99%.
FCBF works best with ensemble models.
FCBF scored the highest average accuracy across all models (92.65%), followed by X² (91.37%).

41 of 159

Paper 2: A Framework for Assessing Physical Rehabilitation Exercises

Published at 2023 Intelligent Methods, Systems, and Applications (IMSA), IEEE Conference

Zaher, M., Samir, A., Ghoneim, A., Abdelhamid, L., & Atia, A. (2023, July). A Framework for Assessing Physical Rehabilitation Exercises. In 2023 Intelligent Methods, Systems, and Applications (IMSA) (pp. 526-532). IEEE.

42 of 159

43 of 159

Research Pipeline

44 of 159

Datasets

UI-PRMD ^[7]

10 exercises, each repeated 10 times by 10 individuals
Vicon optical tracker, and a Kinect camera
The data include the motion measurement for 22 joints
Text files

Collected Dataset

3 exercises, each repeated 7 times by 1 individual

Mini squat – Sit to stand – Straight leg raise

RGB camera
Video Files
Extracted 33 body joints

45 of 159

Preprocessing

The Kinect camera captures 22 body joints, which are stored in a vector V.

At each time point t, each joint J_n is represented by its three-dimensional coordinates x_t, y_t, and z_t.

Face joints were deemed irrelevant for classifying the exercises and were discarded in our collected dataset.

46 of 159

Feature Engineering

Feature extraction algorithms are applied to process these joints

Five statistical techniques are utilized: standard deviation, maximum, median, mean, and minimum,
which produced 330 features in total.

The FCBF algorithm was employed to rank and select the most significant features.
The model selected the top 20 features and discarded the rest.
It was found that the feature importance score either plateaued or rapidly decreased after the 20th feature.

47 of 159

UI-PRMD Dataset Features

WaistX_std
WaistZ_std
WaistY_std
RightUpperLegZ_std
RightUpperLegZ_mean
LeftCollarX_median
RightLowerLegX_std
RightCollarZ_max
LeftFootX_std
RightFootX_std

RightLegToesZ_std
RightCollarZ_min
RightFootZ_min
RightCollarZ_median
RightUpperLegZ_median
LeftForearmY_std
NeckY_median
RightForearmY_min
RightUpperArmY_std
NeckZ_min

48 of 159

Experiments

Exp. 1

Conducted to evaluate the performance of the Extra Trees classifier for action recognition on the UI-PRMD dataset, after applying FCBF feature selection.

Exp. 2

Conducted using our proprietary dataset in a real clinic setting to showcase the system's practicality and effectiveness in real time using only an RGB camera.

49 of 159

Used Algorithms 1 of 2

Extra Tree

A tree-based ensemble technique used in machine learning.

Extra Trees builds decision trees using the entire dataset.

Extra Trees demonstrates significantly faster performance ^{[9] [10]}.

Instead of the greedy approach used in Random Forest, Extra Trees randomly selects split values for features.
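The random split selection described above can be sketched for a single tree node as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: Gini impurity as the split score, the function names, and the toy rows are all chosen for the example. Each candidate feature gets one threshold drawn uniformly between its minimum and maximum at the node, and the best of those random candidates is kept.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def random_split(rows, labels, n_candidates, rng):
    """Return (weighted impurity, feature index, threshold) for the best of
    n_candidates randomly drawn (feature, threshold) pairs, or None."""
    n_features = len(rows[0])
    best = None
    for feature in rng.sample(range(n_features), n_candidates):
        column = [row[feature] for row in rows]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # a constant feature cannot split the node
        threshold = rng.uniform(lo, hi)  # random cut point, no exhaustive search
        left = [y for v, y in zip(column, labels) if v < threshold]
        right = [y for v, y in zip(column, labels) if v >= threshold]
        if not left or not right:
            continue
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or weighted < best[0]:
            best = (weighted, feature, threshold)
    return best

# Toy node (illustrative values): feature 0 separates the classes, feature 1 is constant.
rows = [[0.1, 5.0], [0.2, 5.0], [0.8, 5.0], [0.9, 5.0]]
labels = ["squat", "squat", "raise", "raise"]
best = random_split(rows, labels, n_candidates=2, rng=random.Random(0))
```

Because no threshold search is performed, each node is cheap to build, which is consistent with the speed advantage noted above.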

50 of 159

Used Algorithms 2 of 2

One Dollar

Also known as the $1 gesture recognition algorithm.
This algorithm converts a gesture into a sequence of points and then calculates the minimum distance between the gesture and a set of predefined templates.
The One Dollar algorithm typically uses two-dimensional (x, y) points.

However, in this study, the joint coordinates were extracted in 3D.
To overcome this, the X and Y coordinates were summed to create a new X value, while the Z coordinate was assigned to Y. Thus, each tuple was represented in (x+y, z) format.

Due to the limited number of videos in the dataset, the One Dollar algorithm was chosen for this research.
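The projection and template matching described above can be sketched as follows. This is a simplified illustration, not the study's code: it applies the (x+y, z) mapping, then resamples, centres and rescales each path before taking the nearest template by average point distance. The rotation-alignment step of the full $1 recognizer is deliberately omitted, and all function names and toy trajectories are assumptions.

```python
import math

def project(points_3d):
    """Map (x, y, z) samples to the 2D (x + y, z) pairs described above."""
    return [(x + y, z) for x, y, z in points_3d]

def resample(points, n=16):
    """Resample a 2D path to n points evenly spaced along its length."""
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if length == 0:
        return [points[0]] * n
    step, acc = length / (n - 1), 0.0
    pts, out, i = list(points), [points[0]], 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:  # rounding can leave the path one point short
        out.append(pts[-1])
    return out[:n]

def normalize(points):
    """Move the centroid to the origin and scale to a unit bounding box."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    w = max(p[0] for p in points) - min(p[0] for p in points) or 1.0
    h = max(p[1] for p in points) - min(p[1] for p in points) or 1.0
    return [((x - cx) / w, (y - cy) / h) for x, y in points]

def prepare(points_3d):
    return normalize(resample(project(points_3d)))

def classify(candidate_3d, templates):
    """templates: list of (label, 3D trajectory). Returns the nearest label."""
    cand = prepare(candidate_3d)
    def distance(template):
        ref = prepare(template[1])
        return sum(math.dist(p, q) for p, q in zip(cand, ref)) / len(cand)
    return min(templates, key=distance)[0]

# Toy single-joint trajectories (illustrative only).
templates = [
    ("mini squat", [(0.0, 0.0, 1.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.0)]),
    ("straight leg raise", [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.0, 0.0, 0.0)]),
]
candidate = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.2), (0.0, 0.0, 2.0)]
predicted = classify(candidate, templates)
```

Since matching only needs one stored path per template, adding a template is just appending to the list, which fits the small-dataset setting described above.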

51 of 159

Experiments 1 of 2

Features were extracted for 22 body joints.
Only the top 20 ranked features were selected.
The data was split 70/30 for training and testing.
Cross-validation with 30 folds was applied.
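The split and cross-validation steps above can be sketched on record indices as follows. This is a minimal illustration with assumed details: the function names are hypothetical, the seed is arbitrary, and applying the 30 folds to the training portion only is an assumption, since the slide does not say which portion was cross-validated.

```python
import random

def train_test_split(indices, test_fraction=0.3, seed=0):
    """Shuffle the indices and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold(indices, k=30):
    """Yield (training, validation) index lists for each of k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 2000 records, matching the UI-PRMD record count given in this deck.
train, test = train_test_split(list(range(2000)))
splits = list(k_fold(train, k=30))
```

With 1400 training records and 30 folds, each validation fold holds 46 or 47 records.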

52 of 159

Experiments 2 of 2

Feature Extraction for 33 body joints.
Face landmarks were excluded.
(x+y, z)
1, 2, and 3 videos were used for training, and 4 for testing.

53 of 159

Results

Number of Templates	Accuracy	Precision	Recall	F1 Score
One	72%	74.5%	72%	70.4%
Two	88%	93%	88%	87.9%
Three	90%	93.8%	90%	90.1%

Evaluation of 1$ algorithm based on different evaluation metrics for different number of templates used

Algorithm	Accuracy	Precision	Recall	F1 Score
Extra Tree	99.64%	99.74%	99.64%	99.62%
One Dollar	90%	93.8%	90%	90.1%

Evaluation of different algorithms based on different evaluation metrics

54 of 159

Conclusion

Machine learning requires additional stages of feature extraction and selection.
The selection of these techniques greatly affects the performance of the same model.
The performance of the One Dollar algorithm is greatly affected by the number of templates employed for training, as higher numbers of templates for each exercise lead to better outcomes.
The Extra Tree classifier outperformed the One Dollar algorithm in all four evaluation metrics.
Despite having limited training data, the One Dollar algorithm still generated acceptable results.

55 of 159

Confusion Matrix for 1$ with 3 templates

56 of 159

Confusion Matrix for 1$ with 3 templates

57 of 159

Paper 3: Unlocking the Potential of RNN and CNN Models for Accurate Rehabilitation Exercise Classification on Multi-Datasets.

Published at Multimedia Tools and Applications Journal (Q1), Springer.

Zaher, M., Ghoneim, A. S., Abdelhamid, L., & Atia, A. (2024). Unlocking the potential of RNN and CNN models for accurate rehabilitation exercise classification on multi-datasets. Multimedia Tools and Applications, 1-41.

58 of 159

59 of 159

Research Pipeline

60 of 159

Datasets

UI-PRMD

KIMORE

61 of 159

Preprocessing

Camera records the data of 22 body joints, storing this information in a vector denoted as V.
At each time instance t, the representation of each joint data J_n consists of three-dimensional coordinates: X_t, Y_t, and Z_t.
Feature extraction techniques are then applied to process each joint data.
These techniques include mean, median, minimum, maximum, and standard deviation.

62 of 159

For Disease Classification

SMOTE was applied to address class imbalance in the data.
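The core idea of SMOTE can be sketched as follows: each synthetic sample is placed at a random point on the line between a minority-class sample and one of its nearest minority-class neighbours. This is a hand-written illustration, not the implementation used in the study (which the slides do not name); the function name, `k=3`, the seed, and the toy points are assumptions.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class (illustrative only): four 2D feature vectors.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=6)
```

Oversampling of this kind is normally applied to the training portion only, so that synthetic points never leak into the evaluation data.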

63 of 159

Hyper-parameters tuning

Model optimization is a significant challenge in deep learning.
Common techniques are random search and grid search.
Grid search is more convenient when the number of hyperparameters is limited, whereas random search excels when dealing with a larger number of hyperparameters.^[8]
A random search of 100 trials was conducted to determine the values of these parameters.

Bergstra J, Bengio Y. Random search for hyper-parameter optimization. Journal of machine learning research. 2012 Feb 1;13(2).

64 of 159


65 of 159

Used Algorithms

LSTM
Bi-LSTM
CNN-LSTM
CNN

66 of 159

Random Search

We explored four distinct combinations:

Tuning on the KIMORE dataset and training on the UI-PRMD dataset.
Tuning on the UI-PRMD dataset and training on the KIMORE dataset.
Initial tuning on the KIMORE dataset followed by a subsequent round of Hyper-parameter optimization on the UI-PRMD dataset.
Initial tuning on the UI-PRMD dataset followed by a subsequent round of Hyper-parameter optimization on the KIMORE dataset.

The tuned hyperparameters were the number of LSTM layers, the number of LSTM units, the dropout rate, the learning rate (ranging from 0.0001 to 0.01), the type of regularizer (l1, l2, or none), and its associated strength.
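A random search over these hyperparameters can be sketched as follows. The ranges are taken from the LSTM parameter table in this deck; everything else is an assumption made for illustration: the function names, the log-uniform sampling of the learning rate, the seed, and in particular `toy_objective`, which stands in for training a model and returning its validation score.

```python
import math
import random

def sample_config(rng):
    """Draw one configuration from the LSTM search space."""
    return {
        "lstm_layers": rng.randint(1, 7),
        "lstm_units": rng.randint(32, 1024),
        "dropout": rng.uniform(0.2, 0.5),
        # Log-uniform sampling (an assumption) spreads trials evenly across
        # orders of magnitude between 0.0001 and 0.01.
        "learning_rate": 10 ** rng.uniform(-4.0, -2.0),
        "regularizer": rng.choice(["l1", "l2", None]),
    }

def random_search(evaluate, n_trials=100, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    best_score, best_config = -math.inf, None
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

def toy_objective(config):
    """Hypothetical placeholder for 'train the model, return validation accuracy'."""
    return -abs(math.log10(config["learning_rate"]) + 3.3) - 0.1 * config["lstm_layers"]

best_score, best_config = random_search(toy_objective)
```

Tuning on one dataset and training on another, as in the four combinations above, amounts to passing a different `evaluate` function for each dataset.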

67 of 159

Parameters : LSTM

Parameter	Range of Values	Best Value
LSTM Layers	From 1 to 7	1
LSTM Units	From 32 to 1024	320
Dropout Rate	From 0.2 to 0.5	0.26
Learning Rate	from 0.0001 to 0.01	0.0005
LSTM Regularizer	l1, l2, or none	l2
Dense Regularizer	l1, l2, or none	None
Dense Layers	From 1 to 5	2
Dense Units	From 64 to 1024	940

68 of 159

Parameters : Bi-LSTM

Parameter	Range of Values	Best Value
BiLSTM Layers	From 1 to 5	2
BiLSTM Units	From 32 to 1024	271
Dropout Rate	From 0.2 to 0.5	0.3
Learning Rate	from 0.0001 to 0.01	0.001014
Regularizer	l1, l2, or none	None
Dense Regularizer	l1, l2, or none	None
Dense Layers	From 1 to 5	4
Dense Units	From 64 to 1024	927

69 of 159

Parameters : CNN-LSTM

Parameter	Range of Values	Best Value
Filters	From 32 to 1024	128
Kernel Size	From 3 to 10	8
LSTM Units	From 64 to 1024	256
Dropout Rate	From 0.2 to 0.5	0.2
Learning Rate	from 0.0001 to 0.01	0.00077
Conv Regularizer	l1, l2, or none	l2
LSTM Regularizer	l1, l2, or none	None
Dense Regularizer	l1, l2, or none	None
Dense Layers	From 1 to 5	2
Dense Units	From 64 to 1024	927

70 of 159

Parameters : CNN

Parameter	Range of Values	Best Value
Convolutional Layers	From 1 to 6	2
Conv Units	From 32 to 512	48
Dropout Rate	From 0.2 to 0.5	0.2
Learning Rate	from 0.0001 to 0.01	0.0025284
Conv Regularizer	l1, l2, or none	None
Dense Regularizer	l1, l2, or none	None
Dense Layers	From 1 to 5	1
Dense Units	From 64 to 1024	544

71 of 159

Evaluation Metrics

Loss
Accuracy
Precision
Recall
F1-Score
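For reference, the classification metrics listed above can be computed as in the sketch below. Macro averaging over classes is an assumption, since the slides do not state which averaging was used, and the loss is omitted because it depends on the model's predicted probabilities. The function name and toy labels are illustrative.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(precisions)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy predictions (illustrative only).
y_true = ["squat", "squat", "raise", "raise"]
y_pred = ["squat", "raise", "raise", "raise"]
scores = macro_metrics(y_true, y_pred)
```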

72 of 159

Training

All experiments were run on the same machine with 15 GB of GPU memory.
Early stopping and learning-rate reduction on plateau were used.

Parameter	Value
Epochs	450
Train-test split	80/20
Train-val split	80/20
Folds	5
Batch Size	32
Hidden Layer Activation	ReLU
Output Layer Activation	Softmax
Optimizer	Adam
Loss	Categorical cross-entropy

73 of 159

74 of 159

Experiments

The first experiment was conducted to find the best exercise classification algorithm across both datasets.

The second experiment was conducted to classify different diseases from patients while performing the same five exercises in the KIMORE dataset.

75 of 159

Exercise Classification

UI-PRMD

KIMORE

76 of 159

State-of-the-art - KIMORE

CNN scored:

93.08% accuracy (top)
93.07% precision (top)
93.96% recall (top)
91.79% f1-score (top)
0.2860 loss (least)
2.3 minutes training time (least)

0.75% performance increase.

Method	Accuracy
Ensemble-based Graph Convolutional Network (EGCN)	80.10%
3D Convolution Neural Network (3D-CNN)	90.57%
Many-to-Many model with density map output	92.33%
Our Tuned-CNN	93.08%

77 of 159

State-of-the-art - UI-PRMD

CNN scored:

99.70% accuracy (top)
99.70% precision (top)
99.75% recall (top)
99.70% F1-score (top)
0.0122 loss (least)
2.15 minutes training time (least)

0.1% performance increase over ML with only 20 features.

Method	Accuracy
Ensemble-based Graph Convolutional Network (EGCN)	86.90%
Graph Convolutional Siamese Network	99.20%
FCBF - Extra Tree	99.60%
Our Tuned-CNN	99.70%

78 of 159

Disease Classification

79 of 159

Disease Classification Results

CNN scored:

89.87% accuracy (top)
88.91% precision (top)
90.63% recall (top)
89.49% F1-score (top)
0.48286 loss (least)
7.32 minutes training time

80 of 159

Conclusion

The CNN model exhibited outstanding accuracy, attaining scores of 93.08% and 99.7% on the KIMORE and UI-PRMD datasets, respectively.
This surpasses the state-of-the-art on both datasets by 0.75% and 0.1%, respectively.
Moreover, the model demonstrated notable proficiency in disease classification, enabling the detection of correct and incorrect exercise techniques and achieving a disease diagnosis accuracy of 89.87%.
Deep learning models require extensive data and training iterations.
The proposed CNN model required the fewest training iterations while achieving the best performance.

81 of 159

Paper 4: Fusing CNN and Attention Mechanisms: Advancements in Real-Time Human Activity Recognition for Rehabilitation Exercises Classification

Submitted to Computers in Biology and Medicine (Q1), Elsevier

82 of 159

Research Pipeline

83 of 159

Datasets

UI-PRMD

KIMORE

84 of 159

Preprocessing

Camera records the data of 22 body joints, storing this information in a vector denoted as V.
At each time instance t, the representation of each joint data J_n consists of three-dimensional coordinates: X_t, Y_t, and Z_t.
Feature extraction techniques are then applied to process each joint data.
These techniques include mean, median, minimum, maximum, and standard deviation.

85 of 159

Scalogram Generation

Mel-frequency cepstral coefficients

Continuous Wavelet Transform

86 of 159

Image Representation: CWT

𝐶𝑊𝑇_𝐗𝐘𝐙(𝑎, 𝑏) denotes the Continuous Wavelet Transform
applied to the body joint coordinates 𝐗𝐘𝐙(𝑡),
considering scale 𝑎 and shift 𝑏.
The mother wavelet, 𝜓(𝑡), is a function with specific localization in the time and frequency domains.
Integration over all time is indicated by ∫,
while √|𝑎| normalizes the transform based on the scale's absolute value.
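The symbols listed above match the standard definition of the Continuous Wavelet Transform, which is restated here for reference. The equation itself does not survive in this text export, so the form below is the textbook one rather than a copy of the slide; the complex conjugate on the wavelet is part of that standard form and has no effect for real-valued wavelets.

```latex
CWT_{\mathbf{XYZ}}(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} \mathbf{XYZ}(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt
```

Evaluating this over a grid of scales a and shifts b gives a two-dimensional array of coefficients, and the magnitude of that array is the scalogram image.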

87 of 159

Image Representation

88 of 159

Limitations of Machine Learning

Machine learning requires additional stages of feature extraction and selection.
The selection of these techniques greatly affects the performance of the same model.

89 of 159

Limitations of Deep Learning

Despite the remarkable achievements of deep learning (DL) in Human Activity Recognition (HAR)

A primary concern is the substantial volume of labeled data required to effectively train deep neural networks.
This dependency on extensive datasets can pose challenges in environments where data collection is difficult or privacy concerns are paramount such as healthcare.
Furthermore, the computational complexity and resource demands of training deep learning models are significantly high.

Transfer learning (TL) emerges as a powerful solution to the deep learning (DL) challenge of requiring vast datasets for training.

90 of 159

Transfer Learning

Models trained on large datasets to solve a specific problem.
Available for use 'out-of-the-box' for similar tasks, often covering domains like image recognition, natural language processing, etc.

Popular Pre-trained Models

Image Processing: ResNet, Inception, and MobileNet.

91 of 159

Fusion

One can leverage the strengths and mitigate the weaknesses of individual models, thereby achieving better performance than any single model could on its own.

92 of 159

Algorithms

CNN-Based
Attention-Based
Fused

CNN

Attention

93 of 159

Fused Algorithms Models

2 Models

CNN – ViT
VGG16 – ViT
DenseNet121 – ViT
DenseNet201 – ViT
ResNet50 – ViT
ResNet101 – ViT
MobileNetV2 – ViT
MobileNetV3Large – ViT
MobileNetV3Small – ViT

3 Models

ResNet50 – MobileNetV3Small – ViT
Res101-Dense201-ViT
DenseNet201-MobileNetV3Small-ViT

94 of 159

Model Architectures

95 of 159

Top Layer Architecture

Random search was applied to determine the neural network architecture.
100 trials were executed on MobileNetV3Small, chosen for its efficient training time.

Among all 100 trials, 2 architectures were favorable.

96 of 159

Two Architectures

Parameter	First Architecture	Second Architecture
Number of Dense Layers	2	2
1^st Dense Layer’s Units	512	352
2^nd Dense Layer’s Units	265	176
Dropout Rate	0.2	0.2

97 of 159

Evaluation Metrics

Optimization Metrics

Loss
Accuracy (%)
Precision (%)
Recall (%)
F1-Score (%)

Satisfactory Metrics

Training Time (S)
Testing Time (S)
Number of Epochs
Model Size (MB)

98 of 159

Training

All experiments were run on the same machine with 15 GB of GPU memory and 12 GB of RAM.
Early stopping and learning-rate reduction on plateau were used.

Parameter	Value
Epochs	150
Train-test split	80/20
Train-val split	80/20
Folds	5
Batch Size	64
Input Shape	224x224x3
Hidden Layer Activation	ReLU
Output Layer Activation	Softmax
Optimizer	Adam
Loss	Categorical cross-entropy

99 of 159

Experiments

The first experiment aimed to identify the optimal top-layer architecture for achieving the most accurate exercise classification results on the UI-PRMD dataset.

Simultaneously, the second experiment focused on implementing numerous hybrid CNN-ViT architectures, validating the results through cross-validation and incorporating an additional dataset.

100 of 159

For Disease Classification

SMOTE was applied to address class imbalance in the data.

101 of 159

Exp 1 Results (CNN-Based Only)

Comparison of the best two architectures for the Fully Connected Network architecture on the UI-PRMD dataset, focusing on CNN-based models. The 2D bar chart visually represents the accuracy distinctions between the two architectures.

102 of 159

Exp 1 Results (Attention-Based)

Comparison of the best two architectures for the Fully Connected Network architecture on the UI-PRMD dataset, focusing on Attention-based models. The 2D bar chart visually represents the accuracy distinctions between the two architectures.

103 of 159

All Model Results on UI-PRMD

Comparison of results across various algorithms using the second architecture on the UI-PRMD dataset. The image depicts the performance of 20 different CNN-based models alongside 13 attention-based and fused models.

104 of 159

All Model Results on UI-PRMD

105 of 159

Cross Validation on UI-PRMD

106 of 159

Comparison with state-of-the-art on UI-PRMD

Method	Accuracy	F1-Score	Results
GCN	92.64%		Training
ST-GCN	98.90%		Training
2S-AGCN	99.10%		Training
Graph Convolutional Siamese Network	99.20%		Training
Spike Train	77%		Testing
Graph Transformer		85%	Testing
EGCN	86.90%		Testing
Res50-MobileV3Small-ViT	89.30%	89.07%	Testing
DenseNet121	89.33%	89.06%	Testing
Res50-Dense201-ViT	89.80%	89.59%	Testing
DenseNet201-MobileNetV3Small-ViT	89.80%	89.64%	Testing

107 of 159

108 of 159

All Model Results on KIMORE

Comparison of results across various algorithms using the second architecture on the KIMORE dataset. The image depicts the performance of 20 different CNN-Based Models alongside 13 attention-based and fused models.

109 of 159

All Model Results on KIMORE

110 of 159

Cross Validation on KIMORE

111 of 159

Comparison with state-of-the-art on KIMORE

Algorithm	Accuracy
EGCN	80.10%
3D-CNN	90.57%
Many-to-Many model with density map output	92.33%
Res50-Dense201-ViT	93.78%
Res50-MobileV3Small-ViT	94.04%
Dense201-MobileV3Small-ViT	94.30%
ViT	95.08%
MobileNetV3Small-ViT	95.33%

112 of 159

113 of 159

Comparison of inference time

114 of 159

Comparison of model size in MB

115 of 159

Discussion

The Continuous Wavelet Transform (CWT) has demonstrated its superiority over alternative image representation techniques.
The choice of deep learning model for exercise classification necessitates a consideration of trade-offs between model complexity and computational efficiency.

DenseNet201 and ResNet101, exhibit reliable efficacy across a variety of datasets.
CNN-based frameworks necessitate a substantially lower number of epochs for training in comparison to attention-based architectures.

The 3-model architecture outperforms both single and 2-model architectures.
The increased model complexity translates to challenges in deploying it on resource-constrained devices or in real-time applications

Cloud-based deployment is an alternative, for example by serving the model through an API built with Flask.

116 of 159

Discussion

Attention-based models, especially the Vision Transformer, perform impressively on some datasets but struggle on others, a pattern also observed in CNN-based models.
DenseNet201 and ResNet101 perform reliably across a variety of datasets. Notably, these CNN-based frameworks need substantially fewer training epochs than attention-based architectures.
Although the fused models did not emerge as the top performers on either dataset, their consistent placement among the top models on both datasets underscores their capacity for generalization.

117 of 159

Conclusion

The study compared 20 CNN-based models, 1 attention-based model, and 13 fused models across 2 network architectures using 5-fold cross-validation (more than 290 models trained).
CNN-based models are a favorable choice when file size is a critical consideration; models such as ResNet101 and DenseNet201 offer smaller file sizes and are preferable in such contexts.
When testing time is the priority, MobileNetV3Small is a viable option, although its performance can vary significantly with the dataset. Fused models strike a balance between performance and testing time, at the expense of a larger model size.
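The 5-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions made for the example and are not taken from the study.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n % k) folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Hypothetical dataset of 23 recordings split into 5 folds.
folds = list(k_fold_indices(n_samples=23, k=5))
```

Each sample appears in exactly one test fold, so every model is evaluated on data it was not trained on in that round.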

118 of 159

Discussion

119 of 159

Discussion

The $1 algorithm is suitable when only a small amount of data is available (as in our collected dataset). Its performance improves as the number of templates increases.
The Extra Trees ensemble outperformed all other classical machine learning models. Moreover, FCBF stood out as the most suitable feature ranking technique.
CWT outperformed other signal processing techniques such as MFCC and MEL.
After converting the time-series data into 2D images, the fused attention-CNN model outperformed the single-model approaches, with the tri-model architecture outperforming both the uni-model and dual-model architectures.
The 1D CNN achieved state-of-the-art results on both benchmark datasets.
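The conversion of a time series into a 2D image via the CWT, referred to above, can be sketched in plain Python. This is only an illustration under stated assumptions: the Ricker (Mexican hat) wavelet, the scale list, the zero padding, and the helper names `ricker`, `cwt`, and `to_grayscale` are all choices made for the example; the slides do not state which wavelet or scales were used.

```python
import math

def ricker(t, scale):
    """Ricker (Mexican hat) wavelet at offset t, including the 1/sqrt(scale) factor."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)

def cwt(signal, scales):
    """Return a len(scales) x len(signal) scalogram as nested lists."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(5 * s))  # the wavelet is negligible beyond about 5 scales
        kernel = [ricker(k, s) for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:      # zero padding at the borders (assumed)
                    acc += signal[idx] * w
            row.append(acc)
        rows.append(row)
    return rows

def to_grayscale(scalogram):
    """Map coefficient magnitudes to 0..255 pixel values."""
    peak = max(abs(v) for row in scalogram for v in row) or 1.0
    return [[int(round(255 * abs(v) / peak)) for v in row] for row in scalogram]

# Hypothetical one-joint trajectory: a sine wave with a period of 20 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(100)]
image = to_grayscale(cwt(signal, scales=[1, 2, 4, 8]))
```

Each row of `image` corresponds to one scale and each column to one time step, which gives a 2D input of the kind a CNN or Vision Transformer can consume.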

120 of 159

Discussion

While the ML models performed solidly across both datasets, they required additional processing for feature extraction and selection. Moreover, they used only 20 features of the original dataset to reach this performance.
Attention models require more computational resources than 2D CNN-based models. However, the single-model architecture did not yield consistent results across multiple datasets.
Fused models showed consistent results across multiple datasets, at the cost of more computational power while keeping an acceptable inference time.
The 1D-CNN model showed solid performance and can be used on low-power devices.

121 of 159

SE Perspective

Research stakeholders include experts from Faculty of Physical Therapy.

Defining requirements
Providing feedback

Project can be productized.

122 of 159

Collaboration With Physical Therapy

September 2022

October 2023

123 of 159

Patient Dashboard

124 of 159

Conclusion

125 of 159

Conclusion

This research investigated increasing the productivity of physical therapists, enabling them to monitor more patients at the same time through various HAR techniques.
The study investigated using Kinect as well as the more affordable option of RGB cameras.
A case study was also conducted by collecting a real-world dataset in the university clinics.
Various learning techniques were evaluated to achieve state-of-the-art results on multiple benchmark datasets.

126 of 159

Future Work

Improve the framework to detect multiple human poses per frame from RGB cameras without compromising processing speed (frames per second).
Extend the current work to include Augmented Reality (AR) and Virtual Reality (VR) technologies to promote more effective home-based rehabilitation.
Develop techniques to compress the models for efficient operation on mobile devices.
Create an automated recommendation engine capable of personalizing treatment plans and exercise regimens for individual patients.
Conduct further research into disease classification and diagnosis, necessitating the acquisition of larger datasets for improved accuracy.

127 of 159

128 of 159

UI-PRMD

129 of 159

UI-PRMD

130 of 159

Collected Dataset

131 of 159

Collected Dataset Classes

Exercise	Incorrect Template
Mini Squat	Uncontrolled Knee Position
Mini Squat	Excessive Trunk Flexion
Sit-to-Stand	Uncontrolled Knee Position
Sit-to-Stand	Excessive Trunk Flexion
Straight Leg Raising	Knee Flexion
Straight Leg Raising	Ankle Plantar Flexion + Knee Flexion
Straight Leg Raising	Ankle Plantar Flexion + Slight Knee Flexion

132 of 159

Collected Dataset: Extracting Joints

133 of 159

Body-joint extractor

OpenPose and BlazePose

BlazePose can process more frames per second.

MediaPipe is a lightweight framework developed by Google; its pose solution is built on the BlazePose model.

134 of 159

One Dollar
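The template matching behind the $1 recognizer named on this slide can be sketched as below. This is a simplified illustration, not the implementation used in the study: it resamples, rotates by the indicative angle, scales, and translates each stroke, but it omits the golden-section search over rotation angles of the full $1 algorithm. `N_POINTS`, `SIZE`, `TEMPLATES`, and the two example shapes are assumptions made for the example.

```python
import math

N_POINTS = 32   # resampling resolution (assumed value)
SIZE = 100.0    # side of the reference square (assumed value)

def resample(pts, n=N_POINTS):
    """Resample a stroke into n roughly equidistant points."""
    pts = list(pts)
    length = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    interval = length / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # q becomes the start of the next segment
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def centroid(pts):
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

def normalize(pts):
    pts = resample(pts)
    cx, cy = centroid(pts)
    # Rotate so that the angle from the centroid to the first point is zero.
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    c, s = math.cos(-theta), math.sin(-theta)
    pts = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c) for x, y in pts]
    # Scale non-uniformly to the reference square, then re-center on the origin.
    w = max(p[0] for p in pts) - min(p[0] for p in pts) or 1.0
    h = max(p[1] for p in pts) - min(p[1] for p in pts) or 1.0
    pts = [(x * SIZE / w, y * SIZE / h) for x, y in pts]
    cx, cy = centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def recognize(stroke, templates):
    """Return (label, mean point-to-point distance) of the closest template."""
    cand = normalize(stroke)
    best = None
    for label, tpl in templates:
        ref = normalize(tpl)
        dist = sum(math.dist(a, b) for a, b in zip(cand, ref)) / len(cand)
        if best is None or dist < best[1]:
            best = (label, dist)
    return best

# Hypothetical templates; several templates may share one label.
TEMPLATES = [
    ("vee", [(0, 0), (5, 10), (10, 0)]),
    ("square", [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]),
]
label, dist = recognize([(20, 20), (35, 50), (50, 20)], TEMPLATES)
```

Because classification is a nearest-template lookup, adding more templates per class gives the matcher more variants to compare against, which is one way to read the observation that accuracy rises with the number of templates.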

135 of 159

Extra Tree

A variant of the Random Forest algorithm.
The main difference lies in how the trees are built.

In Random Forests, each node uses the best split found among a random subset of features.
In Extra Trees, a split threshold is drawn at random for each candidate feature, and the best of these random splits is used at each node. This additional randomization can make Extra Trees more robust against overfitting and can further reduce variance.
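The difference in split selection can be illustrated with a small sketch. It assumes Gini impurity as the split criterion and a toy two-feature dataset; the function names and data are hypothetical, and the code covers only the choice of a single node's split, not a full tree or ensemble.

```python
import random

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(rows, labels, feature, threshold):
    """Weighted Gini impurity after splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def random_forest_split(rows, labels, features):
    """Random Forest style: search every observed threshold of each candidate feature."""
    candidates = [(f, x[f]) for f in features for x in rows]
    return min(candidates, key=lambda ft: split_impurity(rows, labels, *ft))

def extra_trees_split(rows, labels, features, rng):
    """Extra Trees style: one uniformly random threshold per candidate feature."""
    candidates = []
    for f in features:
        lo, hi = min(x[f] for x in rows), max(x[f] for x in rows)
        candidates.append((f, rng.uniform(lo, hi)))
    return min(candidates, key=lambda ft: split_impurity(rows, labels, *ft))

# Toy data: feature 0 separates the two classes, feature 1 does not.
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]
rf_split = random_forest_split(rows, labels, features=[0, 1])
et_split = extra_trees_split(rows, labels, features=[0, 1], rng=random.Random(0))
```

The exhaustive search always finds the cleanest threshold on this node, while the random variant trades some per-node quality for cheaper, less correlated trees.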

136 of 159

Extra Tree

Extra Trees add an extra layer of randomness during tree construction, which can lead to better generalization and improved performance in some cases.

137 of 159

LightGBM

Gradient boosting is an ensemble learning technique in which multiple weak learners (usually decision trees) are trained sequentially.

LightGBM employs a leaf-wise tree growth strategy instead of the level-wise (depth-wise) strategy.
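The sequential training of weak learners can be sketched with decision stumps fitted to residuals. This is a generic gradient boosting illustration for squared-error regression on one feature, not LightGBM's implementation (which grows histogram-based trees leaf-wise); the function names, learning rate, number of rounds, and toy data are assumptions for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (needs >= 2 distinct xs)."""
    best = None
    for t in sorted(set(xs))[:-1]:          # skip the maximum so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a predictor built from sequentially fitted stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Toy step function: low values map to 1.0, high values to 5.0.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
```

Each round corrects a fraction of the remaining error, so the prediction approaches the targets gradually rather than in one step.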

138 of 159

FCBF

Fast Correlation-Based Filter:

A feature selection algorithm used to identify the most relevant and informative features from a dataset.
FCBF aims to reduce the dimensionality of the data by selecting a subset of features that are highly correlated with the target variable while having low intercorrelations among themselves.
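The selection rule described above can be sketched as follows. FCBF measures correlation with symmetric uncertainty, which is defined for discrete variables; the sketch therefore assumes already-discretized features, and the feature names, toy data, and `threshold` default are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)

def fcbf(features, target, threshold=0.0):
    """features: dict mapping name -> list of discrete values. Returns selected names."""
    relevance = {name: symmetric_uncertainty(col, target) for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: relevance[n], reverse=True)
    selected = []
    for name in ranked:
        # A feature is redundant if a selected feature is at least as
        # correlated with it as it is with the target.
        redundant = any(
            symmetric_uncertainty(features[s], features[name]) >= relevance[name]
            for s in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Toy data: "a" predicts the target, "a_copy" duplicates it, "noise" is unrelated.
target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "a_copy": [0, 0, 1, 1, 0, 0, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = fcbf(features, target)
```

On this toy input the duplicate is dropped as redundant and the unrelated feature is dropped as irrelevant, leaving a single informative feature.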

139 of 159

CNN

140 of 159

1D CNN

141 of 159

Benefits of Transfer Learning

Time Efficiency
Data Efficiency
Improved Performance

142 of 159

Modifying and Fine-Tuning Pre-trained Models

Customizing the Model for Specific Tasks

Replacing Trainable Layers

Fine-Tuning Strategies

Layer Freezing
Selective Training

143 of 159

144 of 159

ViT

145 of 159

Residual Models

146 of 159

Dense Models

147 of 159

Mob

148 of 159

Acknowledgments

149 of 159

Acknowledgments

I am deeply grateful for the support and guidance provided by my supervisors:

Dr. Ayman Ezzat
Dr. Amr Ghoneim
Dr. Laila Abdelhamid

A special thank you to:

Raghda Essam and Noha Ahmed
Family
Friends

150 of 159

References

W. H. Organization, “Rehabilitation.” Available online, 2023. Accessed on June 15, 2023.
Arshad, M. H., Bilal, M., & Gani, A. (2022). Human activity recognition: Review, taxonomy and open challenges. Sensors, 22(17), 6463.
Debnath, B., O’Brien, M., Yamaguchi, M., & Behera, A. (2022). A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems, 28(1), 209-239.
Tasnim, N., & Baek, J. H. (2023). Dynamic edge convolutional neural network for skeleton-based human action recognition. Sensors, 23(2), 778.
Yue, R., Tian, Z., & Du, S. (2022). Action recognition based on RGB and skeleton data sets: A survey. Neurocomputing, 512, 287-306.
Rashid, F. A. N., Suriani, N. S., Mohd, M. N., Tomari, M. R., Zakaria, W. N. W., & Nazari, A. (2020). Deep convolutional network approach in spike train analysis of physiotherapy movements. In Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia (pp. 159-170). Springer Singapore.
A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018.
Capecci, M., Ceravolo, M. G., Ferracuti, F., Iarlori, S., Monteriu, A., Romeo, L., & Verdini, F. (2019). The kimore dataset: Kinematic assessment of movement and clinical scores for remote monitoring of physical rehabilitation. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 27(7), 1436-1448.
M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd international conference on Machine learning, pp. 161–168, ACM, June 2006.

16/07/2023

151 of 159

Proposed Framework

System Overview

152 of 159

Proposed System: Overview

153 of 159

Proposed System: Technical

154 of 159

155 of 159

Proposed System: Track 1 - Classical

156 of 159

Proposed System: Track 2 - Deep Learning

157 of 159

Research plan

Phase 1: Survey the existing machine learning and deep learning algorithms.

Phase 2: Assess the strengths and weaknesses of the existing models.

Phase 3: Use machine learning algorithms to support data cleaning of imbalanced data.

Phase 4: Identify the machine learning or deep learning algorithm that achieves the highest accuracy in detecting right/wrong techniques.

Phase 5: Apply and assess the proposed algorithm on a real dataset.

158 of 159

Reference

W. H. Organization, “Rehabilitation.” Available online, 2023. Accessed on June 15, 2023.
B. Debnath, M. O’Brien, M. Yamaguchi, et al., “A review of computer vision-based approaches for physical rehabilitation and assessment,” Multimedia Systems, vol. 28, no. 2, pp. 209–239, 2022.
F. A. Rashid, N. S. Suriani, M. N. Mohd, M. R. Tomari, W. N. W. Zakaria, and A. Nazari, “Deep convolutional network approach in spike train analysis of physiotherapy movements,” in Advances in Electronics Engineering: Proceedings of the ICCEE 2019, Kuala Lumpur, Malaysia, pp. 159–170, Springer Singapore, 2020.
N. Tasnim and J.-H. Baek, “Dynamic edge convolutional neural network for skeleton-based human action recognition,” Sensors, vol. 23, no. 2, p. 778, 2023.
A. Vakanski, H.-P. Jun, D. Paul, and R. Baker, “A data set of human body movements for physical rehabilitation exercises,” Data (Basel), vol. 3, Mar. 2018.
M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim, “Do we need hundreds of classifiers to solve real world classification problems?,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 3133–3181, 2014.
R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” in Proceedings of the 23rd international conference on Machine learning, pp. 161–168, ACM, June 2006.

159 of 159

Reference

“Rehabilitation.” World Health Organization. https://www.who.int/health-topics/rehabilitation#tab=tab_1 (accessed Sep. 11, 2022).
“Rehabilitation Facts.” World Health Organization. https://www.who.int/news-room/fact-sheets/detail/rehabilitation (accessed Sep. 11, 2022).
Gu Y, Pandit S, Saraee E, Nordahl T, Ellis T, Betke M. Home-based physical therapy with an interactive computer vision system. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops 2019 (pp. 0-0).
Rivas JJ, del Carmen Lara M, Castrejon L, Hernandez-Franco J, Orihuela-Espina F, Palafox L, Williams A, Bianchi-Berthouze N, Sucar LE. Multi-label and multimodal classifier for affective states recognition in virtual rehabilitation. IEEE Transactions on Affective Computing. 2021 Feb 1;13(3):1183-94.
Debnath B, O’Brien M, Yamaguchi M, Behera A. A review of computer vision-based approaches for physical rehabilitation and assessment. Multimedia Systems. 2021 Jun 19:1-31.
Chambers C, Seethapathi N, Saluja R, Loeb H, Pierce SR, Bogen DK, Prosser L, Johnson MJ, Kording KP. Computer vision to automatically assess infant neuromotor risk. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2020 Oct 6;28(11):2431-42.
Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh, “Realtime multi-person 2D pose estimation using part affinity fields,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1–9.
Sucar LE, Luis R, Leder R, Hernández J, Sánchez I. Gesture therapy: A vision-based system for upper extremity stroke rehabilitation. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology 2010 Aug 31 (pp. 3690-3693). IEEE.
Lin, T.-Y., Hsieh, C.-H., Lee, J.-D.: A kinect-based system for physical rehabilitation: Utilizing tai chi exercises to improve movement disorders in patients with balance ability. In: Modelling Symposium (AMS), 2013 7th Asia, pp. 149–153. IEEE (2013)