6 of 53

3. Problem Statement

Manual correction of essay questions is slow, effortful, and costly, and its results can be inaccurate. Because correction is done by humans and every grader has a personal point of view, one paper is evaluated by more than one person to ensure that there is no bias.

So, there is a need for an automated essay grading system that reduces cost and time and determines an accurate and reliable score.
This calls for a smart system that helps correct essay-question papers electronically.

The problem statement is:
1- How to extract the answer from the text written by the student (knowledge).
2- How to automatically evaluate the answer knowledge and assign a grade.

7 of 53

4- Objectives

Construct an accurate model with the following characteristics:

Extract the answer from the text written by the student (knowledge).
Automatically evaluate the answer knowledge and assign a grade using a deep learning technique.

8 of 53

5- Background

Automated Grading for Essays (AGE)

is the use of specialized computer algorithms to assign grades to essays written in an educational setting. It draws on techniques such as:

Deep Learning.

Natural Language Processing (NLP)

Recurrent Neural Network (RNN).

LSTM (Long Short-Term Memory network)

9 of 53

5- Background

1. Deep Learning:

An AI function that mimics the workings of the human brain in processing data for use in detecting objects, recognizing speech, translating languages, and making decisions.
It is defined as a cascade of layers performing nonlinear processing to learn multiple levels of data representations. Its goal is to speed up the learning period.

Unlike conventional machine learning and data mining techniques, deep learning is able to generate very high-level data representations from massive volumes of raw data. As a result, it has provided solutions to many real-world applications.

10 of 53

5- Background

Deep Learning applications:

Skin cancer detection

AGE systems

11 of 53

5- Background

2. Natural Language Processing (NLP)

NLP is a series of algorithms and techniques that mainly focus on teaching computers to understand human language. Some NLP tasks include document classification, translation, paraphrase identification, text similarity, summarization, and question answering.

NLP development is challenging due to the complexity and ambiguous structure of the human language. Moreover, natural language is highly context specific, where literal meanings change based on the form of words, and domain specificity.

Most NLP models follow a similar preprocessing step: (1) the input text is broken down into words through tokenization and then (2) these words are represented in the form of vectors, or n-grams. Representing words in a low dimension is important to create an accurate perception of similarities and differences between various words. The challenge arises when there is a need to decide the number of words contained in each n-gram. This choice is context specific and requires prior domain knowledge.
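As a minimal sketch of the two preprocessing steps described above (tokenization, then n-grams), assuming a simple regular-expression tokenizer. The function names `tokenize` and `ngrams` are illustrative only and are not taken from the original work:

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Illustrative input sentence (not from the dataset).
tokens = tokenize("Computers help people learn.")
bigrams = ngrams(tokens, 2)
```

Changing `n` changes how much local context each n-gram captures, which is the context-specific choice mentioned above.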

12 of 53

5- Background

2. NLP Approaches related to AGE:

Some of the highly impactful approaches in solving the most well-known NLP tasks are:

-- Paraphrase Identification:

-- Paraphrase identification is the process of analyzing two sentences and projecting how similar they are based on their underlying hidden semantics.

-- It is a key feature that is beneficial for several NLP jobs such as plagiarism detection, answers to questions, context detection, summarization, and domain identification.

-- Question Answering:

-- An automatic question-and-answering system should be able to interpret a natural language question and use reasoning to return an appropriate reply. Modern knowledge bases, such as the famous FREEBASE dataset, allow this field to flourish and leap out of the times when features and rule sets were hand-crafted to specific domains.

13 of 53

5- Background

3. Recurrent Neural Network (RNN):

It is another widely used and popular algorithm in deep learning, especially in NLP and speech processing.

Unlike traditional neural networks, RNN utilizes the sequential information in the network. This property is essential in many applications where the embedded structure in the data sequence conveys useful knowledge.
For example, to understand a word in a sentence, it is necessary to know the context. Therefore, an RNN can be seen as short-term memory units that include the input layer x, hidden (state) layers, and output layer y.
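To make the idea of a hidden state concrete, here is a minimal sketch of a recurrent forward pass with scalar weights. The weight names `w_xh`, `w_hh`, and `w_hy` are assumptions for illustration; a real RNN uses weight matrices and learned values:

```python
import math

def rnn_forward(xs, w_xh, w_hh, w_hy, h0=0.0):
    """Run a scalar Elman-style RNN over the input sequence xs.

    At each step the hidden state h mixes the current input x with the
    previous hidden state, so earlier inputs influence later outputs.
    """
    h = h0
    ys = []
    for x in xs:
        h = math.tanh(w_xh * x + w_hh * h)  # hidden (state) layer
        ys.append(w_hy * h)                 # output layer y
    return ys, h

# With w_hh = 0 the network has no memory; with w_hh != 0 it does.
ys_memory, _ = rnn_forward([1.0, 0.0], w_xh=1.0, w_hh=1.0, w_hy=1.0)
ys_no_memory, _ = rnn_forward([1.0, 0.0], w_xh=1.0, w_hh=0.0, w_hy=1.0)
```

In the second call the output for the second input is zero, while in the first call it still reflects the earlier input through the hidden state.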

Three deep RNN approaches have been introduced: deep “Input-to-Hidden,” “Hidden-to-Output,” and “Hidden-to-Hidden.” Based on these three solutions, a deep RNN is proposed that not only takes advantage of a deeper RNN but also reduces the difficulty of learning in deep networks.

14 of 53

5- Background

The RNN approach automatically learns the relation between an essay and its grade.

Since the system is based on RNNs, it can use non-linear neural layers to identify and learn complex patterns in the data, and to encode all the information required for essay evaluation and scoring.

15 of 53

5- Background

4. LSTM (Long Short-Term Memory network):

LSTM forms a memory about a sequence of inputs over time. It is an artificial recurrent neural network (RNN) architecture used in the field of deep learning.

It is applicable to tasks such as unsegmented, connected handwriting recognition and speech recognition.

Example: an LSTM can be trained to generate new characters, words, and bodies of text.
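For reference, the standard LSTM cell update can be written as below. The notation is the usual textbook one and is not taken from these slides: x_t is the input at step t, h_t the hidden state, c_t the cell (memory) state, W, U, and b are learned weights and biases, \sigma is the logistic sigmoid, and \odot is the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(memory update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The additive memory update in c_t is what lets the network keep information over long sequences.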

16 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
1- Automated Grading of Essays: A Review Authors: Jyoti G. Borade, Laxman D. Netak Publisher: Springer, 2021	Review methods for automated grading of essays and for evaluating explanatory answers (applications of approaches such as Natural Language Processing and Deep Learning for AGE)	1- Textual Similarity 2- Latent Semantic-Based Vector Space Model 3- Neural Network Based Approaches 4- Naive Bayes Classifiers	Responses for each question from different students.	Accuracy obtained for simple LSTM is 83, Deep LSTM is 82, Bi-directional LSTM is 89	This work presents a review of machine learning techniques used to assess essay-type answers	Limited dataset

17 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
2- Automated Content Grading Using Machine Learning Authors: Rahul K Chauhan, Ravinder Saharan, Siddhartha Singh, Priti Sharma Publisher: ResearchGate, 2020	Show how an algorithmic approach in machine learning can be used to automatically examine and grade theoretical content in exam answer papers.	Random Forest Algorithm	The standard exam answer papers were taken from the mid-term (minor) examinations	Weighted Kappa: Set 1: 0.46, Set 2: 0.55, Set 3: 0.61	This work shows how content grading in a big-data based technical domain can be solved using this approach	Low accuracy

18 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
3- An Analysis of Automated Answer Evaluation Systems based on Machine Learning Authors: Birpal Singh J. Kapoor, Shubham M. Nagpure Publisher: IEEE, 2020	Summarize the existing mechanisms and analyze the performance of the systems used for automatic grading of long and descriptive answers.	A methodology for detaching a course of action of expositions into subsets that address similar graders, which uses an explanation reasoning and bunching.	Kaggle website	Many analysts and scholars are still working on this problem and have created different frameworks that gave encouraging results.	Features developed with corpus-based strategies or NLP systems as a significant part of the AI structure.	Percentage of accuracy is not specified

19 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
4- Automated language essay scoring systems: a literature review Authors: Mohamed Abdellatif Hussein, Hesham Hassan Publisher: PeerJ, 2019	Review the literature for the AES systems used for grading essay questions	1- Project Essay Grader (PEG) 2- Intelligent Essay Assessor (IEA) 3- E-rater 4- Criterion 5- IntelliMetric 6- MY Access 7- Bayesian Essay Test Scoring System (BETSY) 8- Automatic text scoring using neural networks 9- A neural network approach to automated essay scoring 10- Neural network for automatic essay scoring	The Kaggle ASAP contest dataset	PEG 0.87 IEA 0.90 E-rater 0.91 IntelliMetric 0.83 BETSY 0.80	The performance of these systems is evaluated based on the comparison of the scores assigned to a set of essays scored by expert humans.	The results are not specific to each method

20 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
5- Automated Essay Grading using Machine Learning Algorithm Authors: V. V. Ramalingam, A Pandian, Prateek Chetry and Himanshu Nigam Publisher: IOP, 2018	Develop an automated essay assessment system using machine learning techniques	Machine learning technique and e-Rater technique: an essay is provided as input and compared with the essays of each set; the essays are then compared based on their polarities, the words used, and the content of the essay. The machine finally generates a score for the essay by combining all the results into a final score.	The dataset was extracted from kaggle.com; it consists of the data from the competition conducted by The Hewlett Foundation	The machine is capable of assessing an essay like a human rater.	This approach tries to model language features such as language fluency, grammatical correctness, and domain information content of the essays, and tries to fit the best polynomial in the feature space using linear regression with polynomial basis functions	Accuracy percentage is not specified

21 of 53

6- Related Work

PAPER INFORMATION	OBJECTIVES	METHODOLOGY	DATASET	RESULTS	CONTRIBUTION	MISSING
6- Intelligent Auto-grading System Authors: Zining Wang, Jianli Liu Publisher: Proceedings, 2018	Present a novel automatic essay scoring system based on Natural Language Processing and Deep Learning technologies	Natural Language Processing, Deep Learning, LSTM	The dataset for training and testing is the public essay set available in the Automated Student Assessment Prize on Kaggle	Accuracy 73%	Combines NLP, neural networks, and an intelligent auto-grading system in an attempt to build an innovative open-ended response grading machine.	Accuracy is low

22 of 53

7- Related Work Conclusion & Research Gap

1- Commonly used techniques

1- Machine learning

Support vector machine (SVM)
Random Forest
Latent semantic analysis (LSA)

2- Deep Learning

N-layer neural network
It is used to implement a scoring function.

3- NLP for AGE

It is used to handle linguistic issues such as multiple meanings of words in different contexts.
It helps to extract linguistic features.

23 of 53

2- Different Models for AGE

1- Machine Learning (linear regression, clustering, SVM, and Bayesian inference)

2- Text Similarity:

It takes text documents as inputs and finds the similarity between them.

Lexical similarity: determines the similarity by matching contents word by word.

Semantic similarity: is based on the meaning of the contents.

3- Neural Network based Approaches

4- Naïve Bayes Classifiers
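As one possible illustration of the lexical similarity listed under Text Similarity on this slide, the sketch below uses the Jaccard index over word sets. The choice of Jaccard is an assumption made for illustration; the slides do not say which lexical measure was used:

```python
def jaccard_similarity(text_a, text_b):
    """Word-level Jaccard index: shared words divided by all distinct words."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty texts are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)

# Two of three words match, four distinct words overall.
score = jaccard_similarity("the cat sat", "the cat ran")
```

A purely lexical measure like this gives no credit for synonyms, which is the gap that semantic similarity is meant to close.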

24 of 53

3- AGE Evaluation Metrics:

Root Mean Squared Error (RMSE):

is an extension of the mean squared error. Importantly, the square root of the error is calculated, which means that the units of the RMSE are the same as the original units of the target value that is being predicted.
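A minimal sketch of RMSE for essay scores, written in plain Python. The function name `rmse` and the sample scores are illustrative:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length score lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Every prediction is off by 2 score points, so the RMSE is 2 score points.
error = rmse([2, 2], [0, 0])
```

Because of the square root, the result is expressed in score points, the same unit as the grades being predicted.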

Quadratic weighted Kappa (QWK):

Weighted Kappa is an evaluation metric used to calculate the amount of agreement between the predicted and the actual evaluation.
It yields a score of 1.0 when the predicted and actual evaluations are the same.

25 of 53

4- Research Gap:

Most algorithms that use Natural Language Processing and Deep Learning achieve low accuracy.

There is a need to bridge this gap (improve the accuracy).

26 of 53

8-Research Road Map

8.1- Re-implementation of Kaggle AGE Model using Colab platform

8.2- AGE Model Evaluation

8.3- The Enhanced AGE model using RNN and LSTM Methodology

8.4- The enhanced AGE model Evaluation

8.5- Results of comparing the enhanced model to the original model

27 of 53

8.1 Re-implementation of Kaggle AGE Model using Kaggle platform

The existing AGE from Kaggle

Pipeline: dataset (CSV) → file reader libraries in Python → AGE model → question score

28 of 53

8.1 Re-implementation of Kaggle AGE Model using Colab platform

The framework starts by collecting the data into one dataset in CSV format. Then the Colab editor for Python is used to prepare the experiment.
The Python library Keras is used to implement the machine learning model that predicts the scores.
Finally, the kappa metric is used for model assessment and result presentation.

Pipeline: dataset (CSV) → file reader libraries in Python → model building → question score

29 of 53

Model building steps

Stop words, which carry little meaning, are removed.
Then each essay is converted into a group of sentences, and each sentence is converted into a list of words.
Then the embedding of each word in each sentence is obtained, and an average vector is calculated to represent the essay.
The essay vectors are grouped into a list and fed into the model for training.
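The steps on this slide can be sketched as follows. This is only an illustration: the stop-word list and the two-dimensional `toy_embeddings` are hypothetical stand-ins, since the slides do not specify the stop-word list or the embedding model that was used:

```python
import re

# Tiny illustrative stop-word list (hypothetical, not the one used in the work).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "on"}

def essay_to_sentences(essay):
    """Split an essay into sentences, each a list of lowercase non-stop words."""
    result = []
    for sentence in re.split(r"[.!?]+", essay):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in STOP_WORDS]
        if words:
            result.append(words)
    return result

def essay_vector(essay, embeddings, dim):
    """Average the embeddings of all known words to get one vector per essay."""
    total = [0.0] * dim
    count = 0
    for sentence in essay_to_sentences(essay):
        for word in sentence:
            vec = embeddings.get(word)
            if vec is None:
                continue  # skip words with no embedding
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total

# Hypothetical 2-dimensional embeddings, for illustration only.
toy_embeddings = {"computers": [1.0, 0.0], "help": [0.0, 1.0], "people": [1.0, 1.0]}
vec = essay_vector("Computers help people. The people help.", toy_embeddings, dim=2)
```

Each essay ends up as one fixed-length vector regardless of its length, which is what allows the vectors to be stacked into a single training matrix.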

30 of 53

Model building steps

31 of 53

Dataset:

Kaggle website: https://www.kaggle.com/c/asap-aes

Data Description:
The dataset contains about 13,000 essays, divided into 8 different sets based on context.
The number of essays for each of the 8 questions is given in the following table and chart.

32 of 53

Dataset:

Distribution sets of Essays:

QUESTION NUMBER	COUNT	PERCENTAGE
1	1783	13.7%
2	1800	13.9%
3	1726	13.3%
4	1770	13.6%
5	1805	13.9%
6	1800	13.9%
7	1569	12.1%
8	723	5.6%
Total	12976	100%

33 of 53

Dataset:

Q1: Write a letter to your local newspaper in which you state your opinion on the effects computers have on people. Persuade the readers to agree with you.

TYPE OF ESSAY	TRAINING SET SIZE
Persuasive/ Narrative/Expository	1,783 Essays

34 of 53

Dataset:

SCORE	RUBRIC GUIDELINES
1	An undeveloped response that may take a position but offers no more than very minimal support. Typical elements: Contains few or vague details. Is awkward and fragmented. May be difficult to read and understand. May show no awareness of audience.
2	An under-developed response that may or may not take a position. Typical elements: Contains only general reasons with unelaborated and/or list-like details. Shows little or no evidence of organization. May be awkward and confused or simplistic. May show little awareness of audience.
3	A minimally-developed response that may take a position, but with inadequate support and details. Typical elements: Has reasons with minimal elaboration and more general than specific details. Shows some organization. May be awkward in parts with few transitions. Shows some awareness of audience.
4	A somewhat-developed response that takes a position and provides adequate support. Typical elements: Has adequately elaborated reasons with a mix of general and specific details. Shows satisfactory organization. May be somewhat fluent with some transitional language. Shows adequate awareness of audience.
5	A developed response that takes a clear position and provides reasonably persuasive support. Typical elements: Has moderately well elaborated reasons with mostly specific details. Exhibits generally strong organization. May be moderately fluent with transitional language throughout. May show a consistent awareness of audience.
6	A well-developed response that takes a clear and thoughtful position and provides persuasive support. Typical elements: Has fully elaborated reasons with specific details. Exhibits strong organization. Is fluent and uses sophisticated transitional language. May show a heightened awareness of audience.

35 of 53

Dataset:

Data Split:
5-fold cross-validation is used to estimate the model accuracy, which means that in each fold 80% of the data is used for training and the remaining 20% for testing.
Types of Essays :
50% ( persuasive / narrative / expository)
50% (source dependent responses)
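The 5-fold data split described on this slide can be sketched as below. This is a simplified illustration with contiguous folds and no shuffling; the function name `k_fold_indices` is hypothetical, and the original experiment may have used a library routine instead:

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# With 10 samples and 5 folds, each fold trains on 8 and tests on 2 (80% / 20%).
folds = list(k_fold_indices(10, k=5))
```

Every sample is used for testing exactly once across the five folds, and the reported score is typically the average over the folds.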

36 of 53

8.2- AGE Model Evaluation

QWK (Quadratic Weighted Kappa):
Quadratic Weighted Kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. The definition of K is

$$K = 1 - \frac{\sum_{i,j} W_{i,j}\, O_{i,j}}{\sum_{i,j} W_{i,j}\, E_{i,j}}$$

where O is a C-by-C matrix computed over the N essays: O(i, j) gives the count of essays that obtained score i from the first evaluator and score j from the second evaluator. The E matrix gives the expected ratings without considering any correlation between the evaluations given by the two evaluators. W is a weight matrix of the same size as the O and E matrices. It is calculated as follows:

$$W_{i,j} = \frac{(i - j)^2}{(C - 1)^2}$$

37 of 53

8.2- AGE model evaluation

In its unweighted form, K = (Po - Pe) / (1 - Pe), where:

Po is the relative observed agreement among raters.

Pe is the hypothetical probability of chance agreement, calculated from the observed data as the probability of each rater randomly assigning each category.

If the raters are in complete agreement, then K = 1. If there is no agreement among the raters other than what would be expected by chance (as given by Pe), then K = 0.
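A minimal sketch of the quadratic weighted kappa in plain Python, following the standard definition with an observed matrix O, an expected matrix E built from the two raters' score histograms, and quadratic weights. The function and variable names are illustrative; this is not the code used in the experiments:

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer scores.

    Raises ZeroDivisionError when there is only one category or when
    either rater assigns the same score to every essay.
    """
    c = max_rating - min_rating + 1          # number of score categories
    n = len(rater_a)                         # number of essays
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1

    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(c)) for j in range(c)]

    numerator = 0.0
    denominator = 0.0
    for i in range(c):
        for j in range(c):
            weight = (i - j) ** 2 / (c - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n   # chance agreement count
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator

perfect = quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4)
opposite = quadratic_weighted_kappa([1, 1, 2, 2], [2, 2, 1, 1], 1, 2)
```

Identical ratings give a kappa of 1, and the quadratic weights penalize a disagreement of two score points four times as much as a disagreement of one point.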

38 of 53

8.2- AGE model evaluation

MODEL	BATCH SIZE	EPOCHS	LOSS	KAPPA SCORE ( Accuracy )
LSTM Kaggle	64	2	40.79	0.7351
LSTM Colab	64	2	40.79	0.7351

The re-implemented model reaches an accuracy of about 73%, the same as reported in related work #6.

39 of 53

8.3- Proposed Enhanced AGE Model using RNN and LSTM Methodology

Enhanced Model Building using Deep learning and RNN and LSTM

Training network

Model Evaluation

Running the model

dataset

40 of 53

THE PROPOSED MODEL

41 of 53

Model parameters

Number of hidden layers (300).
Number of neurons in each hidden layer (300).
Number of neurons in input layers (50).
Number of neurons in output layer (1-12).
Dataset matrix.
Mathematical matrix of dataset
Size = 781 × 13,000 × 50 = 507,650,000
Essay matrix
Size = 781 × 50 = 39,050
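The two sizes above can be checked with a few lines of arithmetic. The variable names are assumptions: the slides give only the numbers, so reading 781 as the number of rows per essay and 50 as the input vector width is a guess made for illustration:

```python
# Assumed interpretation of the three factors (not stated on the slide).
rows_per_essay = 781     # assumption: rows in one essay matrix
num_essays = 13000       # approximate number of essays in the dataset
input_width = 50         # number of neurons in the input layer

essay_matrix_size = rows_per_essay * input_width
dataset_matrix_size = rows_per_essay * num_essays * input_width

print(essay_matrix_size)
print(dataset_matrix_size)
```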

42 of 53

Re-implementation of Kaggle AGE Model using Colab platform

First, the code was downloaded from Kaggle after logging in and then uploaded to the Colab platform. Second, the dataset file was downloaded from Kaggle and uploaded to Google Drive, and Google Drive was linked to Colab before running the code. The result was 73%. Third, the code parameters were edited and another algorithm was used with the code. It was run again, with a result of 96%.

43 of 53

Results Of Using the LSTM Algorithm

LSTM Experiments	Batch size	Epochs	Kappa score	Avg. time (min)
Experiment 1	64	2	73.53%	6
Experiment 2	32	10	93.46%	10
Experiment 3	32	20	95.29%	17
Experiment 4	32	25	95.53%	21
Experiment 5	32	30	95.67%	27
Experiment 6	32	32	95.96%	24
Experiment 7	32	35	95.89%	30.2
Experiment 8	64	10	91.32%	11
Experiment 9	64	20	94.53%	11.5
Experiment 10	64	25	95.04%	12
Experiment 11	64	30	95.23%	14.2
Experiment 12	64	32	95.39%	16.8
Experiment 13	64	35	95.80%	27.4

44 of 53

Steps of LSTM

45 of 53

Modified Algorithm

46 of 53

Kappa Score for Each Experiment

47 of 53

Bidirectional LSTM Algorithm Results

48 of 53

Steps Of Bidirectional-LSTM

49 of 53

Loss in Bidirectional LSTM algorithm

50 of 53

Kappa Ratio with each Bidirectional LSTM experiment

51 of 53

Comparing Results for Different Models

Model	Batch size	Epochs	Kappa score	Time
base LSTM model	64	2	73.53%	6 min
modified LSTM	32	32	95.96%	24 min
Bidirectional LSTM	32	32	96.35%	51 min

52 of 53

12- References

Jyoti G. Borade and Laxman D. Netak, “Automated Grading of Essays: A Review”, 2021, Springer.
Rahul K Chauhan, Ravinder Saharan, Siddhartha Singh and Priti Sharma, “Automated Content Grading Using Machine Learning”, 2020, ResearchGate.
Birpal Singh J. Kapoor, Shubham M. Nagpure, Sushil S. Kolhatkar, Prajwal G. Chanore and Mohan M. Vishwakarma, “An Analysis of Automated Answer Evaluation Systems based on Machine Learning”, 2020, IEEE.
Mohamed Abdellatif Hussein, Hesham Hassan and Mohammad Nassef, “Automated language essay scoring systems: a literature review”, 2019, PeerJ.
V. V. Ramalingam, A Pandian, Prateek Chetry and Himanshu Nigam, “Automated Essay Grading using Machine Learning Algorithm”, 2018, IOP.
Zining Wang, Jianli Liu and Ruihai Dong, “Intelligent Auto-grading System”, 2018, Proceedings.
