1 of 12

MACHINE LEARNING

Training, Test and Validation sets

Dr. G.N.V.G. Sirisha

Dr. Ch. Someswara Rao

Sri. R. Shiva Shankar

Sri. V.V. Durga Kiran

Department of Computer Science and Engineering

Sagi Rama Krishnam Raju Engineering College

Bhimavaram, Andhra Pradesh - 534202

2 of 12

Training, Testing, and Validation Sets

  • Training Dataset
    • The sample of data used to fit the model.

  • Validation Dataset
    • Provides an unbiased evaluation of a model fit on the training dataset while tuning the model's hyperparameters.

  • Test Dataset
    • Provides an unbiased evaluation of the final model fit on the training dataset.

SRKR Engineering College, Department of CSE

7/1/2020


Sri. R. SHIVA SHANKAR, Assistant Professor

3 of 12


  • The training set is used to actually train the algorithm.

  • The validation set is used to keep track of how well the model is doing as it learns:
    • to avoid overfitting, and
    • to tune the hyperparameters of the model.

  • The test set is used to produce the final results.
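As a concrete illustration, the three-way split can be sketched in plain Python. The `train_val_test_split` helper and the 60/20/20 fractions are illustrative assumptions, not prescribed by the slides:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    # Shuffle once, then carve the shuffled data into three disjoint parts.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]                # final evaluation only
    val = shuffled[n_test:n_test + n_val]   # overfitting checks / tuning
    train = shuffled[n_test + n_val:]       # model fitting
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before splitting matters: if the data are ordered by class, a naive slice would give the three sets very different class distributions.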


4 of 12

Confusion Matrix


  • Used to describe the performance of a classification model

  • Summary of prediction results on a classification problem.

  • Predictions are summarized with count values and broken down by each class.


5 of 12


  • Positive (P) : Observation is positive (for example: is an apple).
  • Negative (N) : Observation is not positive (for example: is not an apple).
  • True Positive (TP) : Observation is positive, and is predicted to be positive.
  • False Negative (FN) : Observation is positive, but is predicted negative.
  • True Negative (TN) : Observation is negative, and is predicted to be negative.
  • False Positive (FP) : Observation is negative, but is predicted positive.


Here, Class 1 is the positive class and Class 2 is the negative class, so the four counts arrange into the 2x2 confusion matrix:

                      Predicted Positive    Predicted Negative
  Actual Positive     TP                    FN
  Actual Negative     FP                    TN
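The four counts can be tallied directly from lists of labels; a minimal sketch (the `confusion_counts` helper and the apple/other labels are illustrative):

```python
def confusion_counts(y_true, y_pred, positive="apple"):
    # Compare each true label with its prediction and bin it as TP/FN/TN/FP.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp, fn, tn, fp

y_true = ["apple", "apple", "other", "other", "apple"]
y_pred = ["apple", "other", "other", "apple", "apple"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```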

6 of 12

The Accuracy Metrics


  • Accuracy

Classification Rate, or Accuracy, is given by the relation:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

  • The problem with accuracy is that it doesn't tell us everything about the results.

  • Four further metrics help interpret the performance of a classifier: sensitivity and specificity, and precision and recall.
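Accuracy is one line of Python over the confusion-matrix counts (the counts in the example are hypothetical):

```python
def accuracy(tp, tn, fp, fn):
    # Correct predictions (TP + TN) over all predictions.
    return (tp + tn) / (tp + tn + fp + fn)

# 100 hypothetical predictions: 85 correct, 15 wrong.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```

Note what accuracy hides: a model predicting "negative" for everything on a 95%-negative dataset also scores 0.95 while detecting no positives at all.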

7 of 12

Sensitivity & Specificity


  • Sensitivity (also known as the true positive rate, or recall)

    • It is the ratio of the number of correctly classified positive examples to the total number of positive examples.
    • Specificity is the same ratio for negative examples.

      • Sensitivity = TP / (TP + FN)
      • Specificity = TN / (TN + FP)

    • Sensitivity measures how good the model is at detecting events in the positive class.

  • Specificity (also known as the true negative rate) measures how good the model is at detecting events in the negative class.
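The two rates can be sketched as functions of the confusion-matrix counts (the counts in the example are hypothetical):

```python
def sensitivity(tp, fn):
    # True positive rate: detected positives over all actual positives.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: detected negatives over all actual negatives.
    return tn / (tn + fp)

print(sensitivity(tp=40, fn=10))  # 0.8
print(specificity(tn=45, fp=5))   # 0.9
```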


8 of 12


Recall

  • The ratio of the total number of correctly classified positive examples to the total number of positive examples:

    Recall = TP / (TP + FN)

  • Measures how good the model is at detecting positive events.

  • High recall indicates that the positive class is correctly recognized (a small number of false negatives).


9 of 12

 

Precision

  • The total number of correctly classified positive examples divided by the total number of predicted positive examples:

    Precision = TP / (TP + FP)

  • It indicates how likely an example labelled as positive is to indeed be positive.

  • Measures how good the model is at assigning positive events to the positive class.
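Precision follows the same pattern as the rates above (hypothetical counts again):

```python
def precision(tp, fp):
    # Correct positives over everything the model labelled positive.
    return tp / (tp + fp)

print(round(precision(tp=40, fp=5), 3))  # 0.889
```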


10 of 12

  • Precision and recall evaluate the model from two perspectives, related to the two types of error:

    • type I errors (false positives), as measured by precision
    • type II errors (false negatives), as measured by recall

  • High recall, low precision

    • Most positive examples are correctly recognized (low FN), but there are a lot of false positives (high FP).

  • Low recall, high precision

    • We miss many positive examples (high FN), but those we predict as positive are indeed positive (low FP).
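The high-recall/low-precision extreme can be checked numerically: a model that labels every example positive has no false negatives but many false positives (counts invented for illustration):

```python
def recall_and_precision(tp, fn, fp):
    # Recall = TP / (TP + FN); Precision = TP / (TP + FP).
    return tp / (tp + fn), tp / (tp + fp)

# 10 actual positives, 90 actual negatives, everything predicted positive:
r, p = recall_and_precision(tp=10, fn=0, fp=90)
print(r, p)  # 1.0 0.1
```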


11 of 12

 

F-Measure

  • To combine precision and recall into a single number, calculate an F-measure, which uses the harmonic mean in place of the arithmetic mean:

    F = (2 × Precision × Recall) / (Precision + Recall)

  • The F-Measure will always be nearer to the smaller value of Precision or Recall.
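A quick sketch showing the harmonic mean being pulled toward the smaller value (the 0.5/1.0 pair is an invented example):

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

p, r = 0.5, 1.0
print(round(f_measure(p, r), 3))  # 0.667 -- nearer to min(p, r) than the arithmetic mean 0.75
```

This is why F-measure is preferred over plain averaging: a model cannot hide a very poor precision behind a perfect recall, or vice versa.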


12 of 12

**THE END**
