1 of 12

MACHINE LEARNING

Training, Test and Validation sets

Dr. G.N.V.G. Sirisha

Dr. Ch. Someswara Rao

Sri. R. Shiva Shankar

Sri. V.V. Durga Kiran

Department of Computer Science and Engineering

Sagi Rama Krishnam Raju Engineering College

Bhimavaram, Andhra Pradesh - 534202

2 of 12

Training, Testing, and Validation Sets

  • Training Dataset
    • The sample of data used to fit the model.

  • Validation Dataset
    • Provides an unbiased evaluation of a model fit on the training dataset while tuning the model's hyperparameters.

  • Test Dataset
    • Provides an unbiased evaluation of the final model fit on the training dataset.

SRKR Engineering College, Department of CSE

7/1/2020


Sri. R. SHIVA SHANKAR, Assistant Professor

3 of 12


  • The training set is used to actually train the algorithm.

  • The validation set is used to keep track of how well the model is doing as it learns:
    • to avoid overfitting, and
    • to tune the hyperparameters of the model.

  • The test set is used to produce the final results.
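As a concrete illustration, the three-way split can be sketched in plain Python. The `train_val_test_split` helper and the 60/20/20 fractions are illustrative assumptions, not prescribed by the slides:

```python
import random

def train_val_test_split(data, val_frac=0.2, test_frac=0.2, seed=0):
    # Shuffle once, then carve the shuffled data into three disjoint parts.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]                # final evaluation only
    val = shuffled[n_test:n_test + n_val]   # overfitting checks / tuning
    train = shuffled[n_test + n_val:]       # model fitting
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before splitting matters: if the data are ordered by class, a naive slice would give the three sets very different class distributions.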


4 of 12

Confusion Matrix


  • Used to describe the performance of a classification model

  • Summary of prediction results on a classification problem.

  • Predictions are summarized with count values and broken down by each class.


5 of 12


  • Positive (P) : Observation is positive (for example: is an apple).
  • Negative (N) : Observation is not positive (for example: is not an apple).
  • True Positive (TP) : Observation is positive, and is predicted to be positive.
  • False Negative (FN) : Observation is positive, but is predicted negative.
  • True Negative (TN) : Observation is negative, and is predicted to be negative.
  • False Positive (FP) : Observation is negative, but is predicted positive.


Here, Class 1 is the positive class and Class 2 is the negative class, so the four counts arrange into the 2x2 confusion matrix:

                      Predicted Positive    Predicted Negative
  Actual Positive     TP                    FN
  Actual Negative     FP                    TN
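The four counts can be tallied directly from lists of labels; a minimal sketch (the `confusion_counts` helper and the apple/other labels are illustrative):

```python
def confusion_counts(y_true, y_pred, positive="apple"):
    # Compare each true label with its prediction and bin it as TP/FN/TN/FP.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp, fn, tn, fp

y_true = ["apple", "apple", "other", "other", "apple"]
y_pred = ["apple", "other", "other", "apple", "apple"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```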

6 of 12

The Accuracy Metrics


  • Accuracy

Classification Rate, or Accuracy, is given by the relation:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

  • The problem with accuracy is that it doesn't tell us everything about the results.

  • Four further metrics help interpret the performance of a classifier: sensitivity and specificity, and precision and recall.
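Accuracy is one line of Python over the confusion-matrix counts (the counts in the example are hypothetical):

```python
def accuracy(tp, tn, fp, fn):
    # Correct predictions (TP + TN) over all predictions.
    return (tp + tn) / (tp + tn + fp + fn)

# 100 hypothetical predictions: 85 correct, 15 wrong.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```

Note what accuracy hides: a model predicting "negative" for everything on a 95%-negative dataset also scores 0.95 while detecting no positives at all.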

7 of 12

Sensitivity & Specificity


  • Sensitivity (also known as the true positive rate, or recall)

    • It is the ratio of the number of correctly classified positive examples to the total number of positive examples.
    • Specificity is the same ratio for negative examples.

      • Sensitivity = TP / (TP + FN)
      • Specificity = TN / (TN + FP)

    • Sensitivity measures how good the model is at detecting events in the positive class.

  • Specificity (also known as the true negative rate) measures how good the model is at detecting events in the negative class.
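The two rates can be sketched as functions of the confusion-matrix counts (the counts in the example are hypothetical):

```python
def sensitivity(tp, fn):
    # True positive rate: detected positives over all actual positives.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: detected negatives over all actual negatives.
    return tn / (tn + fp)

print(sensitivity(tp=40, fn=10))  # 0.8
print(specificity(tn=45, fp=5))   # 0.9
```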


8 of 12


Recall

  • The ratio of the total number of correctly classified positive examples to the total number of positive examples:

    Recall = TP / (TP + FN)

  • Measures how good the model is at detecting positive events.

  • High recall indicates that the positive class is correctly recognized (a small number of false negatives).


9 of 12

 

Precision

  • The total number of correctly classified positive examples divided by the total number of predicted positive examples:

    Precision = TP / (TP + FP)

  • It indicates how likely an example labelled as positive is to indeed be positive.

  • Measures how good the model is at assigning positive events to the positive class.
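Precision follows the same pattern as the rates above (hypothetical counts again):

```python
def precision(tp, fp):
    # Correct positives over everything the model labelled positive.
    return tp / (tp + fp)

print(round(precision(tp=40, fp=5), 3))  # 0.889
```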


10 of 12

  • Precision and recall evaluate the model from two perspectives, related to the two types of error:

    • type I errors (false positives), as measured by precision
    • type II errors (false negatives), as measured by recall

  • High recall, low precision

    • Most positive examples are correctly recognized (low FN), but there are a lot of false positives (high FP).

  • Low recall, high precision

    • We miss many positive examples (high FN), but those we predict as positive are indeed positive (low FP).
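The high-recall/low-precision extreme can be checked numerically: a model that labels every example positive has no false negatives but many false positives (counts invented for illustration):

```python
def recall_and_precision(tp, fn, fp):
    # Recall = TP / (TP + FN); Precision = TP / (TP + FP).
    return tp / (tp + fn), tp / (tp + fp)

# 10 actual positives, 90 actual negatives, everything predicted positive:
r, p = recall_and_precision(tp=10, fn=0, fp=90)
print(r, p)  # 1.0 0.1
```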


11 of 12

 

F-Measure

  • To combine precision and recall into a single number, calculate an F-measure, which uses the harmonic mean in place of the arithmetic mean:

    F = (2 × Precision × Recall) / (Precision + Recall)

  • The F-Measure will always be nearer to the smaller value of Precision or Recall.
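A quick sketch showing the harmonic mean being pulled toward the smaller value (the 0.5/1.0 pair is an invented example):

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

p, r = 0.5, 1.0
print(round(f_measure(p, r), 3))  # 0.667 -- nearer to min(p, r) than the arithmetic mean 0.75
```

This is why F-measure is preferred over plain averaging: a model cannot hide a very poor precision behind a perfect recall, or vice versa.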


12 of 12

**THE END**
