Data Mining_Anoop Chaturvedi
Swayam Prabha
Course Title: Multivariate Data Mining - Methods and Applications
Lecture 38
SVM for Linearly Non-Separable Cases and SVM Regression
By Anoop Chaturvedi
Department of Statistics, University of Allahabad
Prayagraj (India)
Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha
Example: Iris Dataset ⇒ Observations on the sepal and petal lengths and widths of 150 flowers, 50 each of Iris Setosa, Iris Versicolor, and Iris Virginica.
Objective ⇒ Classifying Iris Species Using SVM
R packages used to fit and visualize the model ⇒ ggplot2, tidyverse, e1071, tune
We start with a linear kernel.
Sampling method ⇒ 10-fold cross-validation.
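The 10-fold cross-validation used during tuning can be sketched as follows. This is a minimal Python illustration, not the e1071 implementation; the helper name `kfold_indices` and the seeded shuffle are assumptions made for the example.

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Split indices 0..n-1 into k folds and return a list of (train, test) index lists.

    Hypothetical helper for illustration: the indices are shuffled once with a
    fixed seed and then dealt into k nearly equal folds.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold i takes every k-th shuffled index
    splits = []
    for i, test in enumerate(folds):
        # the training part of split i is every index outside fold i
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits

# 100 training observations, 10 folds: each fold holds out 10 points.
splits = kfold_indices(100, k=10)
print(len(splits), [len(test) for _, test in splits])
```

Each observation is held out exactly once, so the cross-validated error is the average of the k held-out error rates; a tuning routine repeats this for every candidate parameter value and keeps the one with the smallest average error.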
Dataset ⇒ 100 training observations + 50 test observations
Use the training set to fit the SVM model.
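A random 100/50 split of the 150 Iris rows might look like the sketch below. The helper name `train_test_split_rows`, the fixed seed, and the stand-in rows are assumptions; the lecture does not state how its 100 training rows were chosen or whether the split was stratified by species.

```python
import random

def train_test_split_rows(rows, n_train, seed=42):
    """Randomly partition `rows` into a training list of size n_train and a test list.

    Hypothetical helper; the fixed seed makes the split reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Stand-in for the 150 Iris observations: (row id, species) pairs, 50 per species.
species = ["setosa", "versicolor", "virginica"]
rows = [(i, species[i // 50]) for i in range(150)]

train, test = train_test_split_rows(rows, n_train=100)
print(len(train), len(test))  # 100 50
```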
The SVM correctly classifies 96 of the 100 training observations.
Best parameter tuning
On the test set, the best SVM model correctly predicts 49 of the 50 observations, i.e., 98% accuracy.
With 10-fold cross-validation, the tuned model achieves an estimated accuracy of 96%.
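The accuracy figures are simple proportions of correct predictions (96/100 = 0.96, 49/50 = 0.98). A minimal sketch of computing accuracy and a confusion matrix from true and predicted labels; the five labels below are made-up toy values, not the lecture's predictions.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))

def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy labels for illustration only:
y_true = ["setosa", "versicolor", "virginica", "virginica", "versicolor"]
y_pred = ["setosa", "versicolor", "virginica", "versicolor", "versicolor"]
print(accuracy(y_true, y_pred))  # 0.8
print(confusion_matrix(y_true, y_pred)[("virginica", "versicolor")])  # 1

# The test-set figure in the text corresponds to 49 correct out of 50:
print(49 / 50)  # 0.98
```

The off-diagonal entries of the confusion matrix show which species are confused; in the Iris data, errors typically involve Versicolor and Virginica, whose measurements overlap.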
Linearly Non-separable Case
Some observations from one class may infiltrate the region of space belonging to the other class.
Because of such overlaps, some of the overlapping points may be misclassified.
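In the standard soft-margin SVM, such overlap is accommodated by giving each training point a slack variable ξ_i ≥ 0 that measures how far it falls on the wrong side of its margin. The usual primal problem is written below for reference; the notation (weight vector w, intercept b, cost C) is the conventional one and may differ from the lecture's.

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
```

A point with 0 < ξ_i ≤ 1 lies inside the margin but is still correctly classified, while ξ_i > 1 means it is misclassified; hence the sum of the ξ_i is an upper bound on the number of training errors, and the cost C sets the trade-off between a wide margin and few margin violations.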
The non-separable case occurs if
(i) the two classes are nonlinearly separable, or
(ii) no clear separability exists between the two classes.
High noise levels (large variances) in one or both classes may cause the classes to overlap.
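To see a soft margin at work on overlapping classes, the sketch below fits a linear SVM by stochastic subgradient descent on the hinge loss (the Pegasos algorithm). This solver is swapped in for illustration; it is not what e1071 does (e1071 wraps LIBSVM). The simulated Gaussian data, λ, and the iteration count are assumptions.

```python
import random

def make_overlapping_data(n_per_class=100, seed=0):
    """Two unit-variance Gaussian clouds centred at (-1, -1) and (1, 1); they overlap."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((-1, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            x = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
            data.append((x, label))
    return data

def pegasos_linear_svm(data, lam=0.01, n_iter=5000, seed=1):
    """Stochastic subgradient descent on (lam/2)*||w||^2 + mean hinge loss.

    A constant feature 1.0 is appended so the last weight acts as the intercept b
    (this also regularizes b, a small departure from the textbook primal).
    """
    rng = random.Random(seed)
    dim = len(data[0][0]) + 1
    w = [0.0] * dim
    for t in range(1, n_iter + 1):
        x, y = rng.choice(data)
        x = list(x) + [1.0]
        eta = 1.0 / (lam * t)                       # Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink: gradient of the penalty term
        if margin < 1.0:                            # margin violated, i.e. positive slack
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Class +1 or -1 according to the sign of w . (x, 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1

data = make_overlapping_data()
w = pegasos_linear_svm(data)
train_acc = sum(predict(w, x) == y for x, y in data) / len(data)
print(w, train_acc)
```

Pegasos minimizes (λ/2)‖w‖² + (1/n) Σ hinge losses, which is the soft-margin problem ½‖w‖² + C Σ ξ_i rescaled with C = 1/(nλ); here n = 200 and λ = 0.01 correspond to C = 0.5. Because the clouds overlap, the training accuracy stays below 100% whatever the parameters.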
[Figure: Huber error measure, with thresholds at -c and c]
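In SVM regression, a loss on the residual r = y - f(x) takes the place of the hinge loss. Below is a minimal sketch of the Huber error measure (quadratic for |r| ≤ c, linear beyond, matching the -c and c thresholds in the figure) alongside Vapnik's ε-insensitive loss, which is the loss used by standard ε-SVR (for example, e1071's "eps-regression" type). The factor 1/2 in the Huber loss is one common convention and is assumed here; the lecture's exact scaling is not visible in the text.

```python
def huber_loss(r, c=1.0):
    """Huber error measure: r^2/2 inside [-c, c], linear c|r| - c^2/2 outside."""
    a = abs(r)
    if a <= c:
        return 0.5 * r * r
    return c * a - 0.5 * c * c

def eps_insensitive_loss(r, eps=0.1):
    """Epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside."""
    return max(0.0, abs(r) - eps)

for r in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(f"r={r:+.1f}  huber={huber_loss(r):.3f}  eps-insensitive={eps_insensitive_loss(r):.3f}")
```

Both losses grow only linearly for large residuals, which makes them less sensitive to outliers than squared error. The two pieces of the Huber loss meet with equal value (c²/2) and equal slope (c) at |r| = c, so the loss is smooth there.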
Example: Data on two variates are generated and classified using a kernel SVM.
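A self-contained sketch of the same idea generates two variates whose classes are not linearly separable (an inner disc and an outer ring), then trains a kernel SVM with the Gaussian radial basis kernel K(u, v) = exp(-γ‖u - v‖²). The solver is kernelized Pegasos (stochastic subgradient descent on the hinge loss, without an intercept), swapped in for LIBSVM purely for illustration. The data generator, γ, λ, and the iteration count are all assumptions; the lecture's simulated data may differ.

```python
import math
import random

def make_ring_data(n_per_class=60, seed=3):
    """Class +1: points in a disc of radius 1; class -1: points in a ring 2 <= radius <= 3."""
    rng = random.Random(seed)
    data = []
    for label, (r_lo, r_hi) in ((1, (0.0, 1.0)), (-1, (2.0, 3.0))):
        for _ in range(n_per_class):
            r = rng.uniform(r_lo, r_hi)
            theta = rng.uniform(0.0, 2.0 * math.pi)
            data.append(((r * math.cos(theta), r * math.sin(theta)), label))
    return data

def rbf_kernel(u, v, gamma=1.0):
    """Gaussian radial basis kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def kernel_pegasos(data, lam=0.01, n_iter=2000, gamma=1.0, seed=4):
    """Kernelized Pegasos: alpha[j] counts how often point j violated the margin."""
    rng = random.Random(seed)
    n = len(data)
    alpha = [0] * n
    for t in range(1, n_iter + 1):
        i = rng.randrange(n)
        x_i, y_i = data[i]
        # decision value of the current iterate at x_i (before the 1/(lam*t) scaling)
        s = sum(alpha[j] * y_j * rbf_kernel(x_j, x_i, gamma)
                for j, (x_j, y_j) in enumerate(data) if alpha[j])
        if y_i * s / (lam * t) < 1.0:
            alpha[i] += 1
    return alpha

def decision(alpha, data, x, gamma=1.0):
    """Its sign gives the predicted class (the positive 1/(lam*T) factor is dropped)."""
    return sum(a * y * rbf_kernel(xj, x, gamma) for a, (xj, y) in zip(alpha, data) if a)

data = make_ring_data()
alpha = kernel_pegasos(data)
train_acc = sum((1 if decision(alpha, data, x) >= 0 else -1) == y for x, y in data) / len(data)
print(f"training accuracy: {train_acc:.3f}, points with alpha > 0: {sum(a > 0 for a in alpha)}")
```

Only points with a positive alpha enter the decision function, which mirrors the role of support vectors; no linear boundary in the original two variates can separate a disc from a surrounding ring, but the RBF kernel handles it.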
Parameter tuning
From the figure we observe that the number of training errors is large. Increasing the value of cost reduces the number of training errors; however, it also leads to a more irregular decision boundary, with a risk of overfitting the data.
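The trade-off controlled by the cost parameter can be checked by hand on a toy example. The sketch below evaluates the soft-margin objective ½w² + C Σ ξ_i, with slack ξ_i = max(0, 1 - y_i f(x_i)), for two hypothetical linear rules on five one-dimensional points, one of which is a positive point lying among the negatives. One rule has a wide margin and misclassifies that point; the other has a narrow margin and fits every point. The data and both rules are made up for illustration. The lecture's example uses a kernel SVM on two variates, but C plays the same role there.

```python
def slacks(w, b, data):
    """Slack xi_i = max(0, 1 - y_i * (w * x_i + b)) for each 1-D point."""
    return [max(0.0, 1.0 - y * (w * x + b)) for x, y in data]

def training_errors(w, b, data):
    """Number of points on the wrong side of the boundary w * x + b = 0."""
    return sum(y * (w * x + b) < 0 for x, y in data)

def objective(w, b, data, C):
    """Soft-margin primal objective 0.5 * w^2 + C * (sum of slacks)."""
    return 0.5 * w * w + C * sum(slacks(w, b, data))

# Toy data: negatives at -2 and -1, positives at 1 and 2, plus one positive at -0.5.
data = [(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1), (-0.5, 1)]

wide = (1.0, 0.0)    # margin 2/|w| = 2, misclassifies the point at -0.5
narrow = (4.0, 3.0)  # margin 2/|w| = 0.5, classifies every point correctly

for C in (1.0, 10.0):
    best = min((wide, narrow), key=lambda wb: objective(*wb, data, C))
    print(C, objective(*wide, data, C), objective(*narrow, data, C), "prefers", best)
```

With C = 1 the wide-margin rule wins (objective 2.0 versus 8.0); with C = 10 the narrow-margin rule wins (15.5 versus 8.0). The switch happens at C = 5, where 0.5 + 1.5C = 8: a larger cost buys zero training errors at the price of a narrower margin, which is the behaviour described in the text.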