FOR CS STUDENTS
Support Vector
Machines
Explained Like You're a Developer — Not a Mathematician
Binary Classification
Kernel Trick
sklearn
Real Code
svm_intro.py
from sklearn.svm import SVC  # ← That's literally all you need!
model = SVC(kernel='rbf', C=1.0)  # → done.
What Does SVM Actually Do?
🧠 The Intuition
📧
Spam or Not Spam?
Imagine sorting 1000 emails. SVM draws the best possible line between spam and legit mail.
🔴
Two Classes of Data
You have red dots and blue dots. SVM finds the line that puts max gap between both groups.
📏
Biggest Gap = Best Line
SVM doesn't just find any dividing line — it finds the one with the WIDEST "street" between classes.
📌
Only Edge Points Matter
Only the dots touching the edge of that street matter. Those are called Support Vectors.
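A minimal sketch of that idea, using made-up toy points (which points end up as support vectors depends on the data):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters of 2-D points (toy data)
X = np.array([[0, 0], [0, 1], [1, 0],    # class -1
              [4, 4], [4, 5], [5, 4]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

model = SVC(kernel='linear', C=1.0)
model.fit(X, y)

# Only the points on the margin edges are kept as support vectors
print(model.support_vectors_)
print(len(model.support_vectors_), 'of', len(X), 'points define the boundary')
```

Delete any of the other points and the boundary doesn't move; delete a support vector and it does.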
Visual Example
[Diagram: the decision boundary runs between Class A and Class B; the margin is the empty gap on either side, and the support vectors are the points sitting on its edges]
4 Terms You Must Know
01
Hyperplane
💬 The 'dividing line' (or wall in 3D)
# In 2D → a line
# In 3D → a flat plane
# In N-D → a hyperplane
02
Margin
💬 Width of the empty 'street' between classes
# SVM goal: make this street
# as WIDE as possible
# wider = better generalization
03
Support Vectors
💬 The data points sitting on the edge of the street
print(model.support_vectors_)
# Only THESE points define
# the boundary — others ignored!
04
Kernel
💬 Magic trick to handle non-linear data
# Can't draw a straight line?
# Kernel maps data to higher
# dimension where you CAN!
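To make "hyperplane" concrete: with kernel='linear', sklearn exposes the plane's equation directly. A small sketch with invented toy points, where coef_ holds the normal vector w and intercept_ holds b, so the boundary is w·x + b = 0:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes
X = np.array([[1.0, 1.0], [2.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear').fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print('w =', w, 'b =', b)

# A point's class is the sign of w . x + b
print(np.sign(w @ np.array([1.0, 1.0]) + b))  # negative side → class -1
```

In 2D that equation is a line; with more features, the same w and b describe a plane or hyperplane.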
How SVM Works — Step by Step
1
Take your training data
You have labeled data: emails marked spam/not-spam, images marked cat/dog, etc.
X = [[features...], ...] # your data
y = [1, -1, 1, -1, ...] # labels
2
SVM finds the best boundary
It solves an optimization over all possible lines/planes and picks the one with the maximum gap between classes.
model = SVC(kernel='linear')
model.fit(X_train, y_train)
3
Use kernel for curved boundaries
If a straight line can't separate data, kernel maps it to higher dimensions where it can.
model = SVC(kernel='rbf') # Radial Basis
# Now handles circular/complex shapes!
4
Predict new data
New point comes in → check which side of the boundary it lands on → done!
model.predict([[new_data]]) # returns 1 or -1
model.decision_function([[x]]) # distance score
The C Parameter — Overfitting vs Underfitting
C = how much you penalize misclassifications
Think of it like a strictness dial: Low C = relaxed teacher | High C = very strict teacher
C = 0.01 (Small)
Underfitting Risk
Wide margin allowed
Some misclassifications OK
Simpler, smoother boundary
May miss important patterns
SVC(C=0.01) # Too forgiving
C = 1.0 (Balanced) ✓
Usually Best Start
Default in sklearn
Good balance of both
Start here, then tune
Works for most problems
SVC(C=1.0) # Default, good start
C = 100 (Large)
Overfitting Risk
Very narrow margin
Tries to classify all points
Complex boundary
May fail on new data
SVC(C=100) # Too strict
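A quick sketch of the strictness dial in action, on synthetic noisy data (make_classification with 10% flipped labels; the exact counts vary with the seed):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# 2-D data with some label noise, so the classes overlap a bit
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

counts = {}
for C in (0.01, 1.0, 100):
    model = SVC(kernel='rbf', C=C).fit(X, y)
    counts[C] = int(model.n_support_.sum())  # total support vectors
    print(f'C={C:<6} support vectors: {counts[C]}')
# Lower C -> wider margin -> more points land inside the street,
# so more support vectors; higher C -> fewer, but a wigglier boundary
```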
Kernels — The Magic Trick
Problem: What if your data can't be separated with a straight line?
Solution: Kernel implicitly maps data into higher dimensions where a straight line CAN separate it. The "kernel trick" means those high-dimensional coordinates are never computed explicitly, so your code stays fast.
Linear
⚡ Fastest
✓ When data IS already separable with a line
📌 Text classification, high-dim data
SVC(kernel='linear')
Polynomial
🐢 Slower
✓ Curved but structured boundaries
📌 Image recognition, NLP
SVC(kernel='poly', degree=3)
RBF (Radial)
⚡ Good
✓ Don't know shape of boundary
📌 General purpose — most popular
SVC(kernel='rbf', gamma='scale')
Sigmoid
⚡ Medium
✓ Neural-net-like decision surface
📌 Specific niche problems
SVC(kernel='sigmoid')
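A side-by-side sketch: try all four kernels on data a straight line can't separate (sklearn's make_moons). The scores are illustrative, not benchmarks:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates them
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    scores[kernel] = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f'{kernel:<8} CV accuracy: {scores[kernel]:.3f}')
# rbf usually wins here: the true boundary is curved
```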
Complete Python Workflow
svm_full_example.py
# ── 1. Imports ───────────────────────────────────────
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
# ── 2. Prepare Data ─────────────────────────────────
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler() # ALWAYS scale for SVM!
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# ── 3. Train ────────────────────────────────────────
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train, y_train)
# ── 4. Evaluate ─────────────────────────────────────
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2%}')
print(classification_report(y_test, y_pred))
# ── 5. Tune Hyperparams ─────────────────────────────
grid = GridSearchCV(SVC(), {'C':[0.1,1,10], 'kernel':['rbf','linear']}, cv=5)
grid.fit(X_train, y_train)
print('Best:', grid.best_params_) # e.g. {'C': 1, 'kernel': 'rbf'}
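One optional variation on step 5: wrapping the scaler and SVC in a Pipeline makes GridSearchCV re-fit the scaler inside every CV fold, so no scaling statistics leak between folds. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Scaler + model as one unit: CV re-fits BOTH per fold
pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
grid = GridSearchCV(pipe,
                    {'svc__C': [0.1, 1, 10],           # step name + '__' + param
                     'svc__kernel': ['rbf', 'linear']},
                    cv=5)
grid.fit(X, y)
print('Best:', grid.best_params_)
```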
SVR — When Your Target Is a Number
SVC vs SVR
SVC
SVR
Task
Classification (cat/dog)
Regression (predict price)
Output
Class label (+1 or −1)
Continuous number (42.7)
Loss fn
Hinge loss
ε-insensitive loss
Class
sklearn.svm.SVC
sklearn.svm.SVR
Key param
C, kernel, gamma
C, kernel, gamma, epsilon
SVR in Action
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error as MSE
# Predict house prices
svr = SVR(
    kernel='rbf',
    C=100,        # strictness
    gamma=0.1,    # RBF width
    epsilon=0.1,  # tube width
)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
print(f'R² = {r2_score(y_test,y_pred):.3f}')
print(f'MSE = {MSE(y_test,y_pred):.2f}')
# epsilon: predictions within ±0.1
# of true value → zero penalty
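A sketch of what that tube width means in practice, on synthetic 1-D data (a noisy sine wave; exact counts depend on the seed):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)  # noisy sine wave

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel='rbf', C=100, epsilon=eps).fit(X, y)
    sv_counts[eps] = len(svr.support_vectors_)
    print(f'epsilon={eps:<5} support vectors: {sv_counts[eps]}')
# wider tube -> more predictions already "close enough" -> fewer SVs
```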
Should You Use SVM?
✅ USE SVM WHEN
Dataset is small to medium (< 50k samples)
Data has many features (text, genomics)
You need a clear margin boundary
Classes are clearly separable
You need a non-linear classifier
Generalization matters most
❌ AVOID SVM WHEN
Dataset is very large (> 100k rows)
You need fast training/updates
Data has tons of noise
Classes heavily overlap
You need well-calibrated probability outputs
Interpretability is critical
🔄 ALTERNATIVES
Random Forest
Large noisy data
Logistic Reg.
Interpretability
XGBoost
Tabular, speed
Neural Net
Images, text, huge data
KNN
Simple quick baseline
Naive Bayes
Text, very fast
SVM Cheat Sheet
SVC / SVR Parameters
Parameter
Default
What it controls
kernel
'rbf'
Type of decision boundary
C
1.0
Strictness — higher = fewer mistakes, overfitting risk
gamma
'scale'
How far each point's influence reaches (RBF/poly)
degree
3
Polynomial degree (only for kernel='poly')
epsilon
0.1
SVR tube width — tolerance zone (SVR only)
class_weight
None
Set 'balanced' if classes are imbalanced
probability
False
True to enable predict_proba() output
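A small sketch of the probability flag from the table above, on synthetic data (training gets slower because sklearn runs an extra internal cross-validation to calibrate the scores):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# probability=True enables predict_proba() at the cost of slower fit
model = SVC(kernel='rbf', probability=True).fit(X, y)
proba = model.predict_proba(X[:3])
print(proba)  # one row per sample; each row sums to 1
```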
Quick Recipes
Best starting point
SVC(kernel='rbf',
C=1.0,
gamma='scale')
Text classification
SVC(kernel='linear',
C=1.0)
Tune automatically
GridSearchCV(SVC(),
{'C':[0.1,1,10],
'gamma':['scale','auto']})
Regression
SVR(kernel='rbf',
C=100, epsilon=0.1)
Key Takeaways
1
SVM finds the line/boundary with the WIDEST gap between classes
2
Support Vectors = only the edge points matter. Rest are ignored.
3
C controls strictness. Start at 1.0, tune with GridSearchCV.
4
Kernel = trick to handle curved/complex data. Default: RBF.
5
ALWAYS scale your features with StandardScaler before SVM.
6
SVR = same idea but for predicting numbers, not categories.
$ python -c "from sklearn.svm import SVC; print('Ready to build!')"