FOR CS STUDENTS
Support Vector
Machines
Explained Like You're a Developer — Not a Mathematician
Binary Classification
Kernel Trick
sklearn
Real Code
svm_intro.py
from sklearn.svm import SVC  # ← That's literally all you need!
model = SVC(kernel='rbf', C=1.0)  # → done.
What Does SVM Actually Do?
🧠 The Intuition
📧
Spam or Not Spam?
Imagine sorting 1000 emails. SVM draws the best possible line between spam and legit mail.
🔴
Two Classes of Data
You have red dots and blue dots. SVM finds the line that puts max gap between both groups.
📏
Biggest Gap = Best Line
SVM doesn't just find any dividing line — it finds the one with the WIDEST "street" between classes.
📌
Only Edge Points Matter
Only the dots touching the edge of that street matter. Those are called Support Vectors.
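A minimal sketch of that idea, using made-up toy points (which points end up as support vectors depends on the data):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters of 2-D points (toy data)
X = np.array([[0, 0], [0, 1], [1, 0],    # class -1
              [4, 4], [4, 5], [5, 4]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

model = SVC(kernel='linear', C=1.0)
model.fit(X, y)

# Only the points on the margin edges are kept as support vectors
print(model.support_vectors_)
print(len(model.support_vectors_), 'of', len(X), 'points define the boundary')
```

Delete any of the other points and the boundary doesn't move; delete a support vector and it does.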
Visual Example
[Diagram: the decision boundary runs between Class A and Class B; the margin is the empty gap on either side, and the support vectors are the points sitting on its edges]
4 Terms You Must Know
01
Hyperplane
💬 The 'dividing line' (or wall in 3D)
# In 2D → a line
# In 3D → a flat plane
# In N-D → a hyperplane
02
Margin
💬 Width of the empty 'street' between classes
# SVM goal: make this street
# as WIDE as possible
# wider = better generalization
03
Support Vectors
💬 The data points sitting on the edge of the street
print(model.support_vectors_)
# Only THESE points define
# the boundary — others ignored!
04
Kernel
💬 Magic trick to handle non-linear data
# Can't draw a straight line?
# Kernel maps data to higher
# dimension where you CAN!
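To make "hyperplane" concrete: with kernel='linear', sklearn exposes the plane's equation directly. A small sketch with invented toy points, where coef_ holds the normal vector w and intercept_ holds b, so the boundary is w·x + b = 0:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable classes
X = np.array([[1.0, 1.0], [2.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel='linear').fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print('w =', w, 'b =', b)

# A point's class is the sign of w . x + b
print(np.sign(w @ np.array([1.0, 1.0]) + b))  # negative side → class -1
```

In 2D that equation is a line; with more features, the same w and b describe a plane or hyperplane.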
How SVM Works — Step by Step
1
Take your training data
You have labeled data: emails marked spam/not-spam, images marked cat/dog, etc.
X = [[features...], ...] # your data
y = [1, -1, 1, -1, ...] # labels
2
SVM finds the best boundary
It solves an optimization over all possible lines/planes and picks the one with the maximum gap between classes.
model = SVC(kernel='linear')
model.fit(X_train, y_train)
3
Use kernel for curved boundaries
If a straight line can't separate data, kernel maps it to higher dimensions where it can.
model = SVC(kernel='rbf') # Radial Basis
# Now handles circular/complex shapes!
4
Predict new data
New point comes in → check which side of the boundary it lands on → done!
model.predict([[new_data]]) # returns 1 or -1
model.decision_function([[x]]) # distance score
The C Parameter — Overfitting vs Underfitting
C = how much you penalize misclassifications
Think of it like a strictness dial: Low C = relaxed teacher | High C = very strict teacher
C = 0.01 (Small)
Underfitting Risk
Wide margin allowed
Some misclassifications OK
Simpler, smoother boundary
May miss important patterns
SVC(C=0.01) # Too forgiving
C = 1.0 (Balanced) ✓
Usually Best Start
Default in sklearn
Good balance of both
Start here, then tune
Works for most problems
SVC(C=1.0) # Default, good start
C = 100 (Large)
Overfitting Risk
Very narrow margin
Tries to classify all points
Complex boundary
May fail on new data
SVC(C=100) # Too strict
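A quick sketch of the strictness dial in action, on synthetic noisy data (make_classification with 10% flipped labels; the exact counts vary with the seed):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# 2-D data with some label noise, so the classes overlap a bit
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

counts = {}
for C in (0.01, 1.0, 100):
    model = SVC(kernel='rbf', C=C).fit(X, y)
    counts[C] = int(model.n_support_.sum())  # total support vectors
    print(f'C={C:<6} support vectors: {counts[C]}')
# Lower C -> wider margin -> more points land inside the street,
# so more support vectors; higher C -> fewer, but a wigglier boundary
```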
Kernels — The Magic Trick
Problem: What if your data can't be separated with a straight line?
Solution: Kernel implicitly maps data into higher dimensions where a straight line CAN separate it. The "kernel trick" means those high-dimensional coordinates are never computed explicitly, so your code stays fast.
Linear
⚡ Fastest
✓ When data IS already separable with a line
📌 Text classification, high-dim data
SVC(kernel='linear')
Polynomial
🐢 Slower
✓ Curved but structured boundaries
📌 Image recognition, NLP
SVC(kernel='poly', degree=3)
RBF (Radial)
⚡ Good
✓ Don't know shape of boundary
📌 General purpose — most popular
SVC(kernel='rbf', gamma='scale')
Sigmoid
⚡ Medium
✓ Neural-net-like decision surface
📌 Specific niche problems
SVC(kernel='sigmoid')
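A side-by-side sketch: try all four kernels on data a straight line can't separate (sklearn's make_moons). The scores are illustrative, not benchmarks:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates them
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

scores = {}
for kernel in ('linear', 'poly', 'rbf', 'sigmoid'):
    scores[kernel] = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f'{kernel:<8} CV accuracy: {scores[kernel]:.3f}')
# rbf usually wins here: the true boundary is curved
```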
Complete Python Workflow
svm_full_example.py
# ── 1. Imports ───────────────────────────────────────
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report
# ── 2. Prepare Data ─────────────────────────────────
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler() # ALWAYS scale for SVM!
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# ── 3. Train ────────────────────────────────────────
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X_train, y_train)
# ── 4. Evaluate ─────────────────────────────────────
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2%}')
print(classification_report(y_test, y_pred))
# ── 5. Tune Hyperparams ─────────────────────────────
grid = GridSearchCV(SVC(), {'C':[0.1,1,10], 'kernel':['rbf','linear']}, cv=5)
grid.fit(X_train, y_train)
print('Best:', grid.best_params_) # e.g. {'C': 1, 'kernel': 'rbf'}
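One optional variation on step 5: wrapping the scaler and SVC in a Pipeline makes GridSearchCV re-fit the scaler inside every CV fold, so no scaling statistics leak between folds. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Scaler + model as one unit: CV re-fits BOTH per fold
pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
grid = GridSearchCV(pipe,
                    {'svc__C': [0.1, 1, 10],           # step name + '__' + param
                     'svc__kernel': ['rbf', 'linear']},
                    cv=5)
grid.fit(X, y)
print('Best:', grid.best_params_)
```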
SVR — When Your Target Is a Number
SVC vs SVR
SVC
SVR
Task
Classification (cat/dog)
Regression (predict price)
Output
Class label (+1 or −1)
Continuous number (42.7)
Loss fn
Hinge loss
ε-insensitive loss
Class
sklearn.svm.SVC
sklearn.svm.SVR
Key param
C, kernel, gamma
C, kernel, gamma, epsilon
SVR in Action
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error as MSE
# Predict house prices
svr = SVR(
    kernel='rbf',
    C=100,        # strictness
    gamma=0.1,    # RBF width
    epsilon=0.1,  # tube width
)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
print(f'R² = {r2_score(y_test,y_pred):.3f}')
print(f'MSE = {MSE(y_test,y_pred):.2f}')
# epsilon: predictions within ±0.1
# of true value → zero penalty
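A sketch of what that tube width means in practice, on synthetic 1-D data (a noisy sine wave; exact counts depend on the seed):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)  # noisy sine wave

sv_counts = {}
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel='rbf', C=100, epsilon=eps).fit(X, y)
    sv_counts[eps] = len(svr.support_vectors_)
    print(f'epsilon={eps:<5} support vectors: {sv_counts[eps]}')
# wider tube -> more predictions already "close enough" -> fewer SVs
```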
Should You Use SVM?
✅ USE SVM WHEN
Dataset is small to medium (< 50k samples)
Data has many features (text, genomics)
You need a clear margin boundary
Classes are clearly separable
You need a non-linear classifier
Generalization matters most
❌ AVOID SVM WHEN
Dataset is very large (> 100k rows)
You need fast training/updates
Data has tons of noise
Classes heavily overlap
You need well-calibrated probability outputs
Interpretability is critical
🔄 ALTERNATIVES
Random Forest
Large noisy data
Logistic Reg.
Interpretability
XGBoost
Tabular, speed
Neural Net
Images, text, huge data
KNN
Simple quick baseline
Naive Bayes
Text, very fast
SVM Cheat Sheet
SVC / SVR Parameters
Parameter
Default
What it controls
kernel
'rbf'
Type of decision boundary
C
1.0
Strictness — higher = fewer mistakes, overfitting risk
gamma
'scale'
How far each point's influence reaches (RBF/poly)
degree
3
Polynomial degree (only for kernel='poly')
epsilon
0.1
SVR tube width — tolerance zone (SVR only)
class_weight
None
Set 'balanced' if classes are imbalanced
probability
False
True to enable predict_proba() output
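A small sketch of the probability flag from the table above, on synthetic data (training gets slower because sklearn runs an extra internal cross-validation to calibrate the scores):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# probability=True enables predict_proba() at the cost of slower fit
model = SVC(kernel='rbf', probability=True).fit(X, y)
proba = model.predict_proba(X[:3])
print(proba)  # one row per sample; each row sums to 1
```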
Quick Recipes
Best starting point
SVC(kernel='rbf',
C=1.0,
gamma='scale')
Text classification
SVC(kernel='linear',
C=1.0)
Tune automatically
GridSearchCV(SVC(),
{'C':[0.1,1,10],
'gamma':['scale','auto']})
Regression
SVR(kernel='rbf',
C=100, epsilon=0.1)
Key Takeaways
1
SVM finds the line/boundary with the WIDEST gap between classes
2
Support Vectors = only the edge points matter. Rest are ignored.
3
C controls strictness. Start at 1.0, tune with GridSearchCV.
4
Kernel = trick to handle curved/complex data. Default: RBF.
5
ALWAYS scale your features with StandardScaler before SVM.
6
SVR = same idea but for predicting numbers, not categories.
$ python -c "from sklearn.svm import SVC; print('Ready to build!')"