1 of 9

Regression in Machine Learning

Anup Kumar

Freiburg Galaxy Team

GCC 2021 Training

June 28 - July 2, 2021

Freiburg, Germany

2 of 9

Regression

  • Supervised learning
  • Real valued targets
  • Cost/error/loss functions
  • Algorithms
    • Linear models
    • Support vectors
    • K nearest neighbours
    • Trees and ensembles
  • Used for predicting unknown real-valued targets from known features (see the table below)


Feature1   Feature2   ...   FeatureN   Target
0.4        23.4       ...   7.6        12
0.9        21         ...   5.6        5.6
0.5        25         ...   6.7        ??

Rows 1-2: known features and known target
Row 3: known features and unknown target
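A minimal sketch of this workflow with scikit-learn, using the rows of the table above (the linear model choice is just illustrative):

from sklearn.linear_model import LinearRegression

# Known features and known targets (rows 1-2 of the table)
X_train = [[0.4, 23.4, 7.6],
           [0.9, 21.0, 5.6]]
y_train = [12.0, 5.6]

# Known features with an unknown target (row 3, target "??")
X_new = [[0.5, 25.0, 6.7]]

model = LinearRegression()
model.fit(X_train, y_train)   # learn from known feature-target pairs
print(model.predict(X_new))   # estimate the unknown target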

3 of 9

Cost function

  • Mathematical functions that measure how far predictions are from the true targets
  • Error = true target - predicted target
  • Examples
    • Mean squared error
    • Mean absolute error
    • Coefficient of determination
    • ...


Feature1   Feature2   ...   FeatureN   True target
0.5        10         ...   6.7        9.0

Feature1   Feature2   ...   FeatureN   Predicted target
0.25       21.3       ...   3.7        3.4

Top: known features and true target
Bottom: known features and predicted target
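A hedged sketch of computing the listed cost functions with scikit-learn's metrics module (the value lists here are illustrative, not taken from the slides):

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [9.0, 5.6, 12.0]   # illustrative true targets
y_pred = [3.4, 5.1, 11.2]   # illustrative predictions

print(mean_squared_error(y_true, y_pred))    # mean squared error
print(mean_absolute_error(y_true, y_pred))   # mean absolute error
print(r2_score(y_true, y_pred))              # coefficient of determination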

4 of 9

Algorithms: Linear models

  • Learn a weight/coefficient for each feature
  • y (predicted target) = w0 + w1 × Feature1 + w2 × Feature2 + … + wN × FeatureN
  • w (weights) = [w0, w1, w2, …, wN]
  • X (input features) = [Feature1, Feature2, …, FeatureN]
  • Examples
    • Linear regression
    • Ridge regression
    • ElasticNet
    • ...
  • Variants differ in the equation they minimise (loss plus regularisation terms)
  • Advantage: Simple and fast
  • Disadvantage: Struggles to capture non-linear relations
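A minimal sketch of fitting such a model with scikit-learn (synthetic data; Ridge and ElasticNet are drop-in replacements with different regularisation terms):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.random((50, 3))                    # 50 samples, 3 features
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 2]   # linear relation with intercept w0 = 2.0

model = LinearRegression().fit(X, y)
print(model.intercept_)   # w0
print(model.coef_)        # [w1, w2, w3]

# Ridge adds an L2 penalty to the minimisation equation
ridge = Ridge(alpha=1.0).fit(X, y)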


5 of 9

Support vector machines

  • Linear and non-linear variants
  • Support vectors
  • Advantages
    • Effective on high-dimensional data
    • Works even when the number of samples << number of dimensions
    • Memory efficient - uses only the support vectors
  • Disadvantages
    • Long runtime on large datasets
    • Not scale invariant - features should be scaled first
  • Examples: SVR, NuSVR, LinearSVR
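A hedged sketch with scikit-learn, pairing SVR with a scaler because SVMs are not scale invariant (synthetic data, illustrative hyperparameters):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((100, 5))
y = np.sin(X[:, 0] * 6) + X[:, 1]   # non-linear relation

# Scale the features, then fit a kernel SVR
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict(X[:3]))

# Only the support vectors are kept by the fitted model
print(model.named_steps["svr"].support_vectors_.shape)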


6 of 9

K Nearest Neighbours

  • Prediction is derived from the targets of the nearest neighbours (e.g. their mean)
  • Examples
    • K Nearest neighbours
      • Based on k neighbours
    • Radius neighbours
      • Neighbours within radius r
  • Advantages
    • Simple to understand
    • Non-parametric
  • Disadvantages
    • Runtime increases with the amount of data
    • High memory requirements - the whole training set is stored
    • Sensitive to outliers
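A minimal sketch of both variants with scikit-learn (synthetic data; k and r are illustrative choices):

import numpy as np
from sklearn.neighbors import KNeighborsRegressor, RadiusNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = X[:, 0] + X[:, 1]

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)     # average of the 5 nearest targets
rad = RadiusNeighborsRegressor(radius=0.3).fit(X, y)   # average of neighbours within r = 0.3

print(knn.predict([[0.5, 0.5]]))
print(rad.predict([[0.5, 0.5]]))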


7 of 9

Decision tree

[Figure: example of a decision tree regressor; source: https://gdcoder.com/decision-tree-regressor-explained-in-depth/]

8 of 9

Decision tree

  • Learns decision rules from the features; a prediction follows a path from root to leaf
  • Advantages
    • Easy to interpret
    • Logarithmic prediction cost in the number of training samples
  • Disadvantages
    • Sensitive to small variations in the data
    • Prone to overfitting
    • Needs a balanced dataset to avoid biased trees
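A hedged sketch of a depth-limited regression tree and its learned rules (synthetic data; max_depth caps overfitting):

import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = np.where(X[:, 0] > 0.5, 10.0, 1.0)   # a step the tree can split on

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["f0", "f1"]))   # the learned decision rules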


9 of 9

Ensemble models

  • Combine the predictions of multiple trees
  • Bagging
    • Build multiple independent trees
    • Average their predictions
    • Examples
      • Random Forest
      • Extremely Randomised Trees
  • Boosting
    • Improve tree models sequentially
    • Combine weak models into a robust ensemble
    • Examples
      • AdaBoost
      • GradientBoosting
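A hedged sketch contrasting bagging and boosting in scikit-learn (synthetic data; hyperparameters are illustrative):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = np.sin(X[:, 0] * 6) + X[:, 1]

# Bagging: independent trees, predictions averaged
rf = RandomForestRegressor(n_estimators=100).fit(X, y)

# Boosting: shallow trees fitted sequentially, each improving on the last
gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1).fit(X, y)

print(rf.predict(X[:2]), gb.predict(X[:2]))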
