Core Concepts in Machine Learning 1
Nikhil Bhagwat
The Neuro, McGill University, Montreal, QC, Canada
nikhil.bhagwat@mcgill.ca
ABCD-ReproNim
ABCD-ReproNim: An ABCD Course on Reproducible Data Analyses
Learning Objectives of this Lecture
Pop Quiz
Suppose we have a population with 1% COVID prevalence. We train a simple machine-learning model to identify COVID patients using their biometry.
Our model is 91% accurate! Then we also calculate,
What are my chances that I have COVID, if my test is positive?
Later we train a fancy deep-learning model to identify COVID patients using their chest CT! This model has an accuracy of 99%! We calculate
Which model is better?
Training a machine-learning model
[Figure: input X → ML Model → output Y]
This is the easy part!
This was the easy part!
Which features?
Which model?
Which performance metrics?
How do I validate my design choices?
Machine-learning - what, why, and when?
Terminology
Input examples X: n samples × p features
Output examples Y: labels for the n samples (e.g. clinical measures)
Model (M): maps X to Y
Types of ML Algorithms
Outcome | Supervised Learning | Unsupervised Learning |
Continuous | Regression | Dimensionality reduction |
Categorical | Classification | Clustering |
Supervised Learning: Models
Linear Regression
SVM
Tree-ensembles
ANN
Model Fitting
$$\hat{y}_i = \beta_0 + \beta_1 x_i$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
[Figure: MSE surface plotted over pairs of model coefficients (β1 and β2; β0 and β1)]
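A minimal sketch of the fitting step, assuming NumPy and made-up toy data. The arrays `x` and `y` and the helper names `mse` and `fit_line` are illustrative and not from the lecture. It evaluates the MSE of a candidate line and finds the β0, β1 that minimize it with ordinary least squares.

```python
import numpy as np


def mse(y, y_hat):
    """Mean squared error: (1/n) * sum_i (y_i - y_hat_i)^2."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


def fit_line(x, y):
    """Return (beta0, beta1) minimizing the MSE of y_hat = beta0 + beta1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # columns: intercept, x
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(beta[0]), float(beta[1])


# Toy data generated from y = 1 + 2x without noise (illustration only).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

beta0, beta1 = fit_line(x, y)
print(beta0, beta1)
print(mse(y, beta0 + beta1 * x))
```

For a linear model the MSE surface is convex, so the least-squares solution is its single minimum. More complex models generally need iterative optimizers instead.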
More complex models / loss functions (e.g. ANNs)
[Figure: loss curve with a local minimum and the global minimum]
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{p-1} x_{p-1} + \beta_p x_p$$
Model Fitting: Regularization
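L1: $$\mathrm{Loss}_{L1} = \sum_{i=1}^{n} \Big( y_i - \underbrace{\Big[ \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j \Big]}_{\hat{y}_i} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
L2: $$\mathrm{Loss}_{L2} = \sum_{i=1}^{n} \Big( y_i - \underbrace{\Big[ \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j \Big]}_{\hat{y}_i} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$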
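A minimal sketch of the two penalized objectives on this slide (sum of squared errors plus λ times an L1 or L2 penalty on the coefficients), assuming NumPy. The function names and the toy arrays are illustrative and not from the lecture.

```python
import numpy as np


def squared_error(X, y, beta0, beta):
    """Sum over samples of (y_i - [beta0 + sum_j x_ij * beta_j])^2."""
    y_hat = beta0 + X @ beta
    return float(np.sum((y - y_hat) ** 2))


def l1_loss(X, y, beta0, beta, lam):
    """Squared error plus lam * sum_j |beta_j| (the L1, or lasso, penalty)."""
    return squared_error(X, y, beta0, beta) + lam * float(np.sum(np.abs(beta)))


def l2_loss(X, y, beta0, beta, lam):
    """Squared error plus lam * sum_j beta_j^2 (the L2, or ridge, penalty)."""
    return squared_error(X, y, beta0, beta) + lam * float(np.sum(beta ** 2))


# Toy example: these coefficients fit the data exactly,
# so only the penalty term contributes to each loss.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta0, beta = 0.0, np.array([1.0, 2.0])

print(l1_loss(X, y, beta0, beta, lam=0.5))
print(l2_loss(X, y, beta0, beta, lam=0.5))
```

In scikit-learn, `alpha` in `Lasso` and `Ridge` plays the role of λ. `Lasso` scales the squared-error term by 1/(2n), so its `alpha` is not numerically identical to the λ above.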
Model Fitting: Scikit-learn syntax
# import
from sklearn import linear_model, svm
# data
X = [[0, 0], [1, 1]]
y = [0, 1]
# pick a model
model = linear_model.Lasso(alpha=0.1) # model = svm.SVC()
# fit the model with data
model.fit(X, y)
# predict on new data
y_pred = model.predict([[1, 0]])
Model Evaluation
Data (N samples) → Train set (~90% samples) + Test set (~10% samples)
Train set → Model Fitting → Trained model
Trained model + Test set → Evaluate
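A minimal sketch of this train/test workflow, assuming scikit-learn and synthetic data. The data, the linear model, and the random seeds are illustrative choices and not from the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data: 100 samples, 3 features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out ~10% of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Fit on the train set only, then evaluate on the held-out test set.
model = LinearRegression().fit(X_train, y_train)
test_mse = float(np.mean((y_test - model.predict(X_test)) ** 2))

print(X_train.shape, X_test.shape)
print(test_mse)
```

The test set is not touched during fitting, so `test_mse` estimates how the trained model performs on unseen samples.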
[Figure: model fit on the train set and on the test set for three cases: underfitting, optimal, overfitting]
[Figure: decision boundaries in the X1-X2 feature space for underfitting, optimal, and overfitting models, shown with train and test samples of class_1 and class_2]
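A minimal sketch of under- and overfitting, assuming NumPy. The slides illustrate this with a two-class problem. The sketch swaps in a simpler regression setting (polynomial fits of increasing degree to noisy synthetic data), so all data and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples from a smooth curve (synthetic, illustration only)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(20)
x_test, y_test = make_data(200)


def poly_errors(degree):
    """Fit a polynomial on the train set; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    return float(train_mse), float(test_mse)


errors = {degree: poly_errors(degree) for degree in (1, 3, 10)}
for degree, (train_mse, test_mse) in errors.items():
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Train error can only go down as the model gets more flexible. The test error is what reveals whether the extra flexibility helps (optimal) or hurts (overfitting).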
Model Evaluation: Cross-Validation (Outer loop)
[Figure: CV outer loop. All data is divided into Fold 1 to Fold 5; in each iteration one fold serves as test data and the remaining folds as train data.]
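A minimal sketch of a 5-fold outer loop, assuming scikit-learn and synthetic two-class data. The classifier, the data, and the seeds are illustrative and not from the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic two-class data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
tested = []
for train_idx, test_idx in outer_cv.split(X):
    # Fit a fresh model on the train folds, evaluate on the held-out fold.
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))
    tested.extend(test_idx.tolist())

print(fold_scores)
print(np.mean(fold_scores), np.std(fold_scores))
```

Every sample is used for testing exactly once, and the spread of `fold_scores` gives a sense of how stable the performance estimate is.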
Model Evaluation: Cross-Validation (Inner loop)
[Figure: CV inner loop. Test data is held out; the train data is divided into Fold 1 to Fold 4, and each of Split 1 to Split 4 uses one fold as val data and the remaining folds as train.]
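A minimal sketch of nesting the inner loop inside an outer loop, assuming scikit-learn and synthetic data. The inner loop picks a hyper-parameter on validation folds carved out of the train data; the outer test fold is used only for the final score. The SVM, the grid of `C` values, and the seeds are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Synthetic two-class data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0]}  # hypothetical hyper-parameter grid

outer_scores, chosen_C = [], []
for train_idx, test_idx in outer_cv.split(X):
    # Inner loop: choose C using train/val splits of the outer train data.
    search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner_cv)
    search.fit(X[train_idx], y[train_idx])
    chosen_C.append(search.best_params_["C"])
    # Outer loop: score the refit best model on the untouched test fold.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(chosen_C)
print(np.mean(outer_scores))
```

Keeping hyper-parameter selection inside the train data means the outer test scores are not biased by the tuning.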
Model Evaluation: Hyper-parameters
Performance Scores
Confusion Matrix
Prediction \ Ground Truth | POSITIVE | NEGATIVE |
POSITIVE | TP | FP (False Positive) |
NEGATIVE | FN (False Negative) | TN |
Score | Formula | Null model (everyone predicted negative, 1% prevalence) | What does it tell us? | When do I use it? |
Accuracy | (TP+TN) / (TP+FP+FN+TN) | 0.99 | How many people did we correctly predict out of all the people scanned? | When FNs and FPs have similar costs |
Precision (aka PPV) | TP / (TP+FP) | NaN | How many of those we predicted as "COVID" actually have COVID? | When you want to be more confident of your TPs |
Recall (aka Sensitivity) | TP / (TP+FN) | 0 | Of all the people who have COVID, how many did we correctly predict? | When you prefer FPs over FNs |
Specificity | TN / (TN+FP) | 1 | Of all the people who are healthy, how many did we correctly predict? | When you prefer FNs over FPs |
F1 | 2 * (Recall * Precision) / (Recall + Precision) | NaN | Harmonic mean of precision and recall | When you have an uneven class distribution |
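A minimal sketch of the formulas in this table, in plain Python. The function name `scores` is illustrative. The example counts assume a model that predicts everyone as negative in a sample of 1000 people with 1% prevalence, which appears to be what the "Null" column describes.

```python
import math


def scores(tp, fp, fn, tn):
    """Compute the table's scores from confusion-matrix counts."""

    def ratio(num, den):
        # Return NaN instead of raising when the denominator is zero.
        return num / den if den else math.nan

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * recall * precision, recall + precision),
    }


# Assumed null model: 1000 people, 10 with COVID, all predicted negative.
null_scores = scores(tp=0, fp=0, fn=10, tn=990)
for name, value in null_scores.items():
    print(name, value)
```

The null model reaches 99% accuracy without finding a single patient, which is why accuracy alone can mislead when classes are unevenly distributed.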
Pop Quiz Answers
We train a simple machine-learning model to identify COVID patients using their biometry, in a population with 1% COVID prevalence. Our model is 91% accurate! Then we also calculate,
What are my chances that I have COVID if my test is positive?
(Imagine a sample of 1000 individuals → 10 COVID patients → 9 TP & 89 FP → chance of COVID given a positive test = 9 / (9 + 89) ≈ 9%)
Later we train a fancy deep-learning model to identify COVID patients using their chest CT! This model has an accuracy of 99%! We calculate
Which model is better? (We want to avoid FNs to reduce the spread → we want high sensitivity)
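The arithmetic behind the first answer can be checked with a few lines of plain Python. The counts (1000 individuals, 10 patients, 9 TP, 89 FP) come from the slide. The remaining counts are derived from them, and the variable names are illustrative.

```python
n = 1000                # imagined sample size
prevalence = 0.01       # 1% COVID prevalence
tp, fp = 9, 89          # counts given on the slide

patients = round(n * prevalence)   # 10 people actually have COVID
fn = patients - tp                 # patients the model missed
tn = n - patients - fp             # healthy people correctly cleared

accuracy = (tp + tn) / n
sensitivity = tp / (tp + fn)
ppv = tp / (tp + fp)    # chance of COVID given a positive test

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, PPV={ppv:.3f}")
```

Even with 91% accuracy, fewer than 1 in 10 positive tests belongs to an actual patient, because the few true positives are outnumbered by false positives from the much larger healthy group.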
Performance Curves
Deep-learning
ANN for handwritten-digit images
(gif source: 3b1b)
Pitfalls and Challenges
ML Novice Checklist
Takeaways
Explainable AI: "It’s COVID because…"