Final Review - Part 1
Manana Hakobyan and Stephanie Djajadi
Overview of Topics
Agenda
PCA
[ True or False ] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
PCA - Solution
[ True or False ] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE. Projecting data onto its top two or three principal components is a standard way to visualize it in lower dimensions.
PCA
The most widely used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?
PCA
What happens when you obtain features in lower dimensions using PCA?
PCA
Imagine you are given the following scatterplot of height versus weight.
Select the angle that captures the maximum variability along a single axis.
A. ~ 0 degrees
B. ~ 45 degrees
C. ~ 60 degrees
D. ~ 90 degrees
PCA - Solution
Select the angle that captures the maximum variability along a single axis.
B. ~ 45 degrees. Height and weight are positively correlated, so the direction of maximum variance runs roughly along the 45-degree diagonal of the scatterplot.
PCA
Which of the following can be the first 2 principal components after applying PCA?
Remember: the principal components must be orthogonal unit vectors.
PCA
Suppose X is a (100 x 5) matrix with rank 3.
What are the dimensions of U, Σ, and V?
Remember X = UΣVᵀ (or XV = UΣ)
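As a quick check, here is a minimal NumPy sketch (an illustrative addition) of the compact SVD convention: U is 100 x 5, there are 5 singular values (the last two are ~0 because the rank is 3), and Vᵀ is 5 x 5.

```python
import numpy as np

# Build a 100 x 5 matrix of rank 3 as a product of (100 x 3) and (3 x 5).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 5))

# Compact SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (100, 5) (5,) (5, 5)
print(np.round(s, 6))              # last two singular values are ~0 (rank 3)
```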
Probabilities, RVs (Fall 2017 Final)
Loss Functions
Remember: to minimize the loss function, take its derivative (gradient) with respect to the parameters, set it equal to 0, and solve!
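For instance, here is a minimal worked example (an illustrative addition), assuming the average squared loss with a single scalar parameter θ; the minimizer turns out to be the sample mean.

```latex
L(\theta) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \theta)^2
\quad\Rightarrow\quad
\frac{dL}{d\theta} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \theta) = 0
\quad\Rightarrow\quad
\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} y_i
```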
Pandas & SQL (Spring 2019 Final Q7c)
Pandas & SQL (Spring 2019 Final Q7c) - Solution
Logistic Regression
Regression vs Classification
Examples: predicting a house's sale price is a regression task; predicting whether an email is spam or ham is a classification task.
Sigmoid Function
σ(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x)
dσ(x)/dx = σ(x)(1 - σ(x))
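A minimal NumPy sketch (an illustrative addition) that implements the sigmoid and numerically checks the derivative identity above:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

# Numerically verify d(sigma)/dx = sigma(x) * (1 - sigma(x)).
x = np.linspace(-5, 5, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
analytic = sigmoid(x) * (1 - sigmoid(x))
print(np.allclose(numeric, analytic, atol=1e-8))       # True
```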
Cross Entropy Loss
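For binary labels, the cross-entropy loss is L = -[y log(p) + (1 - y) log(1 - p)]. Here is a minimal sketch (an illustrative addition, with hypothetical labels and predicted probabilities) that averages it over a batch:

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    # Average cross-entropy loss: -[y log p + (1 - y) log(1 - p)].
    # eps keeps log() finite when a prediction is exactly 0 or 1.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])         # hypothetical labels
p_hat = np.array([0.9, 0.2, 0.7, 0.4])  # hypothetical predicted probabilities
print(cross_entropy(y_true, p_hat))     # ~= 0.4004
```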
Practice - Logistic Regression (T/F)
Practice - Spring 2019 Midterm 2 Q2a
Calculate the empirical risk and estimate ŵ.
Classifier Evaluation
Prediction \ Truth | 1 | 0
1 | True positive (TP) | False positive (FP)
0 | False negative (FN) | True negative (TN)
ROC Curves
[ROC curve: true positive rate (y-axis) vs. false positive rate (x-axis); the point (0, 0) corresponds to always predicting 0, and (1, 1) to always predicting 1.]
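As an illustrative sketch (an addition, with made-up labels and scores), scikit-learn's roc_curve traces these points by sweeping the decision threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1])               # hypothetical labels
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])  # hypothetical classifier scores

# Sweeping the threshold from high to low moves the classifier from
# "always predict 0" (TPR = FPR = 0) to "always predict 1" (TPR = FPR = 1).
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(fpr)  # begins at 0.0 and ends at 1.0
print(tpr)  # begins at 0.0 and ends at 1.0
```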
Practice - Classification
You have 2 classifiers A and B, and you are trying to pick one to use for filtering spam. You train both on a dataset of 100 spam and 100 ham emails.
Classifier A has 0% accuracy on the dataset, and Classifier B has 50% accuracy. Which classifier would you rather use?
A) Classifier A
B) Classifier B
Practice - Precision & Recall
A classifier has a high number of false negatives, and we want to reduce this number. Which metric should we study to address this?
A) Accuracy
B) Precision
C) Recall
Practice - Precision & Recall
Suppose you create a classifier to predict whether an image contains a picture of a goat. You test it on 23 images.
Determine the precision and recall of your goat classifier.
Solutions - Precision & Recall
Suppose you create a classifier to predict whether an image contains a picture of a goat. You test it on 23 images.
Precision: TP/(TP+FP) = 9/(9+3) = 3/4
Recall: TP/(TP+FN) = 9/(9+2) = 9/11
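To sanity-check these numbers, here is a small scikit-learn sketch (an addition); the label arrays are reconstructed from the counts above, with TN = 23 - 9 - 3 - 2 = 9 (the TN count does not affect precision or recall):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# 9 TP, 3 FP, 2 FN, 9 TN, reconstructed from the solution above.
y_true = np.array([1] * 9 + [0] * 3 + [1] * 2 + [0] * 9)
y_pred = np.array([1] * 9 + [1] * 3 + [0] * 2 + [0] * 9)

print(precision_score(y_true, y_pred))  # 0.75      = 9/12
print(recall_score(y_true, y_pred))     # 0.8181... = 9/11
```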
Decision Trees
Entropy; Loss of a Split; Information Gain
From the lecture, the entropy of a node is S(Node) = -Σ_C p_C log2(p_C), where p_C is the fraction of the node's points in class C.
The loss of a split (split entropy) is the weighted average of the child entropies: Loss = (n1 · S(N1) + n2 · S(N2)) / (n1 + n2).
Information Gain: S(Node) - entropy of the split.
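A minimal Python sketch (an illustrative addition) of these formulas, checked against the numbers in the practice problem that follows:

```python
import numpy as np

def entropy(counts):
    # S = -sum(p_C * log2(p_C)) over the class proportions in the node.
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                  # treat 0 * log2(0) as 0
    return -np.sum(p * np.log2(p))

def split_loss(children):
    # Weighted average entropy of the child nodes.
    sizes = [sum(c) for c in children]
    return sum(n * entropy(c) for n, c in zip(sizes, children)) / sum(sizes)

parent = [40, 60]                # 40 D, 60 B
children = [[20, 10], [20, 50]]  # the two child nodes
print(entropy(parent))           # ~= 0.971
print(split_loss(children))      # ~= 0.8795
print(entropy(parent) - split_loss(children))  # information gain ~= 0.09
```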
Practice with Entropy
First node:
Second node:
Loss:
Information Gain:
[Tree diagram: a parent node with 40 D, 60 B split into two children with (20 D, 10 B) and (20 D, 50 B).]
Practice with Entropy - Solutions
[Tree diagram: a parent node with 40 D, 60 B split into two children with (20 D, 10 B) and (20 D, 50 B).]
Parent node: S(Node) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.97
First node: S(N1) = -(2/3)log2(2/3) - (1/3)log2(1/3) = 0.918
Second node: S(N2) = -(2/7)log2(2/7) - (5/7)log2(5/7) = 0.863
Loss: (30 × 0.918 + 70 × 0.863) / 100 = 0.8795
Information Gain: S(Node) - Loss of split = 0.97 - 0.8795 = 0.0905
Problems with Decision Trees
Random Forests
Decision Trees and Random Forest
In a random forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then aggregate the results of these trees. Which of the following is true about an individual tree (Tk) in the random forest?
Decision Trees and Random Forest - Solution
Each individual tree Tk is built on a bootstrap sample of the observations, and only a random subset of the features is considered at each split; this decorrelates the trees before their results are aggregated.
Decision Trees and Random Forest
How do you select the best hyperparameters in tree-based models?
A) Measure performance over training data
B) Measure performance over validation data
C) Both of these
D) None of these
Decision Trees and Random Forest - Solution
B) Measure performance over validation data. Training performance is overly optimistic (a deep enough tree can fit the training data perfectly), so hyperparameters should be chosen using held-out validation data or cross-validation.
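As an illustration of option B (an addition, not part of the original question), scikit-learn's GridSearchCV scores each hyperparameter setting on held-out cross-validation folds; the dataset and parameter grid below are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical toy dataset standing in for real training data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each candidate setting is scored on held-out validation folds (5-fold CV),
# never on the data the trees were fit to.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```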