Machine Learning for English Analysis
Prof. Seungtaek Choi
Recap
Recap: Neural Networks
Multiple Perceptrons
First layer
Second layer
Feature Learning
Looks a lot like logistic regression
The only difference is that, instead of taking a raw feature vector as input, the output layer takes features that are values computed by the hidden layer
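A minimal sketch of this point, assuming sigmoid activations and hand-picked weights (none of these numbers come from the lecture): the second layer is logistic regression applied to hidden-layer values rather than to the raw inputs. The weights are chosen so that the network computes XOR, which is not linearly separable in the input space.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """One layer: each output unit is sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1 = [[20.0, 20.0], [-20.0, -20.0]]   # hidden units act like OR and NAND
b1 = [-10.0, 30.0]
W2 = [[20.0, 20.0]]                   # output unit acts like AND
b2 = [-30.0]

def mlp(x):
    hidden = dense(x, W1, b1)         # first layer: computed features
    return dense(hidden, W2, b2)[0]   # second layer: logistic regression on them

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(mlp(x)))
```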
Multilayer Perceptron (MLP) = Artificial Neural Network (ANN)
…
Linear classification
Feature Learning
Nonlinear mappings
Linearly separable
Recap: Backpropagation
Gradients in ANN
Training Neural Networks: Backpropagation Learning
Backpropagation
These gradients are what we need for gradient descent (GD)
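As a hedged illustration of what backpropagation produces, the sketch below uses a deliberately tiny network (one input, one sigmoid hidden unit, one linear output, squared-error loss). The architecture and numbers are assumptions for illustration, not the network from the slides. It computes the gradients with the chain rule and checks them against finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(params, x, y):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return 0.5 * (y_hat - y) ** 2

def backprop(params, x, y):
    w1, b1, w2, b2 = params
    # Forward pass, keeping intermediate values.
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    # Backward pass: chain rule from the loss back to each parameter.
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h
    d_b2 = d_yhat
    d_h = d_yhat * w2
    d_z = d_h * h * (1.0 - h)       # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]

def numeric_grad(params, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
    return grads

params = [0.5, -0.3, 0.8, 0.1]      # arbitrary starting values
analytic = backprop(params, x=1.5, y=1.0)
numeric = numeric_grad(params, x=1.5, y=1.0)
print(analytic)
print(numeric)
```

The analytic list is exactly what a gradient descent update would consume: one partial derivative per parameter.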
Recap: Activation Function
Today
Supplementary Material for Assignment 2
Google Colab
Press Ctrl + Enter to run the cell,
or just click the run button
Google Colab
Now it’s time to practice with the lecture materials
Google Colab
Select the GitHub option to bring in our material from GitHub
Google Colab
Enter the link to the provided material (.ipynb file)
Google Colab
Then click this!
Google Colab
On the right side of your screen
Google Colab
You can use a free-tier GPU for practice
Google Colab
Run all the cells
Google Colab
Cell types: Code and Text (markdown)
Google Colab
You can add modules
Google Colab
After adding this path, you can import the module
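A minimal sketch of this step, assuming "adding this path" means appending a directory to `sys.path`. The folder and the module name `my_helpers` are hypothetical stand-ins created on the fly so that the example runs anywhere; in Colab the path would point to wherever the course files actually live.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for a folder that holds course code (hypothetical; the real path
# is not given in the slides).
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_helpers.py").write_text(
    "def greet(name):\n    return f'Hello, {name}!'\n"
)

# Add the folder to the module search path, then import as usual.
if str(module_dir) not in sys.path:
    sys.path.append(str(module_dir))

import my_helpers

message = my_helpers.greet("Colab")
print(message)
```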
Optimizer
Gradient Descent
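A minimal sketch of the gradient descent update rule, assuming a one-dimensional toy objective f(w) = (w - 3)^2 chosen only for illustration. Each step moves the parameter against the gradient, scaled by a learning rate.

```python
def gradient_descent(grad, w0, lr, steps):
    """Repeat w <- w - lr * grad(w) for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Gradient of f(w) = (w - 3)^2 is 2 * (w - 3); the minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=0.0, lr=0.1, steps=100)
print(w_star)
```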
Mini-Batches
Gradient Descent
Mini-batches while training
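A hedged sketch of mini-batch training, assuming a toy linear regression problem (the data, batch size, and learning rate are made up for illustration). The gradient is averaged over one small batch at a time rather than over the whole dataset.

```python
import random

def minibatches(data, batch_size, rng):
    """Shuffle once per epoch, then yield consecutive chunks."""
    indices = list(range(len(data)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

# Toy data from y = 3x + 1.
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(-20, 21)]

rng = random.Random(0)
w, b, lr = 0.0, 0.0, 0.1
for epoch in range(200):
    for batch in minibatches(data, batch_size=8, rng=rng):
        # Gradient of mean squared error, averaged over this batch only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)
```

Reshuffling every epoch means each batch is a different random sample, which is one of the sources of run-to-run variation in training.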
Regularization
Lowering the capacity of the model
- discouraging the model from relying on a single pathway
- forcing the model to learn multiple pathways to reach a single decision
Evaluation: Protocol
Evaluation Protocol
Or: it’s about the model selection strategy.
Typical Leakage #1: Tuning on the test set
Typical Leakage #2: Fit Scalers/PCA on all data
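A minimal sketch of avoiding this leakage, assuming simple standardization written by hand (the helper names `fit_scaler` and `transform` are hypothetical, and the numbers are made up). The scaler statistics are learned from the training split only and then reused unchanged on the test split.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Learn standardization statistics from the given values only."""
    return mean(values), pstdev(values)

def transform(values, scaler):
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]

train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [10.0, 12.0]              # made-up values with a shifted distribution

# Correct: statistics come from the training split alone.
scaler = fit_scaler(train)
train_scaled = transform(train, scaler)
test_scaled = transform(test, scaler)

# Leaky: statistics computed on train + test let test data shape the features.
leaky_scaler = fit_scaler(train + test)

print(scaler, leaky_scaler)
```

The same pattern applies to PCA or any other fitted preprocessing step: fit on train, apply to validation and test.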
Typical Leakage #3: Augment/Oversample on val/test
What if there is no validation split?
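One common answer, which may or may not be the one given in the lecture, is to hold out part of the training data as a validation set and leave the test set untouched. The sketch below assumes a hypothetical helper `split_train_val` and placeholder examples.

```python
import random

def split_train_val(train_examples, val_fraction=0.2, seed=0):
    """Hold out part of the training data for model selection."""
    rng = random.Random(seed)
    shuffled = list(train_examples)
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train_examples = list(range(100))    # placeholder for real training examples
train_part, val_part = split_train_val(train_examples)
print(len(train_part), len(val_part))
```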
Reproducibility vs. Replicability
Dropout should be turned off at test time.
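A hedged sketch of why a train/test switch is needed, assuming the "inverted dropout" variant in which surviving units are rescaled during training so that nothing has to change at test time. The function and its `training` flag are illustrative, not a framework API.

```python
import random

def dropout(activations, p_drop, training, rng):
    """Inverted dropout: zero units with probability p_drop during training and
    rescale the survivors; return the input unchanged at test time."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
train_out = dropout(hidden, p_drop=0.5, training=True, rng=rng)
test_out = dropout(hidden, p_drop=0.5, training=False, rng=rng)
print(train_out)
print(test_out)
```

Leaving dropout on at test time would make predictions random, so evaluating the same input twice could give different results.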
Where Randomness Bites
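A minimal sketch of controlling one source of randomness with a fixed seed. `noisy_run` is a hypothetical stand-in for a training run whose outcome depends on random choices such as weight initialization, data shuffling, or dropout masks. Real frameworks have their own seeding calls in addition to Python's `random` module.

```python
import random

def noisy_run(seed):
    """Stand-in for a training run whose result depends on random choices."""
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(3)]

same_a = noisy_run(seed=42)
same_b = noisy_run(seed=42)
different = noisy_run(seed=7)

print(same_a == same_b)      # same seed, same result
print(same_a == different)   # different seed, different result
```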
In Our Assignment (not mandatory)
In Our Assignment (mandatory)
Evaluation: Metric
Accuracy
Limitation of Accuracy?
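One limitation can be sketched with made-up imbalanced labels: a "model" that always predicts the majority class reaches high accuracy without detecting a single positive example.

```python
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100     # never predicts the positive class

acc = accuracy(y_true, always_negative)
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true, always_negative))
print(acc, positives_found)
```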
Confusion Matrix (binary)
Precision & Recall
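A minimal sketch of the two definitions, computed from binary confusion-matrix counts (TP, FP, FN). The counts below are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 8 true positives, 2 false positives, 12 false negatives.
precision, recall = precision_recall(tp=8, fp=2, fn=12)
print(precision, recall)
```

Here most positive predictions are correct (high precision), but most actual positives are missed (low recall).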
Why Important? (beyond Accuracy)
Example 1 – Imbalanced Digits
Example 2 – Cancer Screening
3-Class Example
|           | Pred A | Pred B | Pred C | (row sum) |
| True A    | 24     | 4      | 2      | 30        |
| True B    | 6      | 10     | 4      | 20        |
| True C    | 5      | 5      | 40     | 50        |
| (col sum) | 35     | 19     | 46     |           |
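The per-class metrics for this table can be worked out as follows. This is a sketch that assumes rows are true classes and columns are predicted classes, as the row and column labels indicate. Precision divides the diagonal entry by its column sum, and recall divides it by its row sum.

```python
labels = ["A", "B", "C"]
# Rows are true classes, columns are predicted classes (values from the table).
confusion = [
    [24, 4, 2],
    [6, 10, 4],
    [5, 5, 40],
]

total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(3)) / total

per_class = {}
for i, label in enumerate(labels):
    predicted_as = sum(confusion[r][i] for r in range(3))   # column sum
    actually_is = sum(confusion[i])                          # row sum
    per_class[label] = {
        "precision": confusion[i][i] / predicted_as,
        "recall": confusion[i][i] / actually_is,
    }

print("accuracy", accuracy)
for label, scores in per_class.items():
    print(label, round(scores["precision"], 3), round(scores["recall"], 3))
```

Class B stands out: its precision (10/19) and recall (10/20) are both far below the overall accuracy, which a single accuracy number would hide.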
Thresholds & Curves (quick note)
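A hedged sketch of the threshold idea, assuming a binary classifier that outputs a score and made-up scores and labels. Sweeping the decision threshold trades recall for precision, and plotting those pairs gives a precision-recall curve. Treating precision as 1.0 when nothing is predicted positive is a convention chosen here.

```python
def precision_recall_at(scores, labels, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if (tp + fp) else 1.0   # convention for no positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]   # made-up model scores
labels = [1, 1, 0, 1, 0, 0]
curve = {t: precision_recall_at(scores, labels, t) for t in (0.3, 0.7, 0.9)}

for threshold, (p, r) in curve.items():
    print(threshold, round(p, 3), round(r, 3))
```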
One-Hot Vector
What is a One-Hot Vector?
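A minimal sketch: a one-hot vector has one entry per class, with a 1 at the index of the class and 0 everywhere else. The three-word label set is hypothetical.

```python
def one_hot(index, num_classes):
    vec = [0] * num_classes
    vec[index] = 1
    return vec

vocab = ["cat", "dog", "bird"]   # hypothetical label set
encoded = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}
print(encoded)
```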
Why One-Hot?
Classification with Softmax + CrossEntropy
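A hedged sketch of this combination with made-up logits: softmax turns the scores into a probability distribution, and cross-entropy against a one-hot target reduces to the negative log-probability of the correct class. Subtracting the maximum logit is a standard numerical-stability step and does not change the result.

```python
import math

def softmax(logits):
    m = max(logits)                           # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_target):
    """-sum_k t_k * log(p_k); only the true class contributes."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t)

logits = [2.0, 1.0, 0.1]                      # made-up scores for 3 classes
probs = softmax(logits)
loss_if_class0 = cross_entropy(probs, [1, 0, 0])
loss_if_class2 = cross_entropy(probs, [0, 0, 1])
print(probs, loss_if_class0, loss_if_class2)
```

The loss is small when the model puts high probability on the true class and large when it does not.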
From One-Hot to Embedding Lookup
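A minimal sketch of the connection, assuming a small made-up embedding matrix `E` with one row per word: multiplying a one-hot row vector by `E` selects exactly one row, so the multiplication can be replaced by a direct index lookup.

```python
def row_times_matrix(row_vec, matrix):
    """Row vector times matrix: entry j is sum_i row_vec[i] * matrix[i][j]."""
    dim = len(matrix[0])
    return [sum(row_vec[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(dim)]

# Hypothetical embedding matrix: 3 words, 2 dimensions.
E = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
word_id = 1
one_hot_vec = [1.0 if i == word_id else 0.0 for i in range(len(E))]

via_matmul = row_times_matrix(one_hot_vec, E)
via_lookup = E[word_id]
print(via_matmul, via_lookup)
```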
Model Ensemble
What are Ensembles?
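One simple form of ensemble, sketched here with hypothetical predictions, is a majority vote over several models. Other forms, such as averaging predicted probabilities, follow the same idea of combining several models into one decision.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical predictions from three models on four inputs.
model_outputs = [
    ["pos", "neg", "pos", "neg"],
    ["pos", "pos", "neg", "neg"],
    ["neg", "pos", "pos", "neg"],
]
ensemble = [majority_vote(votes) for votes in zip(*model_outputs)]
print(ensemble)
```

If the true labels were pos, pos, pos, neg, each model here would make one mistake while the vote makes none, because the mistakes fall on different inputs.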
Why Ensembles?
Where Does the Gain Come From?
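A hedged back-of-the-envelope sketch of one source of the gain, under the idealized assumption that the models make independent errors on a binary task. The 0.7 accuracy figure is made up. If each model is right with probability p, a majority vote is right whenever more than half of them are.

```python
from itertools import product

def majority_vote_accuracy(p_correct, n_models):
    """Probability that more than half of n independent models are right."""
    total = 0.0
    for outcome in product([True, False], repeat=n_models):
        if sum(outcome) > n_models / 2:
            prob = 1.0
            for right in outcome:
                prob *= p_correct if right else (1.0 - p_correct)
            total += prob
    return total

single = 0.7
ensemble_of_3 = majority_vote_accuracy(single, 3)
print(single, ensemble_of_3)
```

In practice errors are correlated, so the real gain is smaller than this calculation suggests; the more diverse the models, the closer it gets.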
How to Represent Text
Step 1. Word Vectors
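A minimal sketch of what word vectors make possible, assuming hand-made 3-dimensional vectors for illustration only (real word vectors are learned from data and have far more dimensions). Once words are vectors, similarity between words can be measured, for example with cosine similarity.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, not taken from any trained model.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
sim_royal = cosine(vectors["king"], vectors["queen"])
sim_fruit = cosine(vectors["king"], vectors["apple"])
print(round(sim_royal, 3), round(sim_fruit, 3))
```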
Suggested Readings
Next