(Artificial) Neural Networks: From Perceptron to MLP
Binary Linear Classifier
2
Binary Linear Classifier
3
Binary Linear Classifier with New Data
4
Binary Linear Classifier
5
Binary Linear Classifier in High Dimension
6
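A minimal sketch of the decision rule behind a binary linear classifier, in NumPy. The weight vector w and bias b below are hypothetical values chosen for illustration (in practice they are learned); the same rule carries over unchanged to high-dimensional inputs by lengthening w.

import numpy as np

# Hypothetical learned parameters for a 2-dimensional input.
w = np.array([0.8, -0.5])   # one weight per input dimension
b = 0.1                     # bias shifts the boundary away from the origin

def predict(x):
    # Linear score followed by a threshold at zero.
    score = np.dot(w, x) + b
    return 1 if score >= 0 else 0

print(predict(np.array([1.0, 0.2])))    # 1: positive side of the line
print(predict(np.array([-1.0, 1.0])))   # 0: negative side of the line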
From Perceptron to MLP
7
XOR Problem
8
x1 | x2 | x1 XOR x2 |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
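No single line separates the XOR classes above, but a two-layer network can. A hand-wired NumPy sketch (the weights are chosen by hand for illustration, not learned): the hidden units act like OR and AND, and the output combines them into OR-and-not-AND, which is exactly XOR.

import numpy as np

def step(z):
    # Hard-threshold activation: 1 where z >= 0, else 0.
    return (z >= 0).astype(int)

def xor_mlp(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: unit 1 behaves like OR, unit 2 like AND.
    h = step(np.array([x @ np.array([1, 1]) - 0.5,
                       x @ np.array([1, 1]) - 1.5]))
    # Output: OR and not AND, i.e. XOR.
    return step(np.array([h @ np.array([1, -1]) - 0.5]))[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_mlp(a, b))   # last column: 0, 1, 1, 0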
Nonlinear Curve Approximated by Multiple Lines
9
XOR Problem
10
Artificial Neural Networks: MLP
11
Artificial Neural Networks: Activation Function
12
Artificial Neural Networks
13
Two Ways of Looking at Artificial Neural Networks
14
Common Activation Functions
15
Source: 6.S191 Intro. to Deep Learning at MIT
Discuss later
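A quick NumPy sketch of activation functions commonly listed here (sigmoid, tanh, ReLU); the exact set on the slide may differ, so treat this as an illustrative sample.

import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1); saturates for large |z|.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centered squashing into (-1, 1).
    return np.tanh(z)

def relu(z):
    # Keeps positive inputs, zeros out the rest.
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))
print(tanh(z))
print(relu(z))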
Artificial Neural Networks
16
Artificial Neural Networks
17
Artificial Neural Networks
18
Another Perspective: ANN as Kernel Learning
19
Nonlinear Classification
20
https://www.youtube.com/watch?v=3liCbRZPrZA
Neuron
21
XOR Problem
22
Nonlinear Mapping
23
Source: Dr. Francois Fleuret at EPFL
Nonlinear Mapping
24
Source: Dr. Francois Fleuret at EPFL
Nonlinear Mapping
25
Source: Dr. Francois Fleuret at EPFL
Neuron
26
Kernel + Neuron
27
Neuron + Neuron
28
Multi-Layer Perceptron
29
Summary
30
Deep Artificial Neural Networks
31
[Diagram: Input → hidden layers performing nonlinear feature learning → linear classification layer → Output (Class 1 / Class 2)]
Deep Artificial Neural Networks
32
[Diagram: a deeper network; Input → many stacked nonlinear feature-learning layers → linear classification → Output (Class 1 / Class 2)]
Machine Learning vs. Deep Learning
33
Deep Learning
34
Looking at Parameters
35
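As a worked example of counting parameters, take the 784-100-10 fully connected network used later for MNIST: each dense layer contributes in_features * out_features weights plus out_features biases.

# 784 -> 100 -> 10 network with biases:
#   input  -> hidden: 784 * 100 + 100 = 78,500
#   hidden -> output: 100 * 10  + 10  =  1,010
#   total:                               79,510
layers = [784, 100, 10]
total = sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))
print(total)   # 79510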
Logistic Regression in a Form of Neural Network
36
Logistic Regression in a Form of Neural Network
37
Bias units are not shown
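Logistic regression is the smallest such network: a single neuron, i.e. a linear combination followed by a sigmoid. A minimal NumPy sketch, with placeholder weights (in practice they are learned):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters for a 2-feature input.
w = np.array([1.5, -2.0])
b = 0.25

def logistic_regression(x):
    # One neuron: weighted sum plus bias, passed through a sigmoid.
    return sigmoid(np.dot(w, x) + b)

p = logistic_regression(np.array([0.4, 0.1]))
print(p, int(p >= 0.5))   # predicted probability and thresholded class label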
Nonlinearly Distributed Data
38
Nonlinearly Distributed Data
39
Bias units are not shown
Multiple Layers
40
Bias units are not shown
Multiple Layers
41
Bias units are not shown
Multiple Layers
42
Bias units are not shown
Nonlinearly Distributed Data
43
Nonlinearly Distributed Data
44
Bias units are not shown
Multiple Layers
45
Bias units are not shown
(Artificial) Neural Networks: Training
46
Training Neural Networks: Optimization
47
Training Neural Networks: Loss Function
48
Training Neural Networks: Gradient Descent
49
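A minimal sketch of the idea: choose a loss function, then repeatedly step the parameters against the gradient of that loss. Here, plain gradient descent on a one-parameter squared-error loss; the learning rate and starting point are placeholder choices.

# Toy loss L(w) = (w - 3)^2, with gradient dL/dw = 2 * (w - 3).
w = 0.0
learning_rate = 0.1

for step in range(50):
    grad = 2.0 * (w - 3.0)         # gradient of the loss at the current w
    w = w - learning_rate * grad   # move a small step against the gradient

print(w)   # approaches 3.0, the minimizer of the loss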
Gradients in ANN
50
Dynamic Programming
51
Recursive Algorithm
52
[Diagram: the network viewed from Input to Output; the recursion proceeds layer by layer, with the base case at the output layer]
Dynamic Programming
53
Naïve Recursive Algorithm
54
Memoized Recursive Algorithm
55
Dynamic Programming Algorithm
56
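The standard illustration of these three variants is Fibonacci: the naive recursion recomputes the same subproblems exponentially often, the memoized version caches each result, and the bottom-up dynamic program fills the table once. Backpropagation reuses the same idea for gradients.

from functools import lru_cache

def fib_naive(n):
    # Naive recursion: exponential time, subproblems recomputed many times.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Memoized recursion: each subproblem is solved once, then looked up.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_dp(n):
    # Bottom-up dynamic programming: build from the base cases upward.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_naive(20), fib_memo(20), fib_dp(20))   # all print 6765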
Backpropagation
57
Gradients in ANN
58
Training Neural Networks: Backpropagation Learning
59
Backpropagation
60
Backpropagation
61
Backpropagation
62
Backpropagation
63
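A minimal sketch of what backpropagation computes, on a scalar two-layer network (all numbers are made up for illustration): run the forward pass, keep the intermediate values, then apply the chain rule backward from the output, the base case of the recursion above, reusing those stored values.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scalar parameters, input, and target.
w1, w2 = 0.5, -1.2
x, t = 2.0, 1.0

# Forward pass (intermediates are stored for the backward pass).
z = w1 * x
h = sigmoid(z)
y = w2 * h
loss = 0.5 * (y - t) ** 2

# Backward pass: chain rule from the output back toward the input.
dL_dy = y - t                    # derivative of 0.5 * (y - t)^2 w.r.t. y
dL_dw2 = dL_dy * h               # since y = w2 * h
dL_dh = dL_dy * w2
dL_dz = dL_dh * h * (1.0 - h)    # sigmoid'(z) = h * (1 - h)
dL_dw1 = dL_dz * x               # since z = w1 * x

print(loss, dL_dw1, dL_dw2)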
Training Neural Networks with TensorFlow
64
Core Foundation Review
65
Source: 6.S191 Intro. to Deep Learning at MIT
(Artificial) Neural Networks with TensorFlow
66
MNIST database
67
ANN in TensorFlow: MNIST
68
Our Network Model
69
[Diagram: input image (28 × 28), flattened → input layer (784) → hidden layer (100) → output layer (10) → digit prediction in one-hot encoding]
Iterative Optimization
70
Implementation in Python
71
[Diagram: input image (28 × 28), flattened → input layer (784) → hidden layer (100) → output layer (10)]
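A sketch of the 784-100-10 model above in tf.keras; this is one reasonable way to write it, and the hidden activation (ReLU here) is a placeholder choice that may differ from the lecture code.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),                    # input image (28 × 28)
    tf.keras.layers.Flatten(),                         # flattened to 784
    tf.keras.layers.Dense(100, activation="relu"),     # hidden layer (100)
    tf.keras.layers.Dense(10, activation="softmax"),   # output layer (10)
])
model.summary()   # 79,510 trainable parameters in total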
Evaluation
72
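Continuing the sketch above (reusing the same model object): iterative optimization with model.fit and evaluation on held-out test data with model.evaluate. Optimizer, epoch count, and batch size are placeholder choices.

import tensorflow as tf

# Load MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Iterative optimization: minimize cross-entropy by gradient-based updates.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128)

# Evaluation on data the network never saw during training.
test_loss, test_acc = model.evaluate(x_test, y_test)
print(test_acc)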
(Artificial) Neural Networks: Advanced
73
Nonlinear Activation Function
74
The Vanishing Gradient Problem
75
Rectifiers
76
Rectifiers
77
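A small NumPy sketch of why rectifiers help against vanishing gradients: the sigmoid's derivative shrinks toward zero once the unit saturates, while ReLU's derivative stays at 1 on the positive side (leaky ReLU keeps a small slope, here 0.01, on the negative side as well).

import numpy as np

z = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])

sig = 1.0 / (1.0 + np.exp(-z))
sig_grad = sig * (1.0 - sig)              # near 0 for large |z|: saturation

relu_grad = (z > 0).astype(float)         # 1 for positive inputs, 0 otherwise

alpha = 0.01                              # leaky ReLU slope for negative inputs
leaky_grad = np.where(z > 0, 1.0, alpha)

print(sig_grad)                 # about 0.0025 at |z| = 6
print(relu_grad, leaky_grad)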
Batch Normalization
78
Batch Normalization
79
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
Batch Normalization
80
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
Batch Normalization
81
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
Implementation of Batch Normalization
82
S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (ICML), 2015.
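One common way to add batch normalization in tf.keras is the BatchNormalization layer after a dense (or convolutional) layer; a sketch under that assumption. Layer sizes are placeholders, and whether to normalize before or after the activation is a design choice that varies in practice.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(100),            # linear part of the hidden layer
    tf.keras.layers.BatchNormalization(),  # normalize per mini-batch, then scale and shift
    tf.keras.layers.Activation("relu"),    # nonlinearity after normalization
    tf.keras.layers.Dense(10, activation="softmax"),
])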
Dropout as Regularization
83
Regularization (Shrinkage Methods)
84
Different Regularization Techniques
85
Different Regularization Techniques
86
[Figure: training and testing error vs. training steps; early stopping halts training where the testing error starts to rise while the training error keeps falling]
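Early stopping as in the figure can be implemented with a validation split and the tf.keras EarlyStopping callback; the patience value below is a placeholder.

import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watch the held-out (validation) error
    patience=3,                    # stop after 3 epochs without improvement
    restore_best_weights=True)     # roll back to the best epoch seen

# Typical usage (assuming x_train, y_train and a compiled model exist):
# model.fit(x_train, y_train, validation_split=0.1,
#           epochs=100, callbacks=[early_stop])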
Different Regularization Techniques in Deep Learning
87
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 15:1929-1958, 2014.
Dropout Illustration
88
Original model
Dropout Illustration
89
tf.nn.dropout(layer, rate = p)
Epoch 1
rate: the probability that each element is dropped. For example, setting rate = 0.1 would drop 10% of input elements
Dropout Illustration
90
tf.nn.dropout(layer, rate = p)
Epoch 1
rate: the probability that each element is dropped. For example, setting rate = 0.1 would drop 10% of input elements
Dropout
91
Implementation of Dropout
92
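A sketch of dropout inside a model, using the tf.keras Dropout layer (the same idea as the tf.nn.dropout call shown earlier); the drop rate of 0.5 and the layer sizes are placeholder choices. Keras applies the dropout mask only during training and disables it automatically at evaluation time.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dropout(0.5),    # randomly zero 50% of activations each training step
    tf.keras.layers.Dense(10, activation="softmax"),
])
# During model.fit the dropout mask is resampled at every step;
# during model.evaluate / model.predict dropout is turned off.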