Fundamentals of Deep Neural Networks
First Things to Note
[Diagram: INPUT → F() → Output / Target]
ML: Design Pattern
Problem statement:
Define your model:
Define the objective function:
Optimize the objective:
Now you have everything you need for your model.
Let us Start with a Problem
Movie Recommendation
Know Your Data
[Figure: a labeled dataset, annotated with samples/examples, features (input variables), and the output/target]
Decision Boundary
Linear Decision Boundary
[Figure: a straight line separating Class 0 from Class 1]
Non-linear boundary
Optimization: Gradient Descent
Review of Differential Calculus
[Figure: a curve with two marked points]
What is the sign of the derivative at each marked point?
Local and Global minima
[Figure: plot of F(x) versus x showing local and global minima]
Gradient Descent
Training data: [table of input/output example pairs]
Gradient Descent
[Figure: F(x) curve with an arrow marking the minimum we want to reach]
Gradient Descent on Multivariate Function
Gradient Descent Algorithm
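The update rule x ← x − r · F′(x) can be sketched in plain Python. The objective F(x) = (x − 3)² is a toy example chosen for illustration, not one from the slides:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient:
    x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Toy objective F(x) = (x - 3)**2, so F'(x) = 2 * (x - 3); minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Starting from x = 0, the iterate moves steadily toward the minimizer at x = 3.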
Limitations of Gradient Descent
Small value of r: slow convergence.
Large value of r: oscillation / overshooting.
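Both failure modes show up numerically on a toy objective F(x) = x² (a hypothetical example, assuming the update x ← x − r · F′(x)): a tiny r barely moves, while an r that is too large for the curvature makes every step overshoot and grow:

```python
def descend(x, lr, steps=20):
    # Gradient descent on F(x) = x**2, whose gradient is F'(x) = 2 * x.
    for _ in range(steps):
        x = x - lr * (2 * x)
    return x

slow = descend(10.0, lr=0.001)   # small r: after 20 steps, still near the start
diverged = descend(10.0, lr=1.5) # large r: each step flips sign and doubles
```

With lr = 1.5 the per-step multiplier is (1 − 2·lr) = −2, so the iterate oscillates and explodes; with lr = 0.001 it shrinks by only 0.2% per step.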
Adaptive Gradient Descent
Adagrad
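The Adagrad idea, a per-parameter learning rate shrunk by the accumulated squared gradients, can be sketched in one dimension (the objective and hyperparameter values are illustrative):

```python
import math

def adagrad(grad, x0, lr=1.0, steps=200, eps=1e-8):
    """Adagrad: divide the learning rate by the root of the running
    sum of squared gradients, so frequently-updated directions slow down."""
    x, g_sq = x0, 0.0
    for _ in range(steps):
        g = grad(x)
        g_sq += g * g                              # accumulate squared gradient
        x -= lr / (math.sqrt(g_sq) + eps) * g      # adaptive step
    return x

# Toy objective F(x) = (x - 3)**2, gradient 2 * (x - 3)
x_min = adagrad(lambda x: 2 * (x - 3), x0=0.0)
```

Note the accumulator only grows, so Adagrad's effective learning rate decays over training; this motivates the later variants.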
Adaptive Moment Estimation (ADAM)
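ADAM combines a running mean of gradients (first moment) with a running mean of squared gradients (second moment), with bias correction for both. A minimal one-dimensional sketch, on the same toy objective used above (hyperparameters are the common defaults, not values from the slides):

```python
import math

def adam(grad, x0, lr=0.1, steps=500, b1=0.9, b2=0.999, eps=1e-8):
    """ADAM: biased moment estimates m and v, corrected by (1 - b**t)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g            # first moment (mean of gradients)
        v = b2 * v + (1 - b2) * g * g        # second moment (mean of squares)
        m_hat = m / (1 - b1 ** t)            # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

x_min = adam(lambda x: 2 * (x - 3), x0=0.0)
```

Unlike Adagrad, the exponential averages forget old gradients, so the effective step size does not decay to zero.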
Deep Neural Network
Motivation
A Motivating Problem
Movie Recommendation
[Figure: movies you liked and disliked plotted as points, with a new movie to classify]
Observation: the classes can be separated by a straight line.
Logistic Regression: Graphical View
[Figure: logistic regression as a single node: inputs are linearly combined into z and passed through f(z) to produce the output]
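The single-node view can be written directly in code. A minimal sketch, with illustrative (not learned) weights:

```python
import math

def logistic_regression(x, w, b):
    """One 'neuron': linearly combine the inputs, then squash with the
    sigmoid f(z) = 1 / (1 + exp(-z)), giving a probability in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear combination
    return 1.0 / (1.0 + math.exp(-z))              # sigmoid output

# Predict class 1 when the output exceeds 0.5, i.e. when z > 0.
p = logistic_regression([2.0, 1.0], w=[1.0, -1.0], b=0.0)
```

Here z = 2·1 + 1·(−1) = 1, so the output is sigmoid(1) ≈ 0.73 and the example is classified as class 1; the decision boundary z = 0 is exactly the straight line from the previous slide.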
Non-Linear Decision Boundary
Limitations of Linear Decision Function
The decision function involves two straight lines
Thus, we need to combine information from two straight lines
Addressing Non-linearity
Here is a good network: [Figure: two hidden nodes with activation f feeding a single output node f]
Moral of the story:
Hidden nodes help the model make decisions in complex cases.
In ML, a complex case means a highly non-linear decision boundary separating the classes.
The more hidden layers you add, the more complex the decisions your model can make.
Structure of Deep Network
[Figure: a deep network with an input layer, several hidden layers, and an output node]
Components of a Deep Network
Ingredients of a Deep Network
Things you need to have in order to describe a deep network:
What does a non-input neuron of the network do?
Each node takes inputs from the nodes in the previous layer, linearly combines them, and then passes the result through the activation function.
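That per-node rule, applied to every node in a layer, can be sketched as follows (the sigmoid activation and the weight values are illustrative choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node: take the previous layer's outputs, linearly combine them
    with its own weights and bias, then apply the activation function."""
    return [sigmoid(sum(w * a for w, a in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

# Two hidden nodes fed by two inputs (illustrative weights and biases)
hidden = layer_forward([1.0, 0.5],
                       weights=[[0.4, -0.2], [0.3, 0.8]],
                       biases=[0.0, -0.1])
```

Stacking calls to `layer_forward`, one per layer, gives the forward pass of the whole deep network.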
Activation Function
Deep vs. Shallow Network
How many features does the data that this deep network models have?
How many class labels are there in that data?
How many parameters does the deep net have?
Training Deep Network
Backpropagation Algorithm: The Pillar of Deep Learning
Training Neural Network
[Figure: a small network with input x and layers 1, 2, 3, built up step by step across several slides]
Let us compute gradients:
The chain rule of derivatives
Training Neural Network: Computing Gradients
[Figure: the same network annotated with per-layer gradients]
The gradients share common parts, which are computed once and reused.
This method is known as backpropagation
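A minimal two-layer scalar sketch makes the reuse concrete: the upstream term for the output layer is computed once and then propagated back for the earlier weight. The squared-error loss and the weight values are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, y, w1, w2):
    """Two-layer scalar net: h = f(w1*x), out = f(w2*h), loss = (out - y)**2.
    Backprop computes the shared upstream term delta2 once and reuses it."""
    h = sigmoid(w1 * x)
    out = sigmoid(w2 * h)
    loss = (out - y) ** 2
    delta2 = 2 * (out - y) * out * (1 - out)   # d loss / d z2 (common part)
    g_w2 = delta2 * h                          # d loss / d w2
    delta1 = delta2 * w2 * h * (1 - h)         # common part pushed back a layer
    g_w1 = delta1 * x                          # d loss / d w1
    return loss, g_w1, g_w2

loss, g_w1, g_w2 = forward_backward(x=1.0, y=1.0, w1=0.5, w2=0.5)
```

The chain-rule gradients can be checked against finite differences on the same function, which is a standard sanity test for any backprop implementation.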
Other Important Details
Parameter Initialization
Computing Loss
Cross Entropy
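For classification, the standard loss is cross entropy. A minimal binary version (the clipping epsilon is a common numerical safeguard, not from the slides):

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy between predicted probability p and label y in {0, 1}:
    -(y*log(p) + (1-y)*log(1-p)). Clipping p avoids log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

confident_right = cross_entropy(0.99, 1)   # small loss
confident_wrong = cross_entropy(0.01, 1)   # large loss
```

The loss punishes confident wrong predictions far more than mildly wrong ones, which is exactly the gradient signal a classifier needs.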
Regularization
Improving Single Model Performance
Regularization:
Common Regularization Technique:
Regularization: L1 and L2 Regularization (Weight Decay)
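L2 weight decay adds a penalty proportional to the squared weights, which contributes an extra pull-toward-zero term to each weight's gradient. A minimal sketch (the lam values are illustrative):

```python
def l2_penalized_loss(data_loss, weights, lam=0.01):
    """Total loss = data loss + lam * sum of squared weights (L2 penalty)."""
    return data_loss + lam * sum(w * w for w in weights)

def l2_grad_term(w, lam=0.01):
    """Extra gradient contribution per weight: d/dw [lam * w**2] = 2*lam*w,
    i.e. every update decays the weight toward zero."""
    return 2 * lam * w
```

For L1 regularization the penalty is lam * sum of |w| instead, which pushes weights to exactly zero and so yields sparse models.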
Regularization: Dropout
Key Idea: During training, randomly drop some neurons; the probability of dropping is a hyperparameter (Srivastava et al.).
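A common way to implement this is "inverted" dropout, sketched below: survivors are rescaled during training so that expected activations match at test time (a standard variant, assumed here rather than taken from the slides):

```python
import random

def dropout(activations, p_drop, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors by 1/(1 - p_drop); do nothing at inference time."""
    if not training:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if random.random() < keep else 0.0 for a in activations]

random.seed(0)  # seeded only to make this demo reproducible
out = dropout([1.0] * 1000, p_drop=0.5)
```

With p_drop = 0.5, roughly half the units are zeroed and the rest become 2.0, so the expected value of each unit stays 1.0.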
Regularization: Early Stopping
Key Idea: Monitor the model's performance on a validation set during training and stop when that performance stops improving. This prevents overfitting by not letting the model train too long on the training data.
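The monitoring loop can be sketched as follows; the list of validation losses stands in for a real training run, and the `patience` parameter (how many bad epochs to tolerate) is a common convention assumed here:

```python
def train_with_early_stopping(val_losses, patience=2):
    """Stop when validation loss fails to improve for `patience` consecutive
    epochs; return the epoch index of the best (lowest-loss) model."""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0   # new best: reset counter
        else:
            bad += 1
            if bad >= patience:
                break                                # stop training early
    return best_epoch

# Validation loss improves, then starts rising: keep the epoch-2 model.
stop_at = train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.7, 0.8])
```

In practice one also saves a checkpoint at each new best epoch and restores it after stopping.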
Regularization: Batch Normalization
Key Idea: Normalizes the inputs of each layer to have zero mean and unit variance.
It helps stabilize and accelerate training by reducing internal covariate shift.
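For a single feature over a mini-batch, the normalize-then-scale-and-shift computation looks like this (gamma and beta are the learnable parameters; the epsilon guards against division by zero):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch to zero mean and unit
    variance, then apply the learnable scale (gamma) and shift (beta)."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

normed = batch_norm([10.0, 12.0, 14.0, 16.0])
```

At inference time, real implementations replace the batch statistics with running averages collected during training.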
Regularization: Data Augmentation
Source: Fei Fei Li
Data Augmentation: Image Transformation
Data Augmentation: Random Crops and Scales
Data Augmentation: Color Changes
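A toy version of two of these transformations, random horizontal flips and small shifts, on an image represented as a 2D list (a minimal sketch; real pipelines operate on arrays and add crops, scales, and color jitter):

```python
import random

def augment(image, max_shift=1):
    """Toy augmentation: random horizontal flip plus a small random
    horizontal shift (zero-padded), leaving the original untouched."""
    out = [row[:] for row in image]            # copy so the input is preserved
    if random.random() < 0.5:
        out = [row[::-1] for row in out]       # horizontal flip
    shift = random.randint(-max_shift, max_shift)
    if shift:
        for i, row in enumerate(out):
            if shift > 0:
                out[i] = [0] * shift + row[:-shift]
            else:
                out[i] = row[-shift:] + [0] * (-shift)
    return out

random.seed(1)  # seeded only to make this demo reproducible
img = [[1, 2, 3], [4, 5, 6]]
aug = augment(img)
```

Each call yields a slightly different version of the same labeled example, effectively enlarging the training set without collecting new data.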
Thank you!