Stochastic Gradient Descent
Prof. Seungchul Lee
Industrial AI Lab.
Gradient Descent
Batch Gradient Descent (= Gradient Descent)
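As a minimal sketch of batch gradient descent (the toy least-squares data and step size below are my own illustrative assumptions, not from the slides) — every update uses the gradient over the entire dataset:

```python
import numpy as np

# Toy data: y = 1 + 2*x with a little noise (illustrative assumption)
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 1))])  # bias + feature
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(100)

theta = np.zeros(2)
lr = 0.1
for _ in range(500):
    # Batch GD: the gradient of (1/2m)||X theta - y||^2 uses ALL m samples per step
    grad = X.T @ (X @ theta - y) / len(y)
    theta -= lr * grad

print(theta)  # close to [1, 2]
```

Each iteration is expensive (a full pass over the data) but the descent direction is exact.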
Stochastic Gradient Descent (SGD)
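SGD replaces the full-batch gradient with the gradient of a single randomly chosen sample. A hedged NumPy sketch on the same kind of toy least-squares problem (data and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 1))])
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(100)

theta = np.zeros(2)
lr = 0.05
for epoch in range(50):
    for i in rng.permutation(len(y)):   # visit samples in random order each epoch
        xi, yi = X[i], y[i]
        grad = (xi @ theta - yi) * xi   # gradient from ONE sample only
        theta -= lr * grad

print(theta)  # noisy, but near [1, 2]
```

Each update is cheap and noisy; over many updates the noise averages out around the minimizer.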
SGD is Sometimes Better
Mini-batch Gradient Descent
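Mini-batch gradient descent interpolates between the two: each step averages the gradient over a small random subset. A sketch under the same illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.uniform(-1, 1, (100, 1))])
y = X @ np.array([1.0, 2.0]) + 0.01 * rng.standard_normal(100)

theta = np.zeros(2)
lr, batch = 0.1, 10
for epoch in range(100):
    idx = rng.permutation(len(y))
    for s in range(0, len(y), batch):
        b = idx[s:s + batch]            # one small random batch
        grad = X[b].T @ (X[b] @ theta - y[b]) / len(b)
        theta -= lr * grad

print(theta)
```

Batches large enough to smooth the gradient noise, small enough to keep updates cheap (and vectorizable on hardware).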
Implementation with TensorFlow
Batch Gradient Descent with TensorFlow
Stochastic Gradient Descent (SGD) with TensorFlow
Mini-batch Gradient Descent with TensorFlow
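The TensorFlow slides above cover all three variants; as a hedged TF2-style sketch (using `tf.GradientTape` — the slides may use a different API version, and the toy linear model here is my own assumption), mini-batch SGD could look like:

```python
import numpy as np
import tensorflow as tf

# Toy data: y = 2*x + 1 with small noise (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 1)).astype(np.float32)
y = (2 * X + 1 + 0.01 * rng.standard_normal((100, 1))).astype(np.float32)

w = tf.Variable(0.0)
b = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# batch(100) -> batch GD, batch(1) -> SGD, batch(10) -> mini-batch
dataset = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(100).batch(10)
for epoch in range(100):
    for xb, yb in dataset:
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((w * xb + b - yb) ** 2)
        grads = tape.gradient(loss, [w, b])
        opt.apply_gradients(zip(grads, [w, b]))

print(float(w.numpy()), float(b.numpy()))  # near 2 and 1
```

Note that the only knob distinguishing the three variants is the batch size passed to `batch()`.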
Limitations of Gradient Descent
Setting the Learning Rate
SGD Learning Rate (= Step Size)
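The effect of the step size can be seen on the simplest possible objective; a sketch (the quadratic and the specific rates are illustrative assumptions):

```python
# Minimize f(x) = x^2 (gradient 2x) from x0 = 1 with different step sizes
def run(lr, steps=50):
    x = 1.0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(run(0.01))  # too small: barely moves toward 0
print(run(0.4))   # well chosen: converges quickly
print(run(1.1))   # too large: |1 - 2*lr| > 1, iterates diverge
```

Each update multiplies x by (1 - 2*lr), so the rate controls whether iterates contract, oscillate, or blow up.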
Source: Dr. Francois Fleuret at EPFL
SGD Learning Rate: Spatial
(Figure: contour plots contrasting a harder, ill-conditioned loss surface with a nice one where all features are equally important)
SGD Learning Rate: Temporal
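Temporally, the step size is typically decayed: large early steps for fast progress, small late steps so gradient noise stops dominating. A sketch with a 1/t schedule on a noisy quadratic (schedule and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = 5.0
lr0 = 0.5
for t in range(1, 1001):
    grad = 2 * x + rng.normal(0, 0.5)  # noisy gradient of f(x) = x^2
    x -= (lr0 / t) * grad              # step size shrinks as 1/t

print(x)
```

With a constant rate the iterate would keep bouncing at a noise-set radius; the decaying rate lets it settle near the minimum.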
Adaptive Gradient Learning Rate Methods
Adaptive Learning Rate Methods
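Adaptive methods give each parameter its own effective step size. As one concrete instance, a minimal Adagrad sketch (the badly scaled quadratic and hyperparameters are my own illustrative assumptions; Adam and RMSProp refine the same idea with moving averages):

```python
import numpy as np

# Adagrad: per-parameter step size shrinks with accumulated squared gradients
def adagrad(grad_fn, x0, lr=0.5, steps=200, eps=1e-8):
    x = np.asarray(x0, dtype=float)
    G = np.zeros_like(x)                  # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)  # each coordinate gets its own rate
    return x

# Badly scaled quadratic: f(x) = 100*x0^2 + x1^2
grad = lambda x: np.array([200 * x[0], 2 * x[1]])
x_star = adagrad(grad, [1.0, 1.0], lr=0.5)
print(x_star)
```

The coordinate with the large gradient accumulates a large G and is automatically given small steps, so one global rate works for both scales.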
Source: 6.S191 Intro. to Deep Learning at MIT