
Optimization for Deep Learning

Prof. Seungchul Lee

Industrial AI Lab.


Optimization

  • Optimization is a mathematical discipline that focuses on finding the best solution to a problem within a defined set of constraints

  • It involves maximizing or minimizing an objective function, which represents the goal of the optimization process, such as minimizing costs, maximizing efficiency, or achieving the best performance in a system


Optimization

  • Three key components
    1. Objective function
    2. Decision variables (unknowns)
    3. Constraints

  • Procedure
    • Modeling: the process of identifying the objective function, variables, and constraints for a given problem
    • Solving: once the model has been formulated, an optimization algorithm can be used to find a solution
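As a concrete sketch of this two-step procedure, the snippet below states an objective, decision variables, and one constraint, then hands them to a generic solver. The quadratic objective and the choice of SciPy's solver are illustrative assumptions, not from the slides.

```python
from scipy.optimize import minimize  # assumes SciPy is available

# Modeling: objective function, decision variables, and constraints
objective = lambda x: (x[0] - 2) ** 2 + (x[1] - 1) ** 2      # what to minimize
x0 = [0.0, 0.0]                                              # decision variables (initial guess)
cons = [{"type": "ineq", "fun": lambda x: 1 - x[0] - x[1]}]  # constraint: x0 + x1 <= 1

# Solving: hand the formulated model to an optimization algorithm
result = minimize(objective, x0, constraints=cons)
```

Here the unconstrained minimizer (2, 1) violates the constraint, so the solver returns the closest feasible point, (1, 0).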



Optimization: Mathematical Expression

  • In mathematical expression, the standard form is

      minimize f(x)
      subject to gᵢ(x) ≤ 0,  i = 1, …, m

  • Remark: other forms are equivalent to this one, since maximizing f(x) is the same as minimizing −f(x), and a constraint g(x) ≥ 0 can be rewritten as −g(x) ≤ 0


Solving Optimization Problems


Descent Direction (1D)

  • It motivates the gradient descent algorithm, which repeatedly takes steps in the direction of the negative gradient

  • Sign of the derivative: positive → shift to the left; negative → shift to the right
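The 1D rule above can be sketched in a few lines. The example function, step size, and tolerance are illustrative assumptions, not from the slides.

```python
# Minimal 1D gradient descent sketch
def gradient_descent_1d(df, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Minimize a 1D function given its derivative df."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:      # stop when the gradient is sufficiently small
            break
        x = x - alpha * g     # positive slope -> move left; negative -> move right
    return x

# Example: f(x) = (x - 3)^2, so f'(x) = 2(x - 3); the minimum is at x = 3
x_star = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
```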


Stopping Criteria

  • When the gradient is sufficiently small

  • When the change in the function value between iterations is very small

  • When the update step is very small

  • When the number of iterations hits a hard limit

Learning Rate

  • Too small: converges very slowly

  • Too large: overshoots and may even diverge

  • A common remedy: reduce the step size over time
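The effect of the step size can be seen on a toy quadratic (an illustrative example, not from the slides): for f(x) = x², the update x ← x − α·2x = (1 − 2α)x converges only when |1 − 2α| < 1.

```python
# Run gradient descent on f(x) = x^2, whose derivative is f'(x) = 2x
def run(alpha, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x   # x <- (1 - 2*alpha) * x
    return x

small = run(0.01)   # converges, but very slowly
good = run(0.4)     # converges quickly
big = run(1.1)      # overshoots and diverges
```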


Where Will We Converge?


  • Random initialization

  • Multiple trials

  • Convex: any local minimum is a global minimum

  • Non-convex: multiple local minima may exist
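Random initialization with multiple trials might be sketched as follows; the non-convex example function f(x) = x⁴ − 2x², with local minima at x = ±1, is an illustrative assumption, not from the slides.

```python
import random

def gd(df, x0, alpha=0.01, steps=2000):
    """Plain gradient descent from a given starting point."""
    x = x0
    for _ in range(steps):
        x = x - alpha * df(x)
    return x

f = lambda x: x**4 - 2 * x**2      # non-convex: minima at x = -1 and x = 1
df = lambda x: 4 * x**3 - 4 * x    # its derivative

# Different random starts can land in different local minima,
# so run multiple trials and keep the best result found
random.seed(0)
trials = [gd(df, random.uniform(-2, 2)) for _ in range(5)]
best = min(trials, key=f)          # keep the lowest objective value
```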


Gradient Descent in High Dimension
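In higher dimensions the scalar derivative is replaced by the gradient vector, and the same update applies componentwise. A minimal sketch, assuming an example quadratic objective (the function and parameters are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.1, tol=1e-8, max_iter=10000):
    """Gradient descent for a vector-valued decision variable."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop when the gradient norm is small
            break
        x = x - alpha * g             # step along the negative gradient vector
    return x

# Example: f(x, y) = (x - 1)^2 + 2*(y + 2)^2, minimized at (1, -2)
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 2)])
x_min = gradient_descent(grad, [0.0, 0.0])
```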


Practically Solving Optimization Problems

  • The good news: for many classes of optimization problems, people have already done all the “hard work” of developing numerical algorithms
    • A wide range of tools can take optimization problems in “natural” forms and compute a solution

  • Gradient descent
    • Easy to implement
    • Very general: can be applied to any differentiable loss function
    • Requires less memory and computation (for stochastic methods)
    • Widely used for neural networks/deep learning (e.g., in TensorFlow)
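A minimal TensorFlow sketch of the same idea, assuming TensorFlow 2.x is installed; the toy loss (w − 3)² and the learning rate are illustrative assumptions, not from the slides.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

# Minimizing the toy loss f(w) = (w - 3)^2 with autodiff and SGD
w = tf.Variable(0.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:     # record operations for autodiff
        loss = (w - 3.0) ** 2
    grads = tape.gradient(loss, [w])    # compute dloss/dw
    opt.apply_gradients(zip(grads, [w]))  # gradient descent step
```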


Gradient Descent

  • Update rule: repeat until a stopping criterion is met

      x ← x − α ∇f(x)

    where α > 0 is the learning rate (step size)