Activation function + vanishing gradient problem
Andreas Baum
Sigmoid
Problems:
https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
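A minimal sketch (assuming NumPy) of the sigmoid and its derivative; the derivative never exceeds 0.25 and approaches 0 for large |x|, which is the saturation behaviour behind the vanishing gradient problem discussed below.

import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + exp(-x)): squashes any input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x)): maximum value 0.25, reached at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(sigmoid(x))             # values in (0, 1)
print(sigmoid_derivative(x))  # peaks at 0.25, near 0 for large |x|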
Tanh
f(x) = (1 - exp(-2x)) / (1 + exp(-2x))
Problems:
https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f
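A small check (my own sketch, using NumPy) that the formula above reproduces the built-in tanh; its derivative 1 - tanh(x)^2 also saturates towards 0 for large |x|.

import numpy as np

def tanh_from_exp(x):
    # f(x) = (1 - exp(-2x)) / (1 + exp(-2x)), output range (-1, 1)
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(tanh_from_exp(x), np.tanh(x)))  # True: same function as np.tanh
print(1.0 - np.tanh(x) ** 2)                      # derivative, near 0 at the tails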
Vanishing gradient problem
https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484
The sigmoid derivative takes values between 0 and 0.25, so during backpropagation each layer contributes a factor in [0, 0.25] to the gradient.
Chained over five layers, the gradient is a product of five such factors:
gradient = [0, 0.25] * [0, 0.25] * [0, 0.25] * [0, 0.25] * [0, 0.25]
Example: 0.12 * 0.25 * 0.06 * 0.17 * 0.09 = 0.00002754
Weight update with learning rate 0.1: 0.4 - 0.1 * 0.00002754 = 0.399997246
The weight barely changes, so the early layers effectively stop learning.
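The same arithmetic written out as a short script (a sketch with the illustrative per-layer factors from the slide, not a trained network): multiplying five sigmoid-derivative factors and applying the resulting gradient as a weight update.

# Per-layer gradient factors, each in [0, 0.25] because the sigmoid
# derivative never exceeds 0.25 (illustrative values from the slide)
factors = [0.12, 0.25, 0.06, 0.17, 0.09]

gradient = 1.0
for f in factors:
    gradient *= f
print(gradient)  # 2.754e-05: the gradient has all but vanished

weight = 0.4
learning_rate = 0.1
new_weight = weight - learning_rate * gradient
print(new_weight)  # 0.399997246: the early-layer weight barely changes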
Vanishing gradient solutions - Batch normalization
https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484
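A minimal sketch of what a batch-normalization layer computes in the forward pass (assuming NumPy; gamma and beta are the learnable scale and shift). Normalizing each feature over the batch keeps pre-activations away from the saturated tails of sigmoid/tanh, which is why it helps against vanishing gradients.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, features). Normalize each feature to zero mean and
    # unit variance over the batch, then apply learnable scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 10 + 5          # badly scaled pre-activations
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3))             # ~0 per feature
print(out.std(axis=0).round(3))              # ~1 per feature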
ReLU
Problems:
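A minimal sketch (NumPy) of ReLU and its gradient: the gradient is exactly 1 for positive inputs, so it does not shrink through many layers the way the sigmoid's 0.25 bound does, but it is 0 for negative inputs, which is the usual "dying ReLU" problem.

import numpy as np

def relu(x):
    # max(0, x): passes positive values through unchanged, zeroes the rest
    return np.maximum(0.0, x)

def relu_derivative(x):
    # gradient is 1 for x > 0 and 0 for x <= 0: no shrinking factor,
    # but negative inputs receive no gradient at all
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]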
Softmax
Classification - computes the probabilities of the classes
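A minimal sketch (NumPy) of a numerically stable softmax: it turns a vector of class scores into probabilities that sum to 1, which is why it is used in the output layer for classification.

import numpy as np

def softmax(logits):
    # subtract the max for numerical stability, then normalize
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw class scores (logits)
probs = softmax(scores)
print(probs)        # approx. [0.659 0.242 0.099]
print(probs.sum())  # 1.0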