Activation Functions
Introduction
Figure: Single Neural Network Architecture
Need for Activation Functions
Binary Step Function
Sigmoid Function
Derivative of Sigmoid Function
f'(x) = sigmoid(x)*(1-sigmoid(x))
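The derivative formula above can be checked numerically. The following is a minimal Python sketch, not a reference implementation; the function names sigmoid, sigmoid_derivative, and numerical_derivative are illustrative choices, and the step size h is an assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x is negative here, so exp(x) cannot overflow
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference approximation of f'(x); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.3):
        # The analytic and numerical values should agree closely.
        print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

The derivative peaks at x = 0, where it equals 0.25, and shrinks toward zero as |x| grows.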
Tanh Function (Hyperbolic Tangent)
Derivative of tanh Function
ReLU Activation Function
Dying ReLU Problem
Leaky ReLU Function
Derivative of Leaky ReLU Function
Parametric ReLU Function
Where "a" is the slope parameter for negative values.
The parametric ReLU function is used when the leaky ReLU function still fails to solve the problem of dead neurons and the relevant information is not successfully passed to the next layer.
Its limitation is that performance may vary from problem to problem, depending on the value of the slope parameter a.
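To illustrate how the slope parameter a changes the behaviour for negative inputs, here is a minimal Python sketch. The names prelu and prelu_derivative are illustrative, a is passed in as a fixed number rather than learned, and the value used for the derivative at x = 0 is a convention assumed for this example.

```python
def prelu(x: float, a: float) -> float:
    """Parametric ReLU: returns x for positive inputs and a * x otherwise."""
    return x if x > 0 else a * x


def prelu_derivative(x: float, a: float) -> float:
    """Derivative with respect to x: 1 for positive inputs, a otherwise.

    The derivative is undefined at x == 0; this sketch returns a there
    (an assumed convention).
    """
    return 1.0 if x > 0 else a


if __name__ == "__main__":
    # The same negative input gives different outputs and gradients
    # depending on the slope a.
    for a in (0.01, 0.1, 0.25):
        print(a, prelu(-2.0, a), prelu_derivative(-2.0, a))
```

Because the gradient for negative inputs equals a, it is nonzero whenever a is nonzero, so some signal still flows backward through the unit.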
Derivative
Exponential Linear Units (ELUs)
ELU is a strong alternative to ReLU because of the following advantages:
Derivative
Softmax Function
Let’s understand this with an example. Suppose a model (for example, one trained with multi-class LDA or multinomial logistic regression) outputs three values, 5.0, 2.5, and 0.5, for a particular input. To convert these numbers into probabilities, they are fed into the softmax function, as shown in the figure.
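The example scores 5.0, 2.5, and 0.5 can be run through a small softmax routine. This is a minimal Python sketch; the function name softmax and the subtraction of the maximum score (a common numerical-stability step that does not change the result) are choices made for this illustration.

```python
import math


def softmax(scores):
    """Convert a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([5.0, 2.5, 0.5])
print([round(p, 3) for p in probs])
print(sum(probs))
```

For these inputs the probabilities come out at roughly 0.915, 0.075, and 0.010: the largest score takes most of the probability mass, and the ordering of the scores is preserved.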
Swish Activation Function
Derivative
Gaussian Error Linear Unit (GELU)
Derivative
Scaled Exponential Linear Unit (SELU)
Here’s the main advantage of SELU over ReLU:
Derivative
Summary