**Sigmoid unit:** $f(x) = \frac{1}{1 + e^{-x}}$

**Tanh unit:** $f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

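**Rectified linear unit (ReLU):** $f(x) = \sum_{i=1}^{\infty} \sigma(x - i + 0.5) \approx \log(1 + e^{x})$, where $\sigma$ is the sigmoid function.

We call:

- $\sum_{i=1}^{\infty} \sigma(x - i + 0.5)$ the **stepped sigmoid**
- $\log(1 + e^{x})$ the **softplus** function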

The **softplus** function can be approximated by the **max function** (or **hard max**), i.e. $\max(0, x)$. The max function is commonly known as the **Rectified Linear Function (ReL)**.
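The chain of approximations above can be checked numerically. The following is a minimal sketch, assuming the stepped sigmoid is the sum of sigmoids shifted by `i - 0.5` (truncated here to a finite number of terms) and softplus is `log(1 + e^x)`; the function names and the `n_terms` parameter are illustrative, not taken from any library.

```python
import math

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def stepped_sigmoid(x, n_terms=200):
    # Truncated version of sum_{i=1}^{inf} sigmoid(x - i + 0.5).
    # n_terms is an assumption; terms far beyond x contribute almost nothing.
    return sum(sigmoid(x - i + 0.5) for i in range(1, n_terms + 1))

def softplus(x):
    # log(1 + e^x), rearranged so exp() never sees a large positive argument.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def relu(x):
    # Hard max: max(0, x).
    return max(0.0, x)

print("     x   stepped  softplus      relu")
for x in (-4.0, -1.0, 0.0, 1.0, 4.0, 10.0):
    print(f"{x:6.1f}  {stepped_sigmoid(x):8.4f}  {softplus(x):8.4f}  {relu(x):8.4f}")
```

The stepped sigmoid and softplus columns should agree closely everywhere, while softplus and the hard max differ most near 0 and converge as |x| grows.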

The figure below plots these activation functions.

The major differences between the sigmoid and ReL functions are:

- The **sigmoid** function has range $(0, 1)$, whereas the **ReL** function has range $[0, \infty)$. Because of its range, the **sigmoid** can be used to model a probability; hence it is commonly used for regression or probability estimation at the last layer, even when you use ReL for the previous layers. **NERD NOTE:** The view of the **softplus** function as an approximation of stepped sigmoid units relates it to the binomial hidden units discussed in http://machinelearning.wustl.edu...
- The gradient of the **sigmoid** function vanishes as x moves away from 0 in either direction; the unit is said to be "saturated" there. The gradient of the ReL function does not have this problem for positive inputs, because its positive part is linear and unbounded.
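To make the saturation point concrete, here is a small sketch that compares the two derivatives at a few inputs. It assumes the usual closed form `sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))` and, as a convention chosen for this example, takes the ReL derivative at `x <= 0` to be 0.

```python
import math

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)); peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of max(0, x): 1 for positive inputs.
    # Using 0 at x <= 0 is a convention assumed for this sketch.
    return 1.0 if x > 0 else 0.0

print("     x  sigmoid'     relu'")
for x in (-10.0, -2.0, 0.5, 2.0, 5.0, 10.0):
    print(f"{x:6.1f}  {sigmoid_grad(x):8.6f}  {relu_grad(x):8.1f}")
```

The sigmoid derivative shrinks toward 0 on both sides, while the ReL derivative stays at 1 for every positive input. Note that the ReL derivative is 0 for negative inputs, so the "problem free" claim applies to the active (positive) region only.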

The advantages of using Rectified Linear Units in neural networks are:

- If the hard max is used, it induces sparsity in the layer activations.
- As discussed above, ReLU does not suffer from the vanishing gradient problem for active units. Therefore, it allows training deeper networks without pre-training.
- ReLU can be used in Restricted Boltzmann Machines to model real- or integer-valued inputs.
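The sparsity point in the list above can be illustrated with a quick experiment. This is only a sketch: it assumes the pre-activations of a layer are drawn from a zero-mean unit-variance Gaussian, which is a simplification chosen for the example, and the helper `zero_fraction` is a hypothetical name.

```python
import math
import random

def sigmoid(x):
    # Logistic function, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softplus(x):
    # log(1 + e^x), the smooth counterpart of the hard max.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def relu(x):
    # Hard max: max(0, x).
    return max(0.0, x)

def zero_fraction(activation, inputs):
    # Fraction of units whose output is exactly zero (a simple sparsity measure).
    outputs = [activation(x) for x in inputs]
    return sum(1 for y in outputs if y == 0.0) / len(outputs)

# Assumed toy pre-activations: 10,000 samples from N(0, 1), fixed seed.
rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

for name, fn in (("relu", relu), ("softplus", softplus), ("sigmoid", sigmoid)):
    print(f"{name:9s} fraction of exact zeros: {zero_fraction(fn, pre_activations):.3f}")
```

Under this assumption roughly half of the hard-max outputs are exactly zero, whereas softplus and sigmoid outputs are small but never exactly zero, which is why the sparsity claim is tied to the hard max specifically.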

References:

- On Rectified Linear Units for Speech Processing http://www.cs.toronto.edu/~hinto...
- Rectifier Nonlinearities Improve Neural Network Acoustic Models http://ai.stanford.edu/~amaas/pa...
- Deep Sparse Rectifier Neural Networks http://eprints.pascal-network.or...