Recent Advances in Deep Learning

In this text, I would like to talk about some of the recent advances of Deep Learning models by no means complete. (Click heading for the reference)

  1. Parametric Rectifier Linear Unit (PReLU)
    • The idea is to allow negative activation in well-known ReLU units by controlling it with a learnable parameter. In other words, you learn how much negative activationsyou need for each unit to discriminate classes. In the work, it is proposed that PReLU unit is very useful for especially very deep models that lacks for gradient propagation to initial layers due to its depth. What is different is PReLU allows more gradient return by allowing negative activation.PReLU
  2. A new initialization method (MSRA for Caffe users)
    • Xavier initialization was proposed by Bengio's team and it considers number of fan-in and fan-out to a certain unit to define the initial weights.  However, the work says that Xavier method and its alternations considers linear activation functions for the formulation of the method. Hence, they propose some changes related to ReLU activation that they empirically proved its effect in practice with better convergence rate.
  3. Batch Normalization 
    • This work serves data normalization as a structural part of the model. They say that the distribution of the training data changes as the model evolves and it priorities the initialization scheme and the learning schedule we use for the learning. Each mini-batch of the data is normalized with the described scheme just before its propagation through the network and it allows faster convergence  with larger learning rates and robust models to initialization scheme that we choose.  Each mini-batch is normalized by its mean and variance, then it is scaled and shifted by a learned coefficient and residual.

      From the paper
      From the paper
  4. Inception Layers
    • This is one of the ingredients of last year's ImageNet winner GoogleNet. The trick is to use multi-scale filters all together in a layer and concatenating their responses for the next layer. In that way we are able to learn difference covariances per each layer by different sizes and structures. inception_module