##### CATEGORICAL REPARAMETERIZATION WITH GUMBEL SOFTMAX

- Link: https://arxiv.org/pdf/1611.01144v1.pdf
- A continuous distribution on the simplex that approximates discrete one-hot vectors and is differentiable with respect to its parameters, using the reparameterization trick known from VAEs.
- In the paper it is used, among other tasks, for semi-supervised classification.
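
The sampling step can be sketched as follows: add Gumbel(0, 1) noise to the class logits, divide by a temperature, and take a softmax. This is a minimal, forward-only sketch in plain Python, not the authors' implementation; the function name `gumbel_softmax_sample`, its parameters, and the `eps` constant are assumptions made for illustration. In practice this would be written in an autodiff framework so gradients can flow through the sample.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng=random):
    """Draw one relaxed one-hot sample from unnormalized log-probabilities.

    Hypothetical helper for illustration: computes
    softmax((logits + g) / temperature) with g ~ Gumbel(0, 1).
    """
    eps = 1e-20  # guards against log(0); an assumed constant
    noisy = []
    for logit in logits:
        u = rng.random()  # u ~ Uniform[0, 1)
        g = -math.log(-math.log(u + eps) + eps)  # inverse-transform Gumbel noise
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    class_logits = [math.log(p) for p in (0.1, 0.2, 0.7)]
    # Lower temperatures push samples closer to one-hot vectors.
    for tau in (5.0, 1.0, 0.1):
        print(tau, gumbel_softmax_sample(class_logits, tau))
```

The result always lies on the simplex, and the index of its largest entry is distributed like a draw from the underlying categorical distribution (the Gumbel-max trick).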

##### DEEP UNSUPERVISED LEARNING WITH SPATIAL CONTRASTING

- Learns useful unsupervised image representations by applying a triplet loss to image patches. Each triplet consists of two patches from the same image (the anchor and the positive) and one patch from a different image (the negative). Used as a pretraining method, it gives a good boost on CIFAR-10.
- Open question: how would you apply it to a real, large-scale classification problem?
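
The triplet construction described above can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the standard margin-based (hinge) triplet loss, which may differ from the exact loss in the paper, and it flattens raw patches as a stand-in for a learned convolutional embedding. All function names and the `margin` default are hypothetical.

```python
import math
import random


def sample_patch(image, size, rng):
    """Crop a random size x size patch from a 2D list of pixel values."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def sample_triplet(image, other_image, size, rng=random):
    """Anchor and positive come from the same image, the negative from another."""
    anchor = sample_patch(image, size, rng)
    positive = sample_patch(image, size, rng)
    negative = sample_patch(other_image, size, rng)
    return anchor, positive, negative


def embed(patch):
    """Placeholder embedding: flatten the patch (a real model would use a conv net)."""
    return [value for row in patch for value in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: pull the positive closer to the anchor than the negative."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    rng = random.Random(0)
    image_a = [[rng.random() for _ in range(8)] for _ in range(8)]
    image_b = [[rng.random() for _ in range(8)] for _ in range(8)]
    a, p, n = (embed(patch) for patch in sample_triplet(image_a, image_b, 4, rng))
    print(triplet_margin_loss(a, p, n))
```

The loss is zero once the negative patch is at least `margin` farther from the anchor than the positive patch is.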

##### UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION

##### MULTI-RESIDUAL NETWORKS

- In a 110-layer ResNet, most of the contribution to gradient updates comes from paths that are 10-34 layers long.
- A ResNet trained with only these effective paths performs comparably to the full ResNet. This is done by sampling, for each mini-batch, paths whose lengths fall in the effective range.
- Instead of going deeper, adding more residual connections gives a bigger boost, because residual connections make the network behave like an exponential ensemble of shallow networks.
- Removing a residual block from a ResNet at test time causes a negligible drop in performance, in contrast to VGG and GoogLeNet.
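
Two of the ideas above can be sketched in a few lines: a block that sums several residual functions with the identity, and the count of paths through a stack of ordinary residual blocks. This is a rough sketch, not the paper's code; `multi_residual_block` and `path_length_counts` are hypothetical names, vectors are plain Python lists, and path length is counted in blocks rather than layers.

```python
from math import comb


def multi_residual_block(x, residual_fns):
    """Return x + f_1(x) + ... + f_k(x) for a list of residual functions.

    With an empty list the block reduces to the identity, which mirrors
    dropping the block's residual branches entirely.
    """
    out = list(x)
    for fn in residual_fns:
        out = [o + r for o, r in zip(out, fn(x))]
    return out


def path_length_counts(num_blocks):
    """Number of paths that pass through exactly k residual branches.

    Each single-branch residual block can be either skipped or entered,
    so a stack of n blocks has 2**n paths, with C(n, k) of length k.
    """
    return [comb(num_blocks, k) for k in range(num_blocks + 1)]


if __name__ == "__main__":
    double = lambda v: [2.0 * t for t in v]
    negate = lambda v: [-t for t in v]
    print(multi_residual_block([1.0, 2.0], [double, negate]))
    print(path_length_counts(4))
```

Because the path counts are binomial, very short and very long paths are rare, which is consistent with the observation that a middle range of path lengths dominates.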