  • It helps reduce internal covariate shift (ICS), so the distribution of inputs to each layer's activations remains more stable.
  • BN makes training less sensitive to the scale of the parameters and to their initialization, so we need to be less careful when choosing them.
  • It allows us to…
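
The first two bullets describe what BN does at a high level. The following is a minimal pure-Python sketch of the training-mode forward pass that makes both claims concrete. It assumes a 2-D mini-batch (rows are examples, columns are features) and the usual per-feature learnable scale `gamma` and shift `beta`. The helper `linear`, the toy numbers, and `eps = 1e-5` are illustrative assumptions, not taken from the article. Running statistics for inference are omitted. Each feature is re-centred to zero mean and unit variance over the batch, and multiplying the preceding layer's weights by a constant barely changes the normalized output.

```python
import math
from typing import List

Matrix = List[List[float]]


def batch_norm(batch: Matrix, gamma: List[float], beta: List[float],
               eps: float = 1e-5) -> Matrix:
    """Training-mode batch normalization over a mini-batch (sketch).

    Each column (feature) is normalized with the mean and variance computed
    across the rows (examples) of this mini-batch, then scaled by gamma and
    shifted by beta.
    """
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        # Biased (divide-by-n) variance of the mini-batch.
        var = sum((x - mean) ** 2 for x in column) / n
        std = math.sqrt(var + eps)  # eps avoids division by zero
        for i in range(n):
            x_hat = (batch[i][j] - mean) / std
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out


def linear(batch: Matrix, weights: Matrix) -> Matrix:
    """Hypothetical pre-activation layer z = x @ W (no bias), for illustration."""
    return [
        [sum(x[k] * weights[k][j] for k in range(len(x)))
         for j in range(len(weights[0]))]
        for x in batch
    ]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.0], [-2.0, 1.5]]  # toy mini-batch
    w = [[0.2, -0.4], [0.3, 0.1]]
    w_scaled = [[10.0 * v for v in row] for row in w]  # same weights, 10x scale
    gamma, beta = [1.0, 1.0], [0.0, 0.0]

    y = batch_norm(linear(x, w), gamma, beta)
    y_scaled = batch_norm(linear(x, w_scaled), gamma, beta)
    max_diff = max(abs(a - b) for ra, rb in zip(y, y_scaled) for a, b in zip(ra, rb))
    # Small: only eps and floating-point rounding break exact equality.
    print("max |BN(xW) - BN(x(10W))| =", max_diff)
```

Because the mean and standard deviation are recomputed from each mini-batch, rescaling `W` by a positive constant rescales both the pre-activations and their standard deviation by the same factor, and the two cancel. This is one way to read the claim that BN makes us less careful about parameter scale. A production layer would also keep running averages of the mean and variance for use at inference time, which this sketch leaves out.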

  • But this raises a question: why use this technique at all?
  • What benefits does it bring to the neural architectures we build?

Ajinkya Jadhav

Machine Learning and Deep Learning Practitioner

