Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
When the minibatch size is multiplied by k, how should the learning rate be scaled?
Multiply the learning rate by k (Linear Scaling Rule).
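A minimal sketch of the rule as arithmetic: the scaled learning rate is the reference learning rate times k, where k is the ratio of the new minibatch size to the reference minibatch size. The function name `scale_learning_rate` and its parameter names are hypothetical, chosen for illustration only. The example values (reference learning rate 0.1 at minibatch size 256, scaled to minibatch size 8192) follow the setup reported in the paper.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Apply the Linear Scaling Rule.

    Hypothetical helper: when the minibatch size is multiplied by k,
    the learning rate is multiplied by k as well.
    """
    if base_lr <= 0 or base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("learning rate and batch sizes must be positive")
    # k is the factor by which the minibatch size grew (or shrank).
    k = batch_size / base_batch_size
    return base_lr * k


if __name__ == "__main__":
    # Reference setting: lr = 0.1 at minibatch size 256.
    # Scaling to minibatch size 8192 gives k = 32.
    print(scale_learning_rate(0.1, 256, 8192))
```

The rule is symmetric: a smaller minibatch gives k below 1 and a proportionally smaller learning rate.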
What technique is used to train with large minibatches in SGD?
Linear scaling rule.