Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

When the minibatch size is multiplied by $k$, how should the learning rate be scaled?

Multiply the learning rate by $k$ as well (the Linear Scaling Rule).
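
As a quick illustration of the rule, here is a minimal sketch in Python. The function name `scaled_learning_rate` and its parameters are hypothetical, chosen for this note, and the numbers in the usage line (a base learning rate of 0.1 at minibatch size 256, scaled up to a minibatch of 8192) are example values rather than a prescription.

```python
def scaled_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Apply the linear scaling rule to a reference learning rate.

    All names here are illustrative: base_lr is the learning rate tuned for
    base_batch_size, and batch_size is the new (larger) minibatch size.
    """
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    # k is the factor by which the minibatch size was multiplied.
    k = batch_size / base_batch_size
    # Linear scaling rule: multiply the learning rate by the same factor k.
    return base_lr * k


# Example: the minibatch grows 32x (256 -> 8192), so the learning rate grows 32x.
lr = scaled_learning_rate(base_lr=0.1, base_batch_size=256, batch_size=8192)
```

Here $k = 8192 / 256 = 32$, so the scaled learning rate is $0.1 \times 32 = 3.2$.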

What technique is used to train with large-minibatch SGD?

The linear scaling rule, combined with a gradual learning-rate warmup at the start of training.