MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

What is the computational cost of a standard convolution with input $h_i \times w_i \times d_i$, output $h_i \times w_i \times d_j$ and kernel size $k$?


$h_i \times w_i \times d_i \times d_j \times k \times k$
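The formula can be checked with a small helper; the 14×14×512 layer size in the example is illustrative, not taken from the paper:

```python
def standard_conv_cost(h_i, w_i, d_i, d_j, k):
    # Each of the h_i * w_i output positions computes d_j output channels,
    # each from a k x k x d_i patch of the input.
    return h_i * w_i * d_i * d_j * k * k

# e.g. a 3x3 conv on a 14x14x512 feature map producing 512 channels
print(standard_conv_cost(14, 14, 512, 512, 3))  # 462422016 mult-adds
```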

What is the computational cost of a depthwise separable convolution with input $h_i \times w_i \times d_i$, output $h_i \times w_i \times d_j$ and kernel size $k$?


$h_i \cdot w_i \cdot d_i (k^2 + d_j)$
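The two terms correspond to the two stages of the factorized convolution. A small sketch (layer sizes are illustrative):

```python
def depthwise_separable_cost(h_i, w_i, d_i, d_j, k):
    depthwise = h_i * w_i * d_i * k * k  # one k x k filter per input channel
    pointwise = h_i * w_i * d_i * d_j    # 1x1 conv mixing channels
    return depthwise + pointwise

# same 14x14x512 -> 512-channel layer as the standard-convolution example
print(depthwise_separable_cost(14, 14, 512, 512, 3))  # 52283392 mult-adds
```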

How much reduction in computation do you get by replacing standard convolutions with depthwise separable convolutions?


Computational cost of a standard convolution:

$h_i \times w_i \times d_i \times d_j \times k \times k$

Computational cost of a depthwise separable convolution: $h_i \cdot w_i \cdot d_i (k^2 + d_j)$

with input $h_i \times w_i \times d_i$, output $h_i \times w_i \times d_j$ and kernel size $k$.

Reduction:

$= \frac{h_i \cdot w_i \cdot d_i (k^2 + d_j)}{h_i \cdot w_i \cdot d_i \cdot d_j \cdot k^2} = \frac{k^2 + d_j}{d_j \cdot k^2}$

$= \frac{1}{d_j} + \frac{1}{k^2}$

MobileNet uses $3 \times 3$ depthwise separable convolutions, which use between 8 and 9 times less computation than standard convolutions.

With what does MobileNet replace the standard convolutional layer with batchnorm and ReLU?


With a depthwise separable convolution: a depthwise layer and a pointwise layer, each followed by batchnorm and ReLU.
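A minimal NumPy sketch of the forward pass of such a block (batchnorm omitted for brevity; "same" padding, stride 1; the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def depthwise_separable_block(x, dw_filters, pw_filters):
    """x: (h, w, d_i); dw_filters: (k, k, d_i); pw_filters: (d_i, d_j)."""
    h, w, d_i = x.shape
    k = dw_filters.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    # depthwise: one k x k filter per input channel, no channel mixing
    dw = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]            # (k, k, d_i)
            dw[i, j, :] = (patch * dw_filters).sum(axis=(0, 1))
    dw = relu(dw)                                      # BN would go before this
    # pointwise: 1x1 conv = per-pixel matrix multiply mixing channels
    pw = dw @ pw_filters                               # (h, w, d_j)
    return relu(pw)
```

In a framework like PyTorch the depthwise stage is usually expressed as a grouped convolution with `groups` equal to the number of input channels.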

Which two additional hyperparameters are introduced in **MobileNet** to construct scaled versions of the standard architecture?


Width multiplier: The role of the width multiplier $\alpha$ is to thin a network uniformly at each layer. For a given layer and width multiplier $\alpha$, the number of input channels $d_i$ becomes $\alpha d_i$ and the number of output channels $d_j$ becomes $\alpha d_j$.

Width multiplier has the effect of reducing computational cost and the number of parameters quadratically, by roughly $\alpha^2$.
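The quadratic effect can be seen by thinning both channel counts in the depthwise-separable cost formula (the helper and layer sizes are illustrative):

```python
def dw_sep_cost(h, w, d_i, d_j, k, alpha=1.0):
    # width multiplier alpha thins both input and output channels
    d_i, d_j = int(alpha * d_i), int(alpha * d_j)
    return h * w * d_i * k * k + h * w * d_i * d_j

full = dw_sep_cost(14, 14, 512, 512, 3)
half = dw_sep_cost(14, 14, 512, 512, 3, alpha=0.5)
print(half / full)  # ≈ 0.25, i.e. roughly alpha**2
```

The reduction is only *roughly* $\alpha^2$ because the depthwise term scales linearly in $\alpha$ while the dominant pointwise term scales quadratically.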

Resolution multiplier:

The resolution multiplier $\rho$ is applied to the input image, and the internal representation of every layer is subsequently reduced by the same multiplier. In practice we implicitly set $\rho$ by setting the input resolution.

Resolution multiplier has the effect of reducing computational cost by $\rho^2$.
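Unlike the width multiplier, the resolution multiplier scales the $h \cdot w$ factor that multiplies *every* term of the cost, so the reduction is exactly $\rho^2$. A quick check (layer sizes are illustrative):

```python
def dw_sep_cost_at(rho, h=14, w=14, d_i=512, d_j=512, k=3):
    # rho scales the spatial resolution; cost carries an h*w factor,
    # so total cost drops by exactly rho**2
    h, w = int(rho * h), int(rho * w)
    return h * w * d_i * k * k + h * w * d_i * d_j

print(dw_sep_cost_at(0.5) / dw_sep_cost_at(1.0))  # 0.25 = rho**2
```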

How many multiply-adds and parameters does the default MobileNetV1 have? And how much accuracy does it get on ImageNet?


569M MAdds and 4.2M parameters with an accuracy of 70.6% on ImageNet.

Machine Learning Research Flashcards is a collection of flashcards associated with scientific research papers in the field of machine learning. Best used with Anki or Obsidian.