Introduction
CNNs are broadly employed in computer vision, NLP, and other areas. However, these networks contain a lot of redundancy; in other words, one could use fewer computing resources to finish the same task without losing accuracy. Many publications target this simplification problem, and one line of work is architecture (structure) design.
MobileNet v1/v2 are publications from Google. They design a new network architecture with fewer parameters and operations while achieving accuracy similar to well-known backbone networks such as VGG and ResNet, and MobileNet has since become a common baseline for network simplification. The main idea is to replace a normal 3*3 convolution with the combination of a depthwise (separable) convolution and a pointwise (1*1) convolution; a rough code sketch of this replacement is given below. MobileNet v2 solves the degradation problem of MobileNet v1.
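For illustration, here is a minimal PyTorch sketch (not the authors' code) comparing a standard 3*3 convolution with the depthwise + pointwise combination. The channel counts (32 in, 64 out) are arbitrary and only for the example.

```python
import torch
import torch.nn as nn

# Standard 3x3 convolution: 32 channels in -> 64 channels out
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1)

# Depthwise separable replacement:
#   depthwise 3x3 (groups = in_channels, one filter per input channel)
#   followed by a pointwise 1x1 conv that mixes channels
depthwise_separable = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32),  # depthwise
    nn.Conv2d(32, 64, kernel_size=1),                          # pointwise
)

x = torch.randn(1, 32, 56, 56)
# Both produce the same output shape, but the separable version
# uses far fewer parameters and multiply-adds.
print(standard(x).shape, depthwise_separable(x).shape)
```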
MobileNet v2
Code
One of the great implementations with the PyTorch framework is here.
Problem with MobileNet v1, and MobileNet v2's explanation and solution
The depthwise convolution easily gets zero gradients during training, the so-called 'degradation' problem. MobileNet v2 gives some explanation and a solution.
More comparison can be found in this discussion.
Paper content
For activations in a CNN, researchers believe the useful information can be expressed with less data than the full activation. In other words, a normal activation contains redundant information alongside the desired abstraction. Based on this belief, the authors of MobileNet-v2 shrink the dimension to strip the redundancy down to the manifold of useful information. Others have tried this before but got poor results. The authors of MobileNet-v2 think this may be caused by the non-linear layers: non-linearities are good and necessary to sparsify the information and abstract what we want, but they also cause loss of information (some of which may be important). MobileNet-v2's solution is the linear bottleneck and the inverted residual structure.
Linear Bottlenecks
The effect of the non-linearity on activations is shown in the following figure.
The non-linearity can filter out some information. If enough channels are kept, the useful information can still survive the non-linearity; otherwise, the output may not map back to the original manifold. This is why the projection back to the narrow bottleneck is kept linear (no ReLU); a toy sketch of this effect is given below.
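As a rough illustration of the paper's argument (my own toy experiment, not the paper's code): embed 2-D points into a higher-dimensional space with a random matrix, apply ReLU, and project back with the pseudo-inverse. The reconstruction error tends to shrink as the embedding dimension grows, i.e. with enough channels the ReLU destroys less of the information.

```python
import torch

torch.manual_seed(0)

# 2-D points on a spiral: a low-dimensional "manifold of useful information"
t = torch.linspace(0, 4 * 3.14159, 200)
x = torch.stack([t * torch.cos(t), t * torch.sin(t)], dim=1)  # shape (200, 2)

def relu_roundtrip_error(x, dim):
    # Random embedding into `dim` channels, ReLU, then project back
    # with the pseudo-inverse and measure the reconstruction error.
    T = torch.randn(2, dim)
    y = torch.relu(x @ T)
    x_rec = y @ torch.linalg.pinv(T)
    return ((x_rec - x) ** 2).mean().item()

for dim in [2, 3, 15, 30]:
    # Error generally drops as the embedding dimension increases.
    print(dim, relu_roundtrip_error(x, dim))
```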
Inverted Residual Block
The classical residual block is: thick-thin-thick. The inverted residual block is: thin-thick-thin. When there is no thick layer between the thin ones, it becomes an identity block.
They use a hyperparameter to control the thickness of the block: the expansion rate. It is the channel multiplier applied to the thin input activation. The paper gives some advice on choosing this parameter (an expansion rate of 6 is used for most blocks). A sketch of such a block is given below.
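A minimal PyTorch sketch of the inverted residual block (my own simplification, not the reference implementation): expand with a 1*1 conv + ReLU6, run a depthwise 3*3 conv + ReLU6, then project back with a linear 1*1 conv (the linear bottleneck), adding the skip connection only when the stride is 1 and the channel count is unchanged.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Thin-thick-thin block with expansion rate `expansion`."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # expand: thin -> thick (1x1 conv) with non-linearity
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # depthwise 3x3 on the thick activation
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # project: thick -> thin (1x1 conv), linear bottleneck, no ReLU
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 24, 56, 56)
# stride 1 and equal in/out channels -> identity-style block with skip connection
print(InvertedResidual(24, 24)(x).shape)
```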
MobileNet-v2 structure
Differences from other networks:
Total structure configuration:
ImageNet training
The performance:
Some trivia on complexity and ReLU
Blogs
The following blogs also give great explanations, experimental analysis, and comparisons of MobileNet v1/v2.
1.https://zhuanlan.zhihu.com/p/50045821
2.https://perper.site/2019/03/04/MobileNet-V2-%E8%AF%A6%E8%A7%A3/