Introduction

CNN are broadly employed in the computer vision, NLP and others areas. However, many redudance exists in the network. In other word, one could leverage less computing resources to finish the task without accuracy loss. Many publications come out for the simplification. One of them is model quantization.

Quantization means to regress the activation or weight to several discrete set of number. Generally, quantization implies to fix point data type. In the publications, some focus on quantization on the weights, some others focus on the quantization on activation and also focus on both. Image classification is the most usual application to verify the algorithm, other scenarios such as detection, tracking, segmentation and even super resolution also attract attentions. Here summarize one of the papers.

Other related papers could be found in model-compression-summary

PACT: Paremeterized clipping activation for quantized neural networks

Accurate and efficient 2-bit quantized neural networks (PACT + SAWB)

Learning to quantize deep networks by optimizing quantion intervals with task loss (QIL)

Highlight

These three papers are similiar in the idea of quantization. For quantization by Dorefa, the accuray drop is big since the quantization range is large. The distribution of data varies different for different. Dorefa quantize them in equal interval(step), which doesn’t make full use of the represent capbility. Thus, Pact shrink the quantization range by learn a dynamic range layerwise. Before PACT, researchers get the range by KL devergence or just multiple a scale (for example, 0.8) to the abs max. PACT learning the upper bound of the activation. It is similar with ReLU6, but the upper bound is learned by training. QIL learning range threshold, not just the upper bound but also the lower bound.

SAWB (statistic aware weight binning) assign the quantization scale by a regression of the first and second momumment.

Even thought the idea seems to be simple, experiement results show they are very efficient and the accuracy drop is low compared with other methods.

TAGS: paper model-compression ai quantization

« Optimize GEMM « Homepage » ALQnet Training History »