[C78] MSQ: Memory-Efficient Bit Sparsification Quantization

Abstract

As deep neural networks (DNNs) see increasing deployment on mobile and edge devices, optimizing model efficiency has become crucial. Mixed-precision quantization is widely favored because it offers a better balance between efficiency and accuracy than uniform quantization. However, finding the optimal precision for each layer is challenging. Recent studies that exploit bit-level sparsity have shown promise, yet they often introduce substantial training complexity and high GPU memory requirements. In this paper, we propose Memory-Efficient Bit Sparsification Quantization (MSQ), a novel approach that addresses these limitations. MSQ applies a round-clamp quantizer to compute the least significant bits (LSBs) of model weights in a differentiable manner, and leverages regularization to induce sparsity in these LSBs. This enables effective precision reduction without splitting parameters at the bit level, minimizing memory use and training time. Additionally, MSQ incorporates Hessian information, allowing multiple LSBs to be pruned simultaneously to further improve training efficiency. Experimental results show that MSQ effectively reduces resource demands while maintaining competitive accuracy and compression rates, making it a practical solution for training efficient DNNs on resource-constrained devices.
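The abstract's core mechanism, differentiable LSB extraction combined with an LSB-sparsity regularizer, can be illustrated with a short sketch. Below is a minimal PyTorch sketch under stated assumptions: the function names (`round_clamp`, `lsb`, `lsb_sparsity_loss`), the detached-floor straight-through trick, and the L1 penalty form are illustrative choices, not the paper's exact formulation.

```python
import torch

def round_clamp(x, lo, hi):
    """Round-clamp quantizer with a straight-through estimator (STE):
    forward pass rounds and clamps; backward pass acts as the identity."""
    y = torch.clamp(torch.round(x), lo, hi)
    return x + (y - x).detach()

def lsb(weight, scale, bits):
    """Differentiable extraction of the least significant bit of the
    quantized integer weight codes (scale/bits are assumed per-layer
    quantization parameters)."""
    n = 2 ** (bits - 1)
    q = round_clamp(weight / scale, -n, n - 1)  # signed integer code
    # q mod 2, with the floor term detached so gradients flow through q.
    return q - 2.0 * torch.floor(q / 2.0).detach()

def lsb_sparsity_loss(weights, scales, bits, lam=1e-4):
    """L1 regularizer pushing LSBs toward zero. Once a layer's LSBs are
    all zero, that bit position can be pruned, reducing the layer's
    precision by one bit without retraining from scratch."""
    return lam * sum(lsb(w, s, bits).abs().sum()
                     for w, s in zip(weights, scales))
```

In training, such a penalty would simply be added to the task loss; because the LSBs are computed on the fly from the full-precision weights, no bit-level copy of the parameters needs to be stored, which is the memory saving the abstract describes.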

Publication
International Conference on Computer Vision
Jin Hee Kim (김진희)
PhD student at Duke University
Kang Eun Jeon (전강은)
Postdoctoral researcher