
Hessian-based mixed-precision quantization with transition aware training for neural networks.

Author Information

Huang Zhiyong, Han Xiao, Yu Zhi, Zhao Yunlan, Hou Mingyang, Hu Shengdong

Affiliations

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China.

Publication Information

Neural Netw. 2025 Feb;182:106910. doi: 10.1016/j.neunet.2024.106910. Epub 2024 Nov 16.

DOI: 10.1016/j.neunet.2024.106910
PMID: 39579751
Abstract

Model quantization is widely used to realize the promise of ubiquitous embedded deep network inference. While mixed-precision quantization has shown promising performance, existing approaches often rely on a time-consuming search process to determine the optimal bit configuration. To address this, we introduce Hessian-based Mixed-Precision Quantization Aware Training (HMQAT) to decrease the search overhead of finding a bit configuration. By using a sensitivity metric that jointly considers the average Hessian trace and the parameter size, HMQAT effectively guides the search process. We solve the bit-configuration optimization problem automatically using a Pareto-frontier method, so our scheme achieves the lowest search overhead. Additionally, our approach incorporates quantization-transition-aware fine-tuning of the scale factor. This strategy consistently ensures optimal inference performance along the accuracy-size Pareto frontier across multiple models. We extensively evaluated our method on ImageNet and CIFAR10. In particular, we show that compared to the baseline, HMQAT achieves a 10.34× reduction in model size while retaining 99.81% of the Top-1 accuracy on ResNet18 for ImageNet. Moreover, HMQAT surpasses state-of-the-art mixed-precision quantization methods, compressing neural networks with reduced search cost while achieving a satisfying trade-off between size and accuracy. This study paves the way for deploying neural networks on lightweight devices.
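The abstract's sensitivity metric rests on the average Hessian trace of each layer, which in practice is estimated without materializing the Hessian. The exact HMQAT formula is not given here, so the sketch below is only a minimal illustration of the standard Hutchinson trace estimator on a toy quadratic loss whose Hessian is known; the per-parameter weighting at the end is an assumption, not the paper's implementation.

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=2000, seed=0):
    """Estimate tr(H) via Hutchinson's method: E[v^T H v] = tr(H)
    when v has i.i.d. Rademacher (+/-1) entries. `hvp` computes
    Hessian-vector products, so H itself is never materialized."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / num_samples

# Toy "layer": the quadratic loss L(w) = 0.5 * w^T A w has Hessian A.
A = np.array([[4.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 0.25]])  # true trace = 5.25
trace_est = hutchinson_trace(lambda v: A @ v, dim=3)

# Average trace per parameter: a per-layer sensitivity signal of the
# kind the abstract describes (HMQAT's exact weighting may differ).
sensitivity = trace_est / A.shape[0]
print(trace_est, sensitivity)
```

For a real network the `hvp` callback would be a double backward pass through the loss; the estimator itself is unchanged.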

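The Pareto-frontier selection of a bit configuration can be illustrated with a toy enumeration: score every assignment of bit-widths by (model size, sensitivity-weighted quantization error) and keep the non-dominated ones. The two-layer statistics, the candidate bit-widths, and the 2^(-2b) error weighting below are illustrative assumptions; the abstract does not specify HMQAT's actual objective or search space.

```python
from itertools import product

# Hypothetical per-layer stats: (num_params, avg_hessian_trace).
layers = [(1_000_000, 0.02), (250_000, 0.35)]
bit_choices = (2, 4, 8)

def config_cost(bits):
    """(size_in_bits, sensitivity_proxy) for one bit assignment.
    A b-bit uniform quantizer's squared error shrinks roughly 4x per
    extra bit, so the trace is weighted by 2^(-2b) as a stand-in."""
    size = sum(n * b for (n, _), b in zip(layers, bits))
    sens = sum(n * t * 2.0 ** (-2 * b) for (n, t), b in zip(layers, bits))
    return size, sens

points = {bits: config_cost(bits)
          for bits in product(bit_choices, repeat=len(layers))}

def pareto_front(points):
    """Keep configurations not dominated in both size and sensitivity."""
    front = []
    for cfg, (s, e) in points.items():
        dominated = any(s2 <= s and e2 <= e and (s2, e2) != (s, e)
                        for s2, e2 in points.values())
        if not dominated:
            front.append(cfg)
    return sorted(front)

front = pareto_front(points)
print(front)
```

Each point on the resulting frontier trades size against sensitivity; picking the frontier point that meets a size budget replaces the exhaustive search the abstract says existing methods rely on.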

Similar Articles

1. Hessian-based mixed-precision quantization with transition aware training for neural networks.
Neural Netw. 2025 Feb;182:106910. doi: 10.1016/j.neunet.2024.106910. Epub 2024 Nov 16.
2. GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks.
Sensors (Basel). 2022 Dec 13;22(24):9772. doi: 10.3390/s22249772.
3. Deployable mixed-precision quantization with co-learning and one-time search.
Neural Netw. 2025 Jan;181:106812. doi: 10.1016/j.neunet.2024.106812. Epub 2024 Oct 18.
4. Training high-performance and large-scale deep neural networks with full 8-bit integers.
Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
5. Single-Path Bit Sharing for Automatic Loss-Aware Model Compression.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12459-12473. doi: 10.1109/TPAMI.2023.3275159. Epub 2023 Sep 5.
6. Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms.
Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
7. SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression.
PLoS One. 2022 Apr 18;17(4):e0265621. doi: 10.1371/journal.pone.0265621. eCollection 2022.
8. ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers.
Neural Netw. 2025 Jun;186:107289. doi: 10.1016/j.neunet.2025.107289. Epub 2025 Feb 22.
9. Vertical Layering of Quantized Neural Networks for Heterogeneous Inference.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15964-15978. doi: 10.1109/TPAMI.2023.3319045. Epub 2023 Nov 3.
10. Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.
IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.