
Hessian-based mixed-precision quantization with transition aware training for neural networks.

Author Information

Huang Zhiyong, Han Xiao, Yu Zhi, Zhao Yunlan, Hou Mingyang, Hu Shengdong

Affiliations

School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China.

Publication Information

Neural Netw. 2025 Feb;182:106910. doi: 10.1016/j.neunet.2024.106910. Epub 2024 Nov 16.

DOI: 10.1016/j.neunet.2024.106910
PMID: 39579751
Abstract

Model quantization is widely used to realize the promise of ubiquitous embedded deep network inference. While mixed-precision quantization has shown promising performance, existing approaches often rely on a time-consuming search process to determine the optimal bit configuration. To address this, we introduce Hessian-based Mixed-Precision Quantization Aware Training (HMQAT) to decrease the search overhead of finding a bit configuration. By using a sensitivity metric that jointly considers the average Hessian trace and the parameter size, HMQAT effectively guides the search process. We solve the bit-configuration optimization problem automatically using a Pareto-frontier method, so our scheme achieves the lowest search overhead. Additionally, our approach incorporates quantization-transition-aware fine-tuning of the scale factor. This strategy consistently ensures optimal inference performance along the accuracy-size Pareto frontier across multiple models. We extensively evaluated our method on ImageNet and CIFAR10. In particular, we show that compared to the baseline, HMQAT achieves a 10.34× reduction in model size while retaining 99.81% of the Top-1 accuracy on ResNet18 for ImageNet. Moreover, HMQAT surpasses state-of-the-art mixed-precision quantization methods, compressing neural networks with reduced search cost while achieving a satisfying trade-off between size and accuracy. This study paves the way for deploying neural networks on lightweight devices.
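The abstract's sensitivity metric rests on the average Hessian trace of each layer, which in practice is estimated without materializing the Hessian. The exact HMQAT formula is not given here, so the sketch below is only a minimal illustration of the standard Hutchinson trace estimator on a toy quadratic loss whose Hessian is known; the per-parameter weighting at the end is an assumption, not the paper's implementation.

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=2000, seed=0):
    """Estimate tr(H) via Hutchinson's method: E[v^T H v] = tr(H)
    when v has i.i.d. Rademacher (+/-1) entries. `hvp` computes
    Hessian-vector products, so H itself is never materialized."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / num_samples

# Toy "layer": the quadratic loss L(w) = 0.5 * w^T A w has Hessian A.
A = np.array([[4.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 0.25]])  # true trace = 5.25
trace_est = hutchinson_trace(lambda v: A @ v, dim=3)

# Average trace per parameter: a per-layer sensitivity signal of the
# kind the abstract describes (HMQAT's exact weighting may differ).
sensitivity = trace_est / A.shape[0]
print(trace_est, sensitivity)
```

For a real network the `hvp` callback would be a double backward pass through the loss; the estimator itself is unchanged.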

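The Pareto-frontier selection of a bit configuration can be illustrated with a toy enumeration: score every assignment of bit-widths by (model size, sensitivity-weighted quantization error) and keep the non-dominated ones. The two-layer statistics, the candidate bit-widths, and the 2^(-2b) error weighting below are illustrative assumptions; the abstract does not specify HMQAT's actual objective or search space.

```python
from itertools import product

# Hypothetical per-layer stats: (num_params, avg_hessian_trace).
layers = [(1_000_000, 0.02), (250_000, 0.35)]
bit_choices = (2, 4, 8)

def config_cost(bits):
    """(size_in_bits, sensitivity_proxy) for one bit assignment.
    A b-bit uniform quantizer's squared error shrinks roughly 4x per
    extra bit, so the trace is weighted by 2^(-2b) as a stand-in."""
    size = sum(n * b for (n, _), b in zip(layers, bits))
    sens = sum(n * t * 2.0 ** (-2 * b) for (n, t), b in zip(layers, bits))
    return size, sens

points = {bits: config_cost(bits)
          for bits in product(bit_choices, repeat=len(layers))}

def pareto_front(points):
    """Keep configurations not dominated in both size and sensitivity."""
    front = []
    for cfg, (s, e) in points.items():
        dominated = any(s2 <= s and e2 <= e and (s2, e2) != (s, e)
                        for s2, e2 in points.values())
        if not dominated:
            front.append(cfg)
    return sorted(front)

front = pareto_front(points)
print(front)
```

Each point on the resulting frontier trades size against sensitivity; picking the frontier point that meets a size budget replaces the exhaustive search the abstract says existing methods rely on.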

Similar Articles

1. Hessian-based mixed-precision quantization with transition aware training for neural networks.
Neural Netw. 2025 Feb;182:106910. doi: 10.1016/j.neunet.2024.106910. Epub 2024 Nov 16.
2. GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks.
Sensors (Basel). 2022 Dec 13;22(24):9772. doi: 10.3390/s22249772.
3. Deployable mixed-precision quantization with co-learning and one-time search.
Neural Netw. 2025 Jan;181:106812. doi: 10.1016/j.neunet.2024.106812. Epub 2024 Oct 18.
4. Training high-performance and large-scale deep neural networks with full 8-bit integers.
Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
5. Single-Path Bit Sharing for Automatic Loss-Aware Model Compression.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12459-12473. doi: 10.1109/TPAMI.2023.3275159. Epub 2023 Sep 5.
6. Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms.
Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
7. SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression.
PLoS One. 2022 Apr 18;17(4):e0265621. doi: 10.1371/journal.pone.0265621. eCollection 2022.
8. ADFQ-ViT: Activation-Distribution-Friendly post-training Quantization for Vision Transformers.
Neural Netw. 2025 Jun;186:107289. doi: 10.1016/j.neunet.2025.107289. Epub 2025 Feb 22.
9. Vertical Layering of Quantized Neural Networks for Heterogeneous Inference.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15964-15978. doi: 10.1109/TPAMI.2023.3319045. Epub 2023 Nov 3.
10. Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.
IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.