

Optimization-Based Post-Training Quantization With Bit-Split and Stitching

Authors

Wang Peisong, Chen Weihan, He Xiangyu, Chen Qiang, Liu Qingshan, Cheng Jian

Publication

IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2119-2135. doi: 10.1109/TPAMI.2022.3159369. Epub 2023 Jan 6.

DOI: 10.1109/TPAMI.2022.3159369
PMID: 35290185
Abstract

Deep neural networks have shown great promise in various domains. Meanwhile, these breakthroughs come with storage and computing overheads. To solve these problems, network quantization has received increasing attention due to its high efficiency and hardware-friendly properties. Nonetheless, most existing quantization approaches rely on the full training dataset and a time-consuming fine-tuning process to retain accuracy. Post-training quantization does not have these problems; however, it has mainly been shown effective for 8-bit quantization. In this paper, we theoretically analyze the effect of network quantization and show that the quantization loss in the final output layer is bounded by the layer-wise activation reconstruction error. Based on this analysis, we propose an Optimization-based Post-training Quantization framework and a novel Bit-split optimization approach to achieve minimal accuracy degradation. The proposed framework is validated on a variety of computer vision tasks, including image classification, object detection, and instance segmentation, with various network architectures. Specifically, we achieve near-original model performance even when quantizing FP32 models to 3-bit without fine-tuning.
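The per-layer objective implied by the abstract's bound (the final-layer quantization loss is controlled by the layer-wise activation reconstruction error, so each layer can be calibrated greedily on a small unlabeled batch) can be sketched as follows. This is an illustrative simplification, not the authors' Bit-split implementation: it uses plain symmetric uniform quantization with a grid search over the clipping range, and the names `uniform_quantize` and `calibrate_layer` are hypothetical.

```python
import numpy as np

def uniform_quantize(w, scale, n_bits=3):
    """Symmetric uniform quantization of weights to n_bits, then dequantize."""
    qmax = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_layer(w, x, n_bits=3, n_grid=81):
    """Pick the quantization scale minimizing this layer's activation
    reconstruction error ||x @ w - x @ w_q||^2 on a small calibration
    batch x -- no labels, no fine-tuning."""
    qmax = 2 ** (n_bits - 1) - 1
    y = x @ w                                   # full-precision reference
    best_scale, best_err = None, np.inf
    for frac in np.linspace(0.2, 1.2, n_grid):  # search the clipping range
        scale = frac * np.abs(w).max() / qmax
        err = np.sum((y - x @ uniform_quantize(w, scale, n_bits)) ** 2)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err
```

Because each layer is calibrated independently against its own full-precision activations, the procedure needs only a small calibration set and a single forward pass of data per layer, which is what makes the post-training (fine-tuning-free) setting practical.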


Similar Articles

1. Optimization-Based Post-Training Quantization With Bit-Split and Stitching. IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2119-2135. doi: 10.1109/TPAMI.2022.3159369. Epub 2023 Jan 6.
2. EfficientQ: An efficient and accurate post-training neural network quantization method for medical image segmentation. Med Image Anal. 2024 Oct;97:103277. doi: 10.1016/j.media.2024.103277. Epub 2024 Jul 22.
3. Training high-performance and large-scale deep neural networks with full 8-bit integers. Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
4. Whether the Support Region of Three-Bit Uniform Quantizer Has a Strong Impact on Post-Training Quantization for MNIST Dataset? Entropy (Basel). 2021 Dec 20;23(12):1699. doi: 10.3390/e23121699.
5. Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms. Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
6. Adaptive Global Power-of-Two Ternary Quantization Algorithm Based on Unfixed Boundary Thresholds. Sensors (Basel). 2023 Dec 28;24(1):181. doi: 10.3390/s24010181.
7. A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation. Sensors (Basel). 2022 Sep 1;22(17):6618. doi: 10.3390/s22176618.
8. SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression. PLoS One. 2022 Apr 18;17(4):e0265621. doi: 10.1371/journal.pone.0265621. eCollection 2022.
9. MedQ: Lossless ultra-low-bit neural network quantization for medical image segmentation. Med Image Anal. 2021 Oct;73:102200. doi: 10.1016/j.media.2021.102200. Epub 2021 Aug 2.
10. Fast-SNN: Fast Spiking Neural Network by Converting Quantized ANN. IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14546-14562. doi: 10.1109/TPAMI.2023.3275769. Epub 2023 Nov 3.