A Novel Low-Bit Quantization Strategy for Compressing Deep Neural Networks.

Affiliations

College of Systems Engineering, National University of Defense Technology, Changsha 410073, China.

College of Computer, National University of Defense Technology, Changsha 410073, China.

Publication Information

Comput Intell Neurosci. 2020 Feb 18;2020:7839064. doi: 10.1155/2020/7839064. eCollection 2020.

DOI: 10.1155/2020/7839064
PMID: 32148472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7049432/
Abstract

The increase in sophistication of neural network models in recent years has exponentially expanded memory consumption and computational cost, thereby hindering their applications on ASIC, FPGA, and other mobile devices. Therefore, compressing and accelerating the neural networks are necessary. In this study, we introduce a novel strategy to train low-bit networks with weights and activations quantized by several bits and address two corresponding fundamental issues. One is to approximate activations through low-bit discretization for decreasing network computational cost and dot-product memory. The other is to specify weight quantization and update mechanism for discrete weights to avoid gradient mismatch. With quantized low-bit weights and activations, the costly full-precision operation will be replaced by shift operation. We evaluate the proposed method on common datasets, and results show that this method can dramatically compress the neural network with slight accuracy loss.
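
To make the shift-operation idea concrete, below is a minimal Python/NumPy sketch of power-of-two ("shift") weight quantization. It is an illustration under stated assumptions, not the paper's exact quantizer or weight-update rule: the function quantize_weights_pow2, the chosen bit budget, and the straight-through-estimator comment are introduced here only for exposition.

```python
# Illustrative sketch (not the authors' exact algorithm): round weights to
# signed powers of two so that each multiply in a dot product can be
# replaced by a binary shift (realized here with np.ldexp).
import numpy as np

def quantize_weights_pow2(w, bits=3):
    """Round weights to the nearest signed power of two, w_q = sign(w) * 2**e.

    Assumed bit budget: 1 sign bit + (bits - 1) bits indexing the exponent,
    with the largest exponent anchored to the largest weight magnitude.
    """
    sign = np.where(w >= 0, 1.0, -1.0)
    mag = np.abs(w) + 1e-12                       # avoid log2(0)
    e_max = int(np.floor(np.log2(mag.max())))
    e_min = e_max - (2 ** (bits - 1) - 1)         # range of representable exponents
    e = np.clip(np.round(np.log2(mag)), e_min, e_max).astype(int)
    return sign * (2.0 ** e), e, sign

rng = np.random.default_rng(0)
w = rng.normal(size=8)        # full-precision weights
x = rng.normal(size=8)        # activations (kept in float here for brevity)

w_q, e, sign = quantize_weights_pow2(w, bits=3)

# Reference: ordinary multiply-accumulate with the quantized weights.
ref = float(np.dot(w_q, x))

# Shift-based: w_q[i] * x[i] = sign[i] * (x[i] scaled by 2**e[i]); scaling by a
# power of two is a shift of the binary point, done here with np.ldexp.
shift_based = float(sum(s * np.ldexp(xi, ei) for s, xi, ei in zip(sign, x, e)))

print(ref, shift_based)       # agree up to floating-point summation order

# During training, one common (assumed) choice for updating discrete weights is
# a straight-through estimator: forward with w_q, backward as if the rounding
# were the identity, so full-precision "shadow" weights keep receiving gradients.
```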

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa2/7049432/58ca85e06127/CIN2020-7839064.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa2/7049432/d8cc28131138/CIN2020-7839064.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa2/7049432/43fccfd53201/CIN2020-7839064.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efa2/7049432/2620e91e8fa1/CIN2020-7839064.004.jpg

Similar Articles

1
A Novel Low-Bit Quantization Strategy for Compressing Deep Neural Networks.
Comput Intell Neurosci. 2020 Feb 18;2020:7839064. doi: 10.1155/2020/7839064. eCollection 2020.
2
Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.
IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.
3
Training high-performance and large-scale deep neural networks with full 8-bit integers.
Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
4
A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation.
Sensors (Basel). 2022 Sep 1;22(17):6618. doi: 10.3390/s22176618.
5
QTTNet: Quantized tensor train neural networks for 3D object and video recognition.
Neural Netw. 2021 Sep;141:420-432. doi: 10.1016/j.neunet.2021.05.034. Epub 2021 Jun 5.
6
GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework.
Neural Netw. 2018 Apr;100:49-58. doi: 10.1016/j.neunet.2018.01.010. Epub 2018 Feb 2.
7
Deep Network Quantization via Error Compensation.
IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4960-4970. doi: 10.1109/TNNLS.2021.3064293. Epub 2022 Aug 31.
8
GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks.
Sensors (Basel). 2022 Dec 13;22(24):9772. doi: 10.3390/s22249772.
9
Convolutional Neural Networks Quantization with Double-Stage Squeeze-and-Threshold.
Int J Neural Syst. 2022 Dec;32(12):2250051. doi: 10.1142/S0129065722500514. Epub 2022 Sep 26.
10
Vertical Layering of Quantized Neural Networks for Heterogeneous Inference.
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):15964-15978. doi: 10.1109/TPAMI.2023.3319045. Epub 2023 Nov 3.

Cited By

1
Differentiable Network Pruning via Polarization of Probabilistic Channelwise Soft Masks.
Comput Intell Neurosci. 2022 May 5;2022:7775419. doi: 10.1155/2022/7775419. eCollection 2022.
2
Whether the Support Region of Three-Bit Uniform Quantizer Has a Strong Impact on Post-Training Quantization for MNIST Dataset?
Entropy (Basel). 2021 Dec 20;23(12):1699. doi: 10.3390/e23121699.
3
Evaluation of Deep Neural Network Compression Methods for Edge Devices Using Weighted Score-Based Ranking Scheme.
Sensors (Basel). 2021 Nov 12;21(22):7529. doi: 10.3390/s21227529.

References

1
GXNOR-Net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework.
Neural Netw. 2018 Apr;100:49-58. doi: 10.1016/j.neunet.2018.01.010. Epub 2018 Feb 2.