


Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.

Authors

Kim Nahsung, Shin Dongyeob, Choi Wonseok, Kim Geonho, Park Jongsun

Publication

IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.

DOI: 10.1109/TNNLS.2020.3008996
PMID: 32745007
Abstract

For successful deployment of deep neural networks (DNNs) on resource-constrained devices, retraining-based quantization has been widely adopted to reduce the number of DRAM accesses. By properly setting training parameters, such as batch size and learning rate, the bit widths of both weights and activations can be uniformly quantized down to 4 bit while maintaining full-precision accuracy. In this article, we present a retraining-based mixed-precision quantization approach and its customized DNN accelerator to achieve high energy efficiency. In the proposed quantization, in the middle of retraining, an additional bit (extra quantization level) is assigned to the weights that have shown frequent switching between two contiguous quantization levels, since such switching indicates that neither quantization level can reduce the quantization loss. We also mitigate the gradient noise that occurs during retraining by taking a lower learning rate near the quantization threshold. For the proposed mixed-precision quantized network (MPQ-network), we have implemented a customized accelerator using a 65-nm CMOS process. In the accelerator, the proposed processing elements (PEs) can be dynamically reconfigured to process variable bit widths from 2 to 4 bit for both weights and activations. The numerical results show that the proposed quantization achieves a 1.37× better compression ratio for VGG-9 on the CIFAR-10 data set compared with a uniform 4-bit (both weights and activations) model, without loss of classification accuracy. The proposed accelerator also achieves 1.29× energy savings for VGG-9 on the CIFAR-10 data set over the state-of-the-art accelerator.
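The switching-count heuristic described in the abstract might be sketched as follows. This is a toy NumPy illustration under stated assumptions, not the authors' implementation: the noise-injection loop stands in for SGD retraining updates, and the flip threshold of 10 is an arbitrary placeholder.

```python
import numpy as np

def quantize(w, levels):
    """Snap each weight to its nearest value in `levels`; return values and indices."""
    idx = np.abs(w[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx], idx

def update_switch_counts(prev_idx, idx, counts):
    # A flip between two contiguous quantization levels suggests that
    # neither level fits the weight well (the paper's promotion criterion).
    counts += (np.abs(idx - prev_idx) == 1)
    return counts

levels = np.linspace(-1.0, 1.0, 2 ** 4)       # uniform 4-bit grid
rng = np.random.default_rng(0)
w = rng.normal(scale=0.3, size=1000)           # toy weight vector
counts = np.zeros(w.size, dtype=int)

prev_idx = None
for step in range(50):
    w += rng.normal(scale=0.01, size=w.size)   # stand-in for SGD retraining updates
    _, idx = quantize(w, levels)
    if prev_idx is not None:
        counts = update_switch_counts(prev_idx, idx, counts)
    prev_idx = idx

# Frequently flipping weights would be promoted to an extra bit
# (a finer local grid); the threshold 10 is arbitrary here.
extra_bit = counts > 10
```

The lower-learning-rate-near-threshold trick from the abstract is not shown; it would scale the per-weight update step down whenever a weight sits close to a quantization boundary.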


Similar Articles

1. Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.
IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.
2. Training high-performance and large-scale deep neural networks with full 8-bit integers.
Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
3. Unsupervised Network Quantization via Fixed-Point Factorization.
IEEE Trans Neural Netw Learn Syst. 2021 Jun;32(6):2706-2720. doi: 10.1109/TNNLS.2020.3007749. Epub 2021 Jun 2.
4. Low Complexity Gradient Computation Techniques to Accelerate Deep Neural Network Training.
IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5745-5759. doi: 10.1109/TNNLS.2021.3130991. Epub 2023 Sep 1.
5. A Novel Low-Bit Quantization Strategy for Compressing Deep Neural Networks.
Comput Intell Neurosci. 2020 Feb 18;2020:7839064. doi: 10.1155/2020/7839064. eCollection 2020.
6. IVS-Caffe-Hardware-Oriented Neural Network Model Development.
IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5978-5992. doi: 10.1109/TNNLS.2021.3072145. Epub 2022 Oct 5.
7. A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation.
Sensors (Basel). 2022 Sep 1;22(17):6618. doi: 10.3390/s22176618.
8. Whether the Support Region of Three-Bit Uniform Quantizer Has a Strong Impact on Post-Training Quantization for MNIST Dataset?
Entropy (Basel). 2021 Dec 20;23(12):1699. doi: 10.3390/e23121699.
9. GradFreeBits: Gradient-Free Bit Allocation for Mixed-Precision Neural Networks.
Sensors (Basel). 2022 Dec 13;22(24):9772. doi: 10.3390/s22249772.
10. SmartDeal: Remodeling Deep Network Weights for Efficient Inference and Training.
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7099-7113. doi: 10.1109/TNNLS.2021.3138056. Epub 2023 Oct 5.

Cited By

1. A lightweight intrusion detection method for IoT based on deep learning and dynamic quantization.
PeerJ Comput Sci. 2023 Sep 22;9:e1569. doi: 10.7717/peerj-cs.1569. eCollection 2023.