Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.

Author Information

Kim Nahsung, Shin Dongyeob, Choi Wonseok, Kim Geonho, Park Jongsun

Publication Information

IEEE Trans Neural Netw Learn Syst. 2021 Jul;32(7):2925-2938. doi: 10.1109/TNNLS.2020.3008996. Epub 2021 Jul 6.

Abstract

For successful deployment of deep neural networks (DNNs) on resource-constrained devices, retraining-based quantization has been widely adopted to reduce the number of DRAM accesses. By properly setting training parameters, such as batch size and learning rate, the bit widths of both weights and activations can be uniformly quantized down to 4 bits while maintaining full-precision accuracy. In this article, we present a retraining-based mixed-precision quantization approach and its customized DNN accelerator to achieve high energy efficiency. In the proposed quantization, in the middle of retraining, an additional bit (an extra quantization level) is assigned to weights that show frequent switching between two contiguous quantization levels, since this indicates that neither of the two levels can reduce the quantization loss. We also mitigate the gradient noise that occurs in the retraining process by using a lower learning rate near the quantization threshold. For the proposed mixed-precision quantized network (MPQ-network), we have implemented a customized accelerator using a 65-nm CMOS process. In the accelerator, the proposed processing elements (PEs) can be dynamically reconfigured to process variable bit widths from 2 to 4 bits for both weights and activations. The numerical results show that the proposed quantization achieves a 1.37× better compression ratio for VGG-9 on the CIFAR-10 data set compared with a uniform 4-bit (both weights and activations) model, without loss of classification accuracy. The proposed accelerator also achieves 1.29× energy savings for VGG-9 on the CIFAR-10 data set over the state-of-the-art accelerator.
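
The abstract describes the key mechanism only at a high level: during retraining, a weight that keeps switching between two contiguous quantization levels is given one extra bit (an intermediate level), since neither existing level is reducing its quantization loss. Below is a minimal NumPy sketch of that bookkeeping; the function names, the reset-on-settle counting policy, the 4-bit level count, and the switch-count threshold are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def quantize_to_levels(w, num_levels, w_max):
    """Map each weight to the index of the nearest of `num_levels` uniform levels in [-w_max, w_max]."""
    step = 2.0 * w_max / (num_levels - 1)
    idx = np.round((w + w_max) / step)
    return np.clip(idx, 0, num_levels - 1).astype(int)

def update_switch_counts(prev_idx, curr_idx, switch_counts):
    """Increment the count for weights that hopped between two contiguous levels since
    the previous step; reset weights that stayed on the same level (one simple policy)."""
    switched = np.abs(curr_idx - prev_idx) == 1
    return np.where(switched, switch_counts + 1, 0)

def select_extra_bit_weights(switch_counts, threshold):
    """Flag weights whose level keeps oscillating; in the mixed-precision model these
    would receive one extra bit, i.e. an intermediate level between the two they toggle between."""
    return switch_counts >= threshold

# Toy usage: track a small weight tensor across one simulated retraining step.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1024)
counts = np.zeros(w.shape, dtype=int)

prev = quantize_to_levels(w, num_levels=16, w_max=0.5)   # 4-bit baseline (2**4 levels)
w = w + rng.normal(scale=2e-3, size=w.shape)             # stand-in for one weight update
curr = quantize_to_levels(w, num_levels=16, w_max=0.5)

counts = update_switch_counts(prev, curr, counts)
extra_bit_mask = select_extra_bit_weights(counts, threshold=1)
print("weights flagged for an extra bit:", int(extra_bit_mask.sum()))
```

The abstract's second ingredient, using a lower learning rate near the quantization threshold to suppress gradient noise, would correspond in this sketch to scaling down the weight update for weights that sit close to a level boundary before `quantize_to_levels` is reapplied.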
