
Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms.

Affiliations

University of Tehran, Iran.

Publication

Neural Netw. 2022 Jun;150:350-363. doi: 10.1016/j.neunet.2022.02.024. Epub 2022 Mar 8.

Abstract

Deep Neural Networks (DNNs) have been widely and successfully employed in various artificial intelligence and machine learning applications (e.g., image processing and natural language processing). As DNNs become deeper and include more filters per layer, they incur high computational costs and large memory consumption to store their large number of parameters. Moreover, present processing platforms (e.g., CPU, GPU, and FPGA) do not have enough internal memory, so external memory storage is needed. Deploying DNNs in mobile applications is therefore difficult, given their limited storage space, computation power, energy supply, and real-time processing requirements. In this work, network parameters are compressed using a method based on tensor decomposition, thereby reducing accesses to external memory. This compression method decomposes each network layer's weight tensor into a limited number of principal vectors such that (i) almost all the initial parameters can be retrieved, (ii) the network structure does not change, and (iii) after the parameters are reproduced, the network's detection accuracy remains almost the same as that of the original network. To make this method efficient on FPGA, the tensor decomposition algorithm was modified without affecting its convergence, so that reproducing the network parameters on FPGA is straightforward. The proposed algorithm reduces the parameters of ResNet50, VGG16, and VGG19 networks trained on Cifar10 and Cifar100 by almost 10 times.
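The core idea — storing a few principal vectors from which the full weight tensor can be reproduced on-chip — can be illustrated with a minimal sketch. Note this is not the paper's algorithm (the authors use a quantized tensor decomposition modified for FPGA realization); it uses a plain truncated SVD on a 2-D weight matrix to show the same principle and the source of the roughly 10x parameter reduction.

```python
import numpy as np

def compress_weights(W, rank):
    """Factor W (m x n) into U_r (m x rank) and V_r (rank x n).

    Only U_r and V_r are stored; W itself is discarded.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

def reconstruct_weights(U_r, V_r):
    """Reproduce an approximation of the original weights from the factors."""
    return U_r @ V_r

# Hypothetical 512x512 fully connected layer, for illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

U_r, V_r = compress_weights(W, rank=25)
stored = U_r.size + V_r.size       # parameters actually kept
ratio = W.size / stored            # 262144 / 25600, i.e. just over 10x
```

The reconstruction is a single matrix product, which is the property the paper exploits: the factors fit in on-chip memory, and the full weights are regenerated on the fly rather than fetched from external memory. Accuracy after reconstruction depends on how well the true weights are approximated at the chosen rank, which for trained networks is typically far better than for the random matrix used here.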

