
Compression of Deep Neural Networks based on quantized tensor decomposition to implement on reconfigurable hardware platforms.

Affiliations

University of Tehran, Iran.

Publication

Neural Netw. 2022 Jun;150:350-363. doi: 10.1016/j.neunet.2022.02.024. Epub 2022 Mar 8.

Abstract

Deep Neural Networks (DNNs) have been widely and successfully employed in various artificial intelligence and machine learning applications (e.g., image processing and natural language processing). As DNNs become deeper and include more filters per layer, they incur high computational costs and large memory consumption to store their large number of parameters. Moreover, present processing platforms (e.g., CPU, GPU, and FPGA) do not have enough internal memory, so external memory storage is needed. Deploying DNNs in mobile applications is therefore difficult, given their limited storage space, computation power, energy supply, and real-time processing requirements. In this work, network parameters are compressed using a method based on tensor decomposition, thereby reducing accesses to external memory. This compression method decomposes each network layer's weight tensor into a limited number of principal vectors such that (i) almost all the initial parameters can be retrieved, (ii) the network structure does not change, and (iii) after the parameters are reproduced, the network's detection accuracy remains almost the same as that of the original network. To make this method efficient on FPGA, the tensor decomposition algorithm was modified without affecting its convergence, so that reproducing the network parameters on FPGA is straightforward. The proposed algorithm reduces the parameters of ResNet50, VGG16, and VGG19 networks trained on Cifar10 and Cifar100 by almost 10 times.
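The core idea — storing a few principal vectors from which the full weight tensor can be reproduced on-chip — can be illustrated with a minimal sketch. Note this is not the paper's algorithm (the authors use a quantized tensor decomposition modified for FPGA realization); it uses a plain truncated SVD on a 2-D weight matrix to show the same principle and the source of the roughly 10x parameter reduction.

```python
import numpy as np

def compress_weights(W, rank):
    """Factor W (m x n) into U_r (m x rank) and V_r (rank x n).

    Only U_r and V_r are stored; W itself is discarded.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

def reconstruct_weights(U_r, V_r):
    """Reproduce an approximation of the original weights from the factors."""
    return U_r @ V_r

# Hypothetical 512x512 fully connected layer, for illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))

U_r, V_r = compress_weights(W, rank=25)
stored = U_r.size + V_r.size       # parameters actually kept
ratio = W.size / stored            # 262144 / 25600, i.e. just over 10x
```

The reconstruction is a single matrix product, which is the property the paper exploits: the factors fit in on-chip memory, and the full weights are regenerated on the fly rather than fetched from external memory. Accuracy after reconstruction depends on how well the true weights are approximated at the chosen rank, which for trained networks is typically far better than for the random matrix used here.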

