Optimizing the Deep Neural Networks by Layer-Wise Refined Pruning and the Acceleration on FPGA.

Affiliations

Department of Electronic and Computer Engineering, Ritsumeikan University, Kusatsu, Shiga, Japan.

School of AI and Computer Science, Jiangnan University, Wuxi, China.

Publication information

Comput Intell Neurosci. 2022 Jun 1;2022:8039281. doi: 10.1155/2022/8039281. eCollection 2022.

Abstract

To accelerate the practical applications of artificial intelligence, this paper proposes a highly efficient layer-wise refined pruning method for deep neural networks at the software level and accelerates the inference process at the hardware level on a field-programmable gate array (FPGA). The refined pruning operation is based on the channel-wise importance indexes of each layer and the layer-wise input sparsity of the convolutional layers. The method exploits the characteristics of the native networks without introducing any extra workload into the training phase. In addition, the operation is easy to extend to various state-of-the-art deep neural networks. The effectiveness of the method is verified on ResNet and VGG architectures using the CIFAR10, CIFAR100, and ImageNet100 datasets. Experimental results show that for ResNet50 on CIFAR10 and ResNet101 on CIFAR100, more than 85% of the parameters and floating-point operations (FLOPs) are pruned with only 0.35% and 0.40% accuracy loss, respectively. For the VGG network, 87.05% of the parameters and 75.78% of the FLOPs are pruned with only 0.74% accuracy loss for VGG13BN on CIFAR10. Furthermore, we accelerate the networks at the hardware level on the FPGA platform using the Vitis AI tool. In two-thread mode on the FPGA, the throughput of the pruned VGG13BN and ResNet101 reaches 151.99 fps and 124.31 fps, respectively, and the pruned networks achieve about 4.3× and 1.8× speedup over the original networks on the FPGA, respectively.
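The abstract does not spell out the pruning formula, but channel-wise importance pruning of this kind is commonly sketched as follows: score each output channel of a convolutional layer (here by the L1 norm of its filter, a common proxy; the paper's exact importance index may differ), then choose a layer-wise pruning ratio modulated by that layer's input sparsity. All function names, the L1-norm criterion, and the linear sparsity scaling below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def channel_importance(weights):
    """L1-norm importance of each output channel.

    weights: conv kernel of shape (out_channels, in_channels, kH, kW).
    Returns one nonnegative score per output channel.
    """
    return np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)

def prune_mask(weights, input_sparsity, base_ratio=0.5):
    """Boolean keep-mask over output channels.

    Assumption: a layer whose input is sparser tolerates more aggressive
    pruning, so the base ratio is scaled up by the input sparsity
    (capped at 95% so the layer is never pruned away entirely).
    """
    ratio = min(base_ratio * (1.0 + input_sparsity), 0.95)
    imp = channel_importance(weights)
    k = int(len(imp) * ratio)  # number of lowest-importance channels to drop
    if k >= len(imp):
        return np.zeros(len(imp), dtype=bool)
    threshold = np.sort(imp)[k]
    return imp >= threshold    # True = keep this channel

# Toy example: 8 output channels, 3 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3, 3))
mask = prune_mask(w, input_sparsity=0.4, base_ratio=0.5)
print(int(mask.sum()), "of", len(mask), "channels kept")
```

With `base_ratio=0.5` and `input_sparsity=0.4`, the effective ratio is 0.7, so 5 of the 8 channels are dropped and 3 are kept. In a real pipeline the mask would then be used to slice the layer's weights (and the next layer's input channels) before fine-tuning.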
