


Non-Structured DNN Weight Pruning-Is It Beneficial in Any Platform?

Authors

Ma Xiaolong, Lin Sheng, Ye Shaokai, He Zhezhi, Zhang Linfeng, Yuan Geng, Tan Sia Huat, Li Zhengang, Fan Deliang, Qian Xuehai, Lin Xue, Ma Kaisheng, Wang Yanzhi

Publication

IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4930-4944. doi: 10.1109/TNNLS.2021.3063265. Epub 2022 Aug 31.

DOI: 10.1109/TNNLS.2021.3063265
PMID: 33735086
Abstract

Large deep neural network (DNN) models pose a key challenge to energy efficiency because off-chip DRAM accesses consume significantly more energy than arithmetic or SRAM operations. This motivates intensive research on model compression, which takes two main approaches. Weight pruning exploits redundancy in the number of weights and can be performed in a non-structured manner, which offers higher flexibility and pruning rates but incurs index accesses due to irregular weight locations, or in a structured manner, which preserves the full matrix structure at a lower pruning rate. Weight quantization exploits redundancy in the number of bits per weight. Compared with pruning, quantization is much more hardware-friendly and has become a "must-do" step for FPGA and ASIC implementations. Thus, any evaluation of the effectiveness of pruning should be performed on top of quantization. The key open question is: with quantization applied, which kind of pruning (non-structured versus structured) is most beneficial? This question is fundamental because the answer determines which design aspects deserve focus to avoid diminishing returns from certain optimizations. This article provides a definitive answer to the question for the first time. First, we build ADMM-NN-S by extending and enhancing ADMM-NN, a recently proposed joint weight-pruning and quantization framework, with algorithmic support for structured pruning, dynamic ADMM regulation, and masked mapping and retraining. Second, we develop a methodology for a fair and fundamental comparison of non-structured and structured pruning in terms of both storage and computation efficiency. Our results show that ADMM-NN-S consistently outperforms the prior art: 1) it achieves 348×, 36×, and 8× overall weight pruning on LeNet-5, AlexNet, and ResNet-50, respectively, with (almost) zero accuracy loss, and 2) we demonstrate the first fully binarized (all layers) DNNs that are lossless in accuracy in many cases. These results provide a strong baseline and lend credibility to our study. Under the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of either storage or computation efficiency. Thus, we conclude that structured pruning has greater potential than non-structured pruning. We encourage the community to focus on DNN inference acceleration with structured sparsity.
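To make the trade-off in the abstract concrete, here is a minimal NumPy sketch (not the paper's ADMM-NN-S framework; the matrix, rate, and variable names are illustrative) that prunes the same toy weight matrix at the same 75% rate in both styles: non-structured pruning zeroes individual small-magnitude weights and must store an index alongside each surviving weight, while structured pruning drops whole rows (e.g., filters) and keeps a dense sub-matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))  # toy weight matrix for one layer
rate = 0.75                  # prune 75% of the weights either way

# Non-structured pruning: zero the individual weights with the
# smallest magnitudes. The sparsity is irregular, so compressed
# storage needs an explicit index per remaining weight.
k = int(rate * W.size)                       # weights to remove
thresh = np.sort(np.abs(W).ravel())[k - 1]   # magnitude cutoff
W_ns = np.where(np.abs(W) <= thresh, 0.0, W)

# Structured pruning: remove whole rows (e.g., output filters)
# with the smallest L2 norms. Survivors form a dense sub-matrix,
# so the only metadata is the short list of kept row ids.
n_keep = W.shape[0] - int(rate * W.shape[0])               # keep 2 of 8 rows
keep = np.sort(np.argsort(np.linalg.norm(W, axis=1))[-n_keep:])
W_s = W[keep]

# Rough storage comparison, counting stored numbers:
ns_cost = 2 * np.count_nonzero(W_ns)  # one value + one flat index each
s_cost = W_s.size + n_keep            # dense values + kept-row ids
print(f"non-structured: {ns_cost} numbers, structured: {s_cost} numbers")
```

The sketch shows the intuition behind the abstract's conclusion: at equal pruning rates, the non-structured variant pays per-weight index overhead, and once weights are aggressively quantized to a few bits, that index can cost more storage than the weight value itself.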


Similar Articles

1. StructADMM: Achieving Ultrahigh Efficiency in Structured Pruning for DNNs.
IEEE Trans Neural Netw Learn Syst. 2022 May;33(5):2259-2273. doi: 10.1109/TNNLS.2020.3045153. Epub 2022 May 2.
2. Reweighted Alternating Direction Method of Multipliers for DNN weight pruning.
Neural Netw. 2024 Nov;179:106534. doi: 10.1016/j.neunet.2024.106534. Epub 2024 Jul 14.
3. A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation.
Sensors (Basel). 2023 Jan 11;23(2):824. doi: 10.3390/s23020824.
4. GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices Based on Fine-Grained Structured Weight Sparsity.
IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6224-6239. doi: 10.1109/TPAMI.2021.3089687. Epub 2022 Sep 14.
5. Jump-GRS: A Multi-Phase Approach to Structured Pruning of Neural Networks for Neural Decoding.
J Neural Eng. 2023 Jul 31;20(4). doi: 10.1088/1741-2552/ace5dc.
6. Resource-Constrained FPGA/DNN Co-Design.
Neural Comput Appl. 2021;33(21):14741-14751. doi: 10.1007/s00521-021-06113-4. Epub 2021 May 15.
7. Toward Compact ConvNets via Structure-Sparsity Regularized Filter Pruning.
IEEE Trans Neural Netw Learn Syst. 2020 Feb;31(2):574-588. doi: 10.1109/TNNLS.2019.2906563. Epub 2019 Apr 12.
8. A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation.
Sensors (Basel). 2022 Sep 1;22(17):6618. doi: 10.3390/s22176618.
9. Dynamic Probabilistic Pruning: A General Framework for Hardware-Constrained Pruning at Different Granularities.
IEEE Trans Neural Netw Learn Syst. 2022 Jun 8;PP. doi: 10.1109/TNNLS.2022.3176809.

Cited By

1. Comparative analysis of model compression techniques for achieving carbon efficient AI.
Sci Rep. 2025 Jul 2;15(1):23461. doi: 10.1038/s41598-025-07821-w.
2. Reconstructed SqueezeNext with C-CBAM for offline handwritten Chinese character recognition.
PeerJ Comput Sci. 2023 Aug 14;9:e1529. doi: 10.7717/peerj-cs.1529. eCollection 2023.
3. Rethinking Weight Decay for Efficient Neural Network Pruning.
J Imaging. 2022 Mar 4;8(3):64. doi: 10.3390/jimaging8030064.