

Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference

Authors

Hawks Benjamin, Duarte Javier, Fraser Nicholas J, Pappalardo Alessandro, Tran Nhan, Umuroglu Yaman

Affiliations

Fermi National Accelerator Laboratory, Batavia, IL, United States.

University of California San Diego, La Jolla, CA, United States.

Publication

Front Artif Intell. 2021 Jul 9;4:676564. doi: 10.3389/frai.2021.676564. eCollection 2021.

DOI: 10.3389/frai.2021.676564
PMID: 34308339
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8299073/
Abstract

Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular techniques for reducing computation in neural networks are pruning, removing insignificant synapses, and quantization, reducing the precision of the calculations. In this work, we explore the interplay between pruning and quantization during the training of neural networks for ultra low latency applications targeting high energy physics use cases. Techniques developed for this study have potential applications across many other domains. We study various configurations of pruning during quantization-aware training, which we term quantization-aware pruning, and the effect of techniques like regularization, batch normalization, and different pruning schemes on performance, computational complexity, and information content metrics. We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task. Further, quantization-aware pruning typically performs similarly to, or better than, other neural architecture search techniques like Bayesian optimization in terms of computational efficiency. Surprisingly, while networks with different training configurations can have similar performance for the benchmark application, the information content in the network can vary significantly, affecting its generalizability.
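The two operations the abstract combines, magnitude pruning (zeroing the smallest-magnitude weights) and fake quantization (rounding weights to a reduced-precision grid during training), can be illustrated in a few lines. The following is a minimal NumPy sketch under assumed settings (6-bit symmetric quantization, a 0.5/0.75/0.9 sparsity schedule), not the paper's implementation; function names and the toy weight matrix are illustrative only, and no actual training step is shown.

```python
import numpy as np

def fake_quantize(w, bits=6):
    # Uniform symmetric "fake" quantization: round each weight to one of
    # 2^(bits-1) - 1 positive/negative levels, then dequantize back to float.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return w
    return np.round(w / scale) * scale

def magnitude_prune_mask(w, sparsity):
    # Keep only weights whose magnitude exceeds the sparsity-quantile
    # threshold; everything at or below it is pruned (set to zero).
    k = int(sparsity * w.size)
    if k == 0:
        return np.ones(w.shape, dtype=bool)
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.abs(w) > threshold

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))  # toy weight matrix standing in for one layer

# Quantization-aware pruning loop (sketch): at each stage, prune to the
# target sparsity, then represent the surviving weights at low precision,
# mimicking the forward pass of quantization-aware training.
mask = np.ones(w.shape, dtype=bool)
for sparsity in (0.5, 0.75, 0.9):
    mask = magnitude_prune_mask(w, sparsity)
    w = fake_quantize(w * mask, bits=6)

print(f"final sparsity: {1 - np.count_nonzero(w) / w.size:.2f}")
```

Because pruning decisions are taken on the already-quantized weights, each stage sees the magnitudes the low-precision hardware would actually use, which is the interplay the paper studies.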

Figures (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/6f99bfbe5913/frai-04-676564-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/1649cacf60e3/frai-04-676564-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/5ce853bb002d/frai-04-676564-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/710681b1a6b2/frai-04-676564-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/1c8cd608e4ca/frai-04-676564-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/5ed8829b8506/frai-04-676564-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f2f/8299073/b49a9b189b1c/frai-04-676564-g007.jpg

Similar Articles

1. Ps and Qs: Quantization-Aware Pruning for Efficient Low Latency Neural Network Inference.
Front Artif Intell. 2021 Jul 9;4:676564. doi: 10.3389/frai.2021.676564. eCollection 2021.
2. Multi-objective evolutionary optimization for hardware-aware neural network pruning.
Fundam Res. 2022 Aug 9;4(4):941-950. doi: 10.1016/j.fmre.2022.07.013. eCollection 2024 Jul.
3. Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms.
Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
4. Pruning and quantization algorithm with applications in memristor-based convolutional neural network.
Cogn Neurodyn. 2024 Feb;18(1):233-245. doi: 10.1007/s11571-022-09927-7. Epub 2023 Jan 19.
5. Single-Path Bit Sharing for Automatic Loss-Aware Model Compression.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):12459-12473. doi: 10.1109/TPAMI.2023.3275159. Epub 2023 Sep 5.
6. Efficient Resource-Aware Convolutional Neural Architecture Search for Edge Computing with Pareto-Bayesian Optimization.
Sensors (Basel). 2021 Jan 10;21(2):444. doi: 10.3390/s21020444.
7. Non-Structured DNN Weight Pruning-Is It Beneficial in Any Platform?
IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4930-4944. doi: 10.1109/TNNLS.2021.3063265. Epub 2022 Aug 31.
8. Deep Neural Network Compression by In-Parallel Pruning-Quantization.
IEEE Trans Pattern Anal Mach Intell. 2020 Mar;42(3):568-579. doi: 10.1109/TPAMI.2018.2886192. Epub 2018 Dec 12.
9. Training high-performance and large-scale deep neural networks with full 8-bit integers.
Neural Netw. 2020 May;125:70-82. doi: 10.1016/j.neunet.2019.12.027. Epub 2020 Jan 15.
10. A Soft-Pruning Method Applied During Training of Spiking Neural Networks for In-memory Computing Applications.
Front Neurosci. 2019 Apr 26;13:405. doi: 10.3389/fnins.2019.00405. eCollection 2019.

Cited By

1. Real-Time Inference With 2D Convolutional Neural Networks on Field Programmable Gate Arrays for High-Rate Particle Imaging Detectors.
Front Artif Intell. 2022 May 18;5:855184. doi: 10.3389/frai.2022.855184. eCollection 2022.
2. Experimental implementation of a neural network optical channel equalizer in restricted hardware using pruning and quantization.
Sci Rep. 2022 May 24;12(1):8713. doi: 10.1038/s41598-022-12563-0.
3. Applications and Techniques for Fast Machine Learning in Science.
Front Big Data. 2022 Apr 12;5:787421. doi: 10.3389/fdata.2022.787421. eCollection 2022.
4. Graph Neural Networks for Charged Particle Tracking on FPGAs.
Front Big Data. 2022 Mar 23;5:828666. doi: 10.3389/fdata.2022.828666. eCollection 2022.