
A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms.

Affiliation

Department of Information Engineering, University of Pisa, Pisa 56122, Italy.

Publication Information

Comput Intell Neurosci. 2022 May 11;2022:9485933. doi: 10.1155/2022/9485933. eCollection 2022.

DOI: 10.1155/2022/9485933
PMID: 35602644
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9117057/
Abstract

Recurrent Neural Networks (RNNs) have become important tools for tasks such as speech recognition, text generation, and natural language processing. However, their inference may involve billions of operations, and their large number of parameters leads to high storage size and runtime memory usage. These factors impede the adoption of such models in real-time, on-the-edge applications. Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have emerged as promising solutions for the hardware acceleration of these algorithms: their customizable compute data paths and memory subsystems let them take maximum advantage of compression techniques in terms of area, timing, and power consumption. In contrast to the extensive literature on compression and quantization of plain feed-forward neural networks, little attention has been paid to reducing the computational resource requirements of RNNs. This work proposes a new, effective methodology for the post-training quantization of RNNs, focusing on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. The proposed quantization strategy is meant as a detailed guideline for the design of custom hardware accelerators for LSTM/GRU-based algorithms, implemented on FPGA or ASIC devices using fixed-point arithmetic only. We applied our methods to LSTM/GRU models pretrained on the IMDb sentiment classification dataset and the Penn TreeBank language modelling dataset, comparing each quantized model to its floating-point counterpart. The results show that up to 90% memory footprint reduction can be achieved in both cases, with less than 1% loss in accuracy on IMDb and even a slight improvement in the per-word perplexity metric on Penn TreeBank. The results are presented as trade-offs between memory footprint reduction and accuracy changes, demonstrating the benefits of the proposed methodology also in comparison with other works from the literature.
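The abstract centers on converting pretrained floating-point LSTM/GRU parameters to fixed-point words after training, with no retraining. As a minimal sketch of that general idea (the names, the 8-bit word width, and the range rule are illustrative assumptions, not the paper's exact procedure), the Python snippet below quantizes one weight tensor to a signed Qm.f fixed-point format, picking the fraction width from the tensor's dynamic range:

```python
import numpy as np

def quantize_fixed_point(x, word_bits=8):
    """Quantize a float tensor to a signed fixed-point Qm.f word.

    The integer width m is chosen from the tensor's dynamic range and
    the remaining bits (after the sign) become the fraction width f.
    This is a generic symmetric scheme for illustration, not
    necessarily the exact rule derived in the paper.
    """
    max_abs = float(np.max(np.abs(x)))
    # Integer bits needed to cover the largest magnitude (sign bit excluded).
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = word_bits - 1 - int_bits
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (word_bits - 1)), 2 ** (word_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)
    return q, frac_bits  # integer words plus a shared binary exponent

def dequantize(q, frac_bits):
    """Map fixed-point words back to floats to measure quantization error."""
    return q.astype(np.float64) / (2.0 ** frac_bits)

# Toy example: 8-bit post-training quantization of a stand-in LSTM
# weight matrix (shape and distribution are made up for the demo).
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.5, size=(128, 512))
q, f = quantize_fixed_point(W, word_bits=8)
print(f"Q{8 - 1 - f}.{f} format, max abs error = "
      f"{np.max(np.abs(W - dequantize(q, f))):.5f}")
```

In a complete flow along the lines the abstract sketches, each weight matrix and each internal signal of the LSTM/GRU cell would receive its own word and fraction widths, and accuracy or perplexity would be re-evaluated at each setting to trace the memory-versus-accuracy trade-off.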

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/1e5b310a3bc1/CIN2022-9485933.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/3b0b7dfe74e9/CIN2022-9485933.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/512898f0a717/CIN2022-9485933.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/20419c8c148b/CIN2022-9485933.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/be6e441a5d58/CIN2022-9485933.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/0411603117b4/CIN2022-9485933.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/ffba715b3fca/CIN2022-9485933.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/1c52d5943a93/CIN2022-9485933.008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/4c0543f07d73/CIN2022-9485933.009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/feac/9117057/6e58f7cefce9/CIN2022-9485933.010.jpg

Similar Articles

1. A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms.
Comput Intell Neurosci. 2022 May 11;2022:9485933. doi: 10.1155/2022/9485933. eCollection 2022.
2. Quantization-Aware NN Layers with High-throughput FPGA Implementation for Edge AI.
Sensors (Basel). 2023 May 11;23(10):4667. doi: 10.3390/s23104667.
3. Character gated recurrent neural networks for Arabic sentiment analysis.
Sci Rep. 2022 Jun 13;12(1):9779. doi: 10.1038/s41598-022-13153-w.
4. FPGA-Based Hybrid-Type Implementation of Quantized Neural Networks for Remote Sensing Applications.
Sensors (Basel). 2019 Feb 22;19(4):924. doi: 10.3390/s19040924.
5. Acceleration of Deep Neural Network Training Using Field Programmable Gate Arrays.
Comput Intell Neurosci. 2022 Oct 17;2022:8387364. doi: 10.1155/2022/8387364. eCollection 2022.
6. Retracted: A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms.
Comput Intell Neurosci. 2023 Dec 13;2023:9865750. doi: 10.1155/2023/9865750. eCollection 2023.
7. Performance of recurrent neural networks with Monte Carlo dropout for predicting pharmacokinetic parameters from dynamic contrast-enhanced magnetic resonance imaging data.
J Appl Clin Med Phys. 2025 Feb;26(2):e14586. doi: 10.1002/acm2.14586. Epub 2024 Dec 23.
8. Hybrid Deep Recurrent Neural Networks for Noise Reduction of MEMS-IMU with Static and Dynamic Conditions.
Micromachines (Basel). 2021 Feb 20;12(2):214. doi: 10.3390/mi12020214.
9. Quantization Friendly MobileNet (QF-MobileNet) Architecture for Vision Based Applications on Embedded Platforms.
Neural Netw. 2021 Apr;136:28-39. doi: 10.1016/j.neunet.2020.12.022. Epub 2020 Dec 29.
10. A Hardware-Friendly Low-Bit Power-of-Two Quantization Method for CNNs and Its FPGA Implementation.
Sensors (Basel). 2022 Sep 1;22(17):6618. doi: 10.3390/s22176618.

Cited By

1. Retracted: A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms.
Comput Intell Neurosci. 2023 Dec 13;2023:9865750. doi: 10.1155/2023/9865750. eCollection 2023.

References

1. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description.
IEEE Trans Pattern Anal Mach Intell. 2017 Apr;39(4):677-691. doi: 10.1109/TPAMI.2016.2599174. Epub 2016 Sep 1.
2. LSTM: A Search Space Odyssey.
IEEE Trans Neural Netw Learn Syst. 2017 Oct;28(10):2222-2232. doi: 10.1109/TNNLS.2016.2582924. Epub 2016 Jul 8.