Department of Information Engineering, University of Pisa, Pisa 56122, Italy.
Comput Intell Neurosci. 2022 May 11;2022:9485933. doi: 10.1155/2022/9485933. eCollection 2022.
Recurrent Neural Networks (RNNs) have become important tools for tasks such as speech recognition, text generation, and natural language processing. However, their inference may involve up to billions of operations, and their large number of parameters leads to large storage size and runtime memory usage. These factors impede the adoption of such models in real-time, on-the-edge applications. Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have emerged as promising solutions for the hardware acceleration of these algorithms, thanks to their high degree of customization of compute data paths and memory subsystems, which allows them to take maximum advantage of compression techniques in terms of area, timing, and power consumption. In contrast to the extensive literature on compression and quantization of plain feedforward neural networks, little attention has been paid to reducing the computational resource requirements of RNNs. This work proposes a new, effective methodology for the post-training quantization of RNNs, focusing in particular on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) RNNs. The proposed quantization strategy is meant as a detailed guideline for the design of custom hardware accelerators for LSTM/GRU-based algorithms to be implemented on FPGA or ASIC devices using fixed-point arithmetic only. We applied our methods to LSTM/GRU models pretrained on the IMDb sentiment classification dataset and the Penn TreeBank language modelling dataset, comparing each quantized model to its floating-point counterpart. The results show that up to 90% memory footprint reduction can be achieved in both cases, with less than 1% loss in accuracy and even a slight improvement in the perplexity-per-word metric, respectively. The results are presented in terms of the trade-offs between memory footprint reduction and accuracy changes, demonstrating the benefits of the proposed methodology also in comparison with other works from the literature.
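To make the fixed-point idea concrete, the sketch below illustrates a generic post-training, per-tensor quantization of a pretrained weight matrix to a signed fixed-point (Qm.n) format in NumPy. The 8-bit width, the power-of-two scaling, and the function names (quantize_fixed_point, dequantize) are illustrative assumptions for this sketch, not the paper's exact bit-width allocation for LSTM/GRU gates and activations.

```python
import numpy as np


def quantize_fixed_point(w, total_bits=8):
    """Quantize a weight tensor to signed fixed-point (two's complement).

    Integer bits are chosen from the tensor's dynamic range; the remaining
    bits (minus the sign bit) are fractional. Values are rounded to the
    nearest representable value and saturated.
    """
    max_abs = np.max(np.abs(w))
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))))
    frac_bits = total_bits - 1 - int_bits  # one bit reserved for the sign

    scale = 2.0 ** frac_bits
    q_min, q_max = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1

    q = np.clip(np.round(w * scale), q_min, q_max).astype(np.int32)
    return q, frac_bits


def dequantize(q, frac_bits):
    """Recover the floating-point approximation for accuracy comparison."""
    return q.astype(np.float32) / (2.0 ** frac_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a pretrained LSTM kernel (input dim 4, 4 * hidden units).
    w_lstm = rng.normal(scale=0.5, size=(4, 4 * 8)).astype(np.float32)

    q, frac_bits = quantize_fixed_point(w_lstm, total_bits=8)
    w_hat = dequantize(q, frac_bits)
    print("fractional bits:", frac_bits)
    print("max abs quantization error:", np.max(np.abs(w_lstm - w_hat)))
```

On hardware, only the integer values and the per-tensor fractional bit count would be stored, so the dequantization step here serves purely to estimate the quantization error against the floating-point baseline.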