IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1441-1453. doi: 10.1109/TNNLS.2017.2665555. Epub 2017 Mar 8.
Despite decades of research on, and commercial availability of, high-performance general-purpose processors, many applications still require fully customized hardware architectures for further computational acceleration. Deep learning has recently been applied successfully across a wide variety of applications, but its heavy computational demand has considerably limited its practical use. This paper proposes a fully pipelined acceleration architecture to alleviate the high computational demand of an artificial neural network (ANN), specifically a restricted Boltzmann machine (RBM) ANN. The implemented RBM ANN accelerator (at the integrated network size, using 128 input cases per batch, and running at a 303-MHz clock frequency), integrated in a state-of-the-art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T), provides a computational performance of 301 billion connection-updates-per-second, about 193 times higher than a software solution running on general-purpose processors. Most importantly, the architecture achieves over 4 times (12 times in batch learning) higher performance than a previous work when both are implemented in the same FPGA device (XC2VP70).
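To make the connection-updates-per-second (CUPS) figure concrete, the sketch below runs one step of contrastive divergence (CD-1), the standard RBM batch learning rule, and counts the connection updates it performs. The layer sizes (256 visible, 256 hidden) are hypothetical placeholders, since the abstract does not state the network dimensions; only the batch size of 128 comes from the text. This is a minimal NumPy illustration of the computation being accelerated, not the paper's hardware algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes (not stated in the abstract); 128 cases per batch is from the text.
n_visible, n_hidden, batch = 256, 256, 128

W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))   # connection weights
v0 = (rng.random((batch, n_visible)) < 0.5).astype(float)  # a batch of binary inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One CD-1 step: positive phase, sampled hidden states, reconstruction, negative phase.
h0 = sigmoid(v0 @ W)                                   # positive-phase hidden probabilities
h0_sample = (rng.random(h0.shape) < h0).astype(float)  # stochastic hidden states
v1 = sigmoid(h0_sample @ W.T)                          # visible reconstruction
h1 = sigmoid(v1 @ W)                                   # negative-phase hidden probabilities

lr = 0.01
W += lr * (v0.T @ h0 - v1.T @ h1) / batch              # batch weight (connection) update

# Every connection is updated once per input case per CD-1 step:
connection_updates = n_visible * n_hidden * batch
print(connection_updates)  # 8388608 connection updates for this one batch step
```

At the reported 301 billion CUPS, an accelerator would sustain on the order of 301e9 / 8.4e6 ≈ 36,000 such batch steps per second for this (assumed) network size; the dominant cost is the three dense matrix products, which is what a fully pipelined FPGA datapath parallelizes.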