

DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.

Publication Info

IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1441-1453. doi: 10.1109/TNNLS.2017.2665555. Epub 2017 Mar 8.

DOI: 10.1109/TNNLS.2017.2665555
PMID: 28287986
Abstract

Although there have been many decades of research and commercial presence on high-performance general-purpose processors, many applications still require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been used successfully in a wide variety of applications, but its heavy computational demand has considerably limited its practical use. This paper proposes a fully pipelined acceleration architecture to alleviate the high computational demand of artificial neural networks (ANNs), specifically restricted Boltzmann machine (RBM) ANNs. The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency), integrated in a state-of-the-art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T), provides a computational performance of 301 billion connection-updates-per-second, about 193 times higher than a software solution running on general-purpose processors. Most importantly, the architecture delivers over 4 times (12 times in batch learning) higher performance than a previous work when both are implemented in an FPGA device (XC2VP70).
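The core computation the accelerator pipelines is the contrastive-divergence (CD-1) weight update of an RBM, and its performance is reported in connection-updates-per-second (CUPS). Below is a minimal numpy sketch of one CD-1 step, not the paper's implementation: the batch size of 128 matches the paper's setup, but the layer sizes and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, batch = 256, 256, 128  # layer sizes are assumptions; batch of 128 as in the paper
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
v0 = (rng.random((batch, n_visible)) < 0.5).astype(float)  # toy binary input batch

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Positive phase: hidden-unit probabilities given the data.
h0 = sigmoid(v0 @ W)
# One Gibbs step: reconstruct the visible units, then the hidden units again.
v1 = sigmoid(h0 @ W.T)
h1 = sigmoid(v1 @ W)
# Weight update from the difference of data and reconstruction correlations.
lr = 0.01
W += lr * (v0.T @ h0 - v1.T @ h1) / batch

# Counting one "connection update" per weight per example gives the
# CUPS figure of merit used in the abstract:
cups_per_step = n_visible * n_hidden * batch
print(cups_per_step)  # 8388608 connection updates in this one batch
```

At 301 billion CUPS, the reported accelerator would complete a batch of this size in tens of microseconds; the matrix products above are exactly the work the fully pipelined datapath parallelizes.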


Similar Articles

1. DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.
   IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1441-1453. doi: 10.1109/TNNLS.2017.2665555. Epub 2017 Mar 8.
2. High-performance reconfigurable hardware architecture for restricted Boltzmann machines.
   IEEE Trans Neural Netw. 2010 Nov;21(11):1780-92. doi: 10.1109/TNN.2010.2073481. Epub 2010 Sep 20.
3. Performance analysis of multiple input single layer neural network hardware chip.
   Multimed Tools Appl. 2023 Feb 20:1-22. doi: 10.1007/s11042-023-14627-3.
4. Runtime Programmable and Memory Bandwidth Optimized FPGA-Based Coprocessor for Deep Convolutional Neural Network.
   IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5922-5934. doi: 10.1109/TNNLS.2018.2815085. Epub 2018 Apr 9.
5. Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip.
   J Imaging. 2020 Aug 25;6(9):85. doi: 10.3390/jimaging6090085.
6. GeCo: Classification Restricted Boltzmann Machine Hardware for On-Chip Semisupervised Learning and Bayesian Inference.
   IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):53-65. doi: 10.1109/TNNLS.2019.2899386. Epub 2019 Mar 15.
7. A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.
   Sensors (Basel). 2017 Aug 23;17(9):1941. doi: 10.3390/s17091941.
8. Embedded Streaming Deep Neural Networks Accelerator With Applications.
   IEEE Trans Neural Netw Learn Syst. 2017 Jul;28(7):1572-1583. doi: 10.1109/TNNLS.2016.2545298. Epub 2016 Apr 8.
9. Designing Deep Learning Hardware Accelerator and Efficiency Evaluation.
   Comput Intell Neurosci. 2022 Jul 13;2022:1291103. doi: 10.1155/2022/1291103. eCollection 2022.
10. Resources and Power Efficient FPGA Accelerators for Real-Time Image Classification.
    J Imaging. 2022 Apr 15;8(4):114. doi: 10.3390/jimaging8040114.

Cited By

1. Autoencoder and restricted Boltzmann machine for transfer learning in functional magnetic resonance imaging task classification.
   Heliyon. 2023 Jul 16;9(7):e18086. doi: 10.1016/j.heliyon.2023.e18086. eCollection 2023 Jul.
2. Correlation Analysis Between Japanese Literature and Psychotherapy Based on Diagnostic Equation Algorithm.
   Front Psychol. 2022 May 30;13:906952. doi: 10.3389/fpsyg.2022.906952. eCollection 2022.
3. Improved Artificial Neural Network with State Order Dataset Estimation for Brain Cancer Cell Diagnosis.
   Biomed Res Int. 2022 Apr 16;2022:7799812. doi: 10.1155/2022/7799812. eCollection 2022.
4. Steganography-based voice hiding in medical images of COVID-19 patients.
   Nonlinear Dyn. 2021;105(3):2677-2692. doi: 10.1007/s11071-021-06700-z. Epub 2021 Jul 22.