
DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks.

Publication information

IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1441-1453. doi: 10.1109/TNNLS.2017.2665555. Epub 2017 Mar 8.

Abstract

Although there have been decades of research on, and commercial availability of, high-performance general-purpose processors, many applications still require fully customized hardware architectures for further computational acceleration. Recently, deep learning has been applied successfully across a wide variety of applications, but its heavy computational demand has considerably limited its practical use. This paper proposes a fully pipelined acceleration architecture to alleviate the high computational demand of one class of artificial neural network (ANN), the restricted Boltzmann machine (RBM). The implemented RBM ANN accelerator (integrating network size, using 128 input cases per batch, and running at a 303-MHz clock frequency), realized on a state-of-the-art field-programmable gate array (FPGA) (Xilinx Virtex 7 XC7V-2000T), provides a computational performance of 301 billion connection-updates-per-second, about 193 times higher than a software solution running on general-purpose processors. Most importantly, the architecture delivers over 4 times (12 times in batch learning) higher performance than a previous work when both are implemented in the same FPGA device (XC2VP70).
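The "connection-updates-per-second" figure counts how many weight entries the accelerator touches per batch of contrastive-divergence training. As a rough illustration of the computation being accelerated, here is a minimal NumPy sketch of one CD-1 RBM update; the layer sizes (256 visible, 128 hidden) are hypothetical, since the abstract does not state the network dimensions, while the batch size of 128 input cases is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes for illustration; the batch size of 128
# input cases comes from the abstract.
n_visible, n_hidden, batch = 256, 128, 128

W = rng.normal(0, 0.01, size=(n_visible, n_hidden))  # connection weights
b_v = np.zeros(n_visible)                            # visible biases
b_h = np.zeros(n_hidden)                             # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One contrastive-divergence (CD-1) step over a toy input batch.
v0 = (rng.random((batch, n_visible)) < 0.5).astype(float)
h0_prob = sigmoid(v0 @ W + b_h)                      # positive phase
h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
v1_prob = sigmoid(h0 @ W.T + b_v)                    # reconstruction
h1_prob = sigmoid(v1_prob @ W + b_h)                 # negative phase

lr = 0.1
W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch  # weight update

# Each of the n_visible * n_hidden connections is updated once per input
# case in the batch, so one batch accounts for
# n_visible * n_hidden * batch connection updates -- the quantity behind
# the connection-updates-per-second (CUPS) metric.
updates_per_batch = n_visible * n_hidden * batch
```

The matrix products in the positive and negative phases dominate the cost, which is why a fully pipelined hardware datapath for these products yields the large speedups reported above.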

