Cost-effective stochastic MAC circuits for deep neural networks.

Affiliations

School of Electrical and Computer Engineering, UNIST, 50, UNIST-gil, Ulsan 44919, Republic of Korea.

Publication information

Neural Netw. 2019 Sep;117:152-162. doi: 10.1016/j.neunet.2019.04.017. Epub 2019 May 20.

Abstract

Stochastic computing (SC) is a promising computing paradigm that can help address both the uncertainties of future process technology and the challenges of efficient hardware realization for deep neural networks (DNNs). However, the imprecision and long latency of SC have rendered previous SC-based DNN architectures less competitive against optimized fixed-point digital implementations unless inference accuracy is significantly sacrificed. In this paper, we propose a new algorithm for SC-MAC (multiply-and-accumulate), a key building block for SC-based DNNs, that is orders of magnitude more efficient and accurate than previous SC-MACs. We also show how our new SC-MAC can be extended to a vector version and used to accelerate both convolution and fully-connected layers of convolutional neural networks (CNNs) using the same hardware. Our experimental results using CNNs designed for the MNIST and CIFAR-10 datasets demonstrate that not only are our SC-based CNNs more accurate and 40× to 490× more energy-efficient for convolution layers than conventional SC-based ones, but they can also achieve a lower area-delay product and lower energy than precision-optimized fixed-point implementations without sacrificing accuracy. We also demonstrate the feasibility of our SC-based CNNs through FPGA prototypes.
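
For readers unfamiliar with SC, the following is a minimal sketch of the conventional (textbook) unipolar SC multiply, in which a value in [0, 1] is encoded as the fraction of ones in a random bitstream and a product is obtained by ANDing two independent streams. It is meant only to illustrate why conventional SC is imprecise and slow; it is not the SC-MAC algorithm proposed in the paper, and the function names (`to_stream`, `sc_multiply`, `sc_mac`) are hypothetical.

```python
import random


def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as an n-bit unipolar bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def sc_multiply(x, w, n, rng):
    """Estimate x * w by ANDing two independent bitstreams and counting ones."""
    xs = to_stream(x, n, rng)
    ws = to_stream(w, n, rng)
    return sum(a & b for a, b in zip(xs, ws)) / n


def sc_mac(inputs, weights, n, rng):
    """Baseline MAC: sum the per-pair SC product estimates (assumed binary accumulation)."""
    return sum(sc_multiply(x, w, n, rng) for x, w in zip(inputs, weights))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    exact = 0.5 * 0.75
    for n in (16, 256, 4096):
        est = sc_multiply(0.5, 0.75, n, rng)
        print(f"n={n:5d}  estimate={est:.4f}  abs_error={abs(est - exact):.4f}")
```

In this baseline, the standard deviation of the estimate shrinks only as 1/sqrt(n), so each additional bit of precision costs roughly four times as many clock cycles. That trade-off is the imprecision and latency problem the abstract refers to.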
