Fully Parallel Stochastic Computing Hardware Implementation of Convolutional Neural Networks for Edge Computing Applications.

Author information

Frasser Christiam F, Linares-Serrano Pablo, Diez de los Rios Ivan, Moran Alejandro, Skibinsky-Gitlin Erik S, Font-Rossello Joan, Canals Vincent, Roca Miquel, Serrano-Gotarredona Teresa, Rossello Josep L

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10408-10418. doi: 10.1109/TNNLS.2022.3166799. Epub 2023 Nov 30.

Abstract

Edge artificial intelligence (AI) is receiving a tremendous amount of interest from the machine learning community due to the ever-increasing popularization of the Internet of Things (IoT). Unfortunately, typical deep learning techniques such as convolutional neural networks (CNNs) are power and area hungry, which makes them difficult to incorporate into edge computing devices. In this work, we propose a power- and area-efficient architecture that exploits the correlation phenomenon in stochastic computing (SC) systems. The proposed architecture addresses the challenges that an SC implementation of a CNN (SC-CNN) may present, such as the high resource usage of binary-to-stochastic conversion, the inaccuracy produced by undesired correlation between signals, and the complexity of implementing the stochastic maximum function. To show that our architecture meets the requirements of edge intelligence, we embed a fully parallel CNN in a single field-programmable gate array (FPGA) chip. The results show better performance than traditional binary logic and other SC implementations. In addition, we performed a full very-large-scale integration (VLSI) synthesis of the proposed design, showing that it presents better overall characteristics than other recently published VLSI architectures.
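
To make the role of correlation in SC concrete, the following is a minimal software sketch, not the hardware design described in the abstract. It assumes unipolar encoding, in which a value p in [0, 1] is represented by a bit stream whose fraction of 1s is p, and it models binary-to-stochastic conversion as a comparison against a random number. Under those assumptions, an AND gate multiplies two streams generated from independent random sources, whereas two streams generated from the same random source are maximally correlated, so AND yields the minimum and OR yields the maximum. The names `to_stream`, `value`, and `demo` are hypothetical and chosen only for illustration.

```python
import random


def to_stream(p, rands):
    """Binary-to-stochastic conversion (assumed comparator model):
    emit 1 whenever the random sample falls below the value p."""
    return [1 if r < p else 0 for r in rands]


def value(stream):
    """Decode a unipolar stochastic stream: the fraction of 1s."""
    return sum(stream) / len(stream)


def demo(a=0.3, b=0.8, n=20000, seed=0):
    rng = random.Random(seed)
    shared = [rng.random() for _ in range(n)]  # one random source
    other = [rng.random() for _ in range(n)]   # a second, independent source

    # Uncorrelated streams (independent sources): AND approximates a * b.
    sa, sb = to_stream(a, shared), to_stream(b, other)
    product = value([x & y for x, y in zip(sa, sb)])

    # Correlated streams (same source): AND gives min(a, b), OR gives max(a, b).
    ca, cb = to_stream(a, shared), to_stream(b, shared)
    minimum = value([x & y for x, y in zip(ca, cb)])
    maximum = value([x | y for x, y in zip(ca, cb)])
    return product, minimum, maximum


if __name__ == "__main__":
    product, minimum, maximum = demo()
    print(f"AND, independent sources: {product:.3f} (a*b = 0.240)")
    print(f"AND, shared source:       {minimum:.3f} (min = 0.300)")
    print(f"OR,  shared source:       {maximum:.3f} (max = 0.800)")
```

The sketch also shows why unintended correlation is a source of error: the same AND gate that computes a product on independent streams silently computes a minimum when its inputs share a random source. Sharing one random source across many converters is one way conversion cost can be reduced in a model like this, at the price of having to manage that correlation deliberately.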
