

SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme.

Affiliations

School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China.

Publication Information

Sensors (Basel). 2022 Nov 6;22(21):8545. doi: 10.3390/s22218545.

DOI: 10.3390/s22218545
PMID: 36366242
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9658340/
Abstract

Deep neural networks have been deployed in various hardware accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuit (ASIC) chips. Normally, a huge amount of computation is required in the inference process, creating significant logic resource overheads. In addition, frequent data accesses between off-chip memory and hardware accelerators create bottlenecks, leading to a decline in hardware efficiency. Many solutions have been proposed to reduce hardware overhead and data movement. For example, specific lookup-table (LUT)-based hardware architectures can be used to mitigate computing operation demands. However, typical LUT-based accelerators are affected by computational precision limitations and poor scalability. In this paper, we propose a search-based computing scheme based on an LUT solution, which improves computation efficiency by replacing traditional multiplication with a search operation. In addition, the proposed scheme supports multiple bit widths to meet the needs of different DNN-based applications. We design a reconfigurable computing strategy, which can efficiently adapt to convolutions of different kernel sizes to improve hardware scalability. We implement a search-based architecture, namely SCA, which adopts an on-chip storage mechanism, thus greatly reducing interactions with off-chip memory and alleviating bandwidth pressure. Based on experimental evaluation, the proposed SCA architecture achieves 92%, 96% and 98% computational utilization for computational precisions of 4 bit, 8 bit and 16 bit, respectively. Compared with the state-of-the-art LUT-based architecture, efficiency can be improved four-fold.
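To illustrate the core idea behind LUT-based computing described in the abstract, the following is a minimal sketch, not the paper's actual SCA hardware design: multiplication by a fixed weight is replaced by a lookup ("search") into a table of precomputed products, indexed by the activation value. The function names and the 4-bit width are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's SCA implementation):
# replace multiplication with a lookup into precomputed products.

def build_lut(weight, bits=4):
    """Precompute weight * x for every possible bits-wide activation."""
    return [weight * x for x in range(2 ** bits)]

def lut_multiply(lut, activation):
    """'Multiply' by indexing the table instead of using a multiplier."""
    return lut[activation]

# Example: a 4-bit activation multiplied by a fixed weight of 3.
lut = build_lut(weight=3, bits=4)
print(lut_multiply(lut, 5))  # same result as 3 * 5, i.e. 15
```

Wider precisions (e.g. 8 or 16 bit) can in principle be composed from narrower lookups on bit slices, which is the spirit of the precision-scalable scheme the abstract describes.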

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/d759b56025e7/sensors-22-08545-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/0be878400e79/sensors-22-08545-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/cfacfa89110e/sensors-22-08545-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/76a827fbd39f/sensors-22-08545-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/176cecc82ff4/sensors-22-08545-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/5c5edbe529c9/sensors-22-08545-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/2bd9819eb236/sensors-22-08545-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/59e3ee51c7dc/sensors-22-08545-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/22d12fc26316/sensors-22-08545-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/339f9059867b/sensors-22-08545-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/469bb97293a7/sensors-22-08545-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/80305e7760bd/sensors-22-08545-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/c23a09b2d2c5/sensors-22-08545-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/26a88bb00b50/sensors-22-08545-g014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/e5ba13139cca/sensors-22-08545-g015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/46f0d23b59e2/sensors-22-08545-g016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/53548c4ea956/sensors-22-08545-g017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/c653da82e881/sensors-22-08545-g018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e92b/9658340/ddb21cdf494f/sensors-22-08545-g019.jpg

Similar Articles

1. SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme.
Sensors (Basel). 2022 Nov 6;22(21):8545. doi: 10.3390/s22218545.
2. Distributed large-scale graph processing on FPGAs.
J Big Data. 2023;10(1):95. doi: 10.1186/s40537-023-00756-x. Epub 2023 Jun 4.
3. High-Performance Method and Architecture for Attention Computation in DNN Inference.
IEEE Trans Biomed Circuits Syst. 2025 Apr;19(2):404-415. doi: 10.1109/TBCAS.2024.3436837. Epub 2025 Apr 2.
4. High-Performance Acceleration of 2-D and 3-D CNNs on FPGAs Using Static Block Floating Point.
IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4473-4487. doi: 10.1109/TNNLS.2021.3116302. Epub 2023 Aug 4.
5. Scalable Digital Neuromorphic Architecture for Large-Scale Biophysically Meaningful Neural Network With Multi-Compartment Neurons.
IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):148-162. doi: 10.1109/TNNLS.2019.2899936. Epub 2019 Mar 18.
6. A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation.
Sensors (Basel). 2023 Jan 11;23(2):824. doi: 10.3390/s23020824.
7. A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms.
Comput Intell Neurosci. 2022 May 11;2022:9485933. doi: 10.1155/2022/9485933. eCollection 2022.
8. SRAM-Based CIM Architecture Design for Event Detection.
Sensors (Basel). 2022 Oct 16;22(20):7854. doi: 10.3390/s22207854.
9. A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems.
Sensors (Basel). 2021 Apr 9;21(8):2637. doi: 10.3390/s21082637.
10. FinFET 6T-SRAM All-Digital Compute-in-Memory for Artificial Intelligence Applications: An Overview and Analysis.
Micromachines (Basel). 2023 Jul 31;14(8):1535. doi: 10.3390/mi14081535.
