Deep Learning Accelerators' Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study.

Affiliations

Electronics Division, Institute for Scientific and Technological Information, Council for Scientific and Industrial Research, Accra, Ghana.

Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea.

Publication information

Sensors (Basel). 2023 Feb 21;23(5):2380. doi: 10.3390/s23052380.

DOI: 10.3390/s23052380
PMID: 36904584
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10007457/
Abstract

Though custom deep learning (DL) hardware accelerators are attractive for making inferences in edge computing devices, their design and implementation remain a challenge. Open-source frameworks exist for exploring DL hardware accelerators. Gemmini is an open-source systolic array generator for agile DL accelerator exploration. This paper details the hardware/software components generated using Gemmini. The general matrix-to-matrix multiplication (GEMM) of different dataflow options, including output/weight stationary (OS/WS), was explored in Gemmini to estimate the performance relative to a CPU implementation. The Gemmini hardware was implemented on an FPGA device to explore the effect of several accelerator parameters, including array size, memory capacity, and the CPU/hardware image-to-column (im2col) module, on metrics such as the area, frequency, and power. This work revealed that regarding the performance, the WS dataflow offered a speedup of 3× relative to the OS dataflow, and the hardware im2col operation offered a speedup of 1.1× relative to the operation on the CPU. For hardware resources, an increase in the array size by a factor of 2 led to an increase in both the area and power by a factor of 3.3, and the im2col module led to an increase in area and power by factors of 1.01 and 1.06, respectively.
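
The abstract refers to an image-to-column (im2col) step that feeds the GEMM. As a rough illustration of that idea, the sketch below lowers a small 2-D convolution to a single matrix multiplication and checks it against a direct sliding-window version. It is a minimal pure-Python sketch, not Gemmini's API: the function names, the single-channel input, stride 1, and no padding are all assumptions made for this example.

```python
# Illustrative sketch only: im2col lowering of a 2-D convolution to a GEMM.
# All names and the single-channel, stride-1, no-padding setup are
# assumptions for this example; none of this is taken from Gemmini.

def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows  # shape: (out_h * out_w, kh * kw)


def gemm(a, b):
    """Plain matrix multiply C = A x B on nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
            for row in a]


def conv2d_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def conv2d_im2col(image, kernel):
    """Same convolution expressed as im2col followed by one GEMM."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    patches = im2col(image, kh, kw)
    # Flatten the kernel into a (kh * kw, 1) column of weights.
    weights = [[kernel[di][dj]] for di in range(kh) for dj in range(kw)]
    flat = [r[0] for r in gemm(patches, weights)]
    # Fold the flat GEMM result back into an out_h x out_w feature map.
    return [flat[k:k + out_w] for k in range(0, len(flat), out_w)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2]]
kernel = [[1, 0],
          [0, -1]]

print(conv2d_im2col(image, kernel) == conv2d_direct(image, kernel))
```

The patch matrix repeats overlapping pixels, so im2col trades extra memory traffic for a regular GEMM shape. That trade-off is consistent with the abstract's observation that moving im2col from the CPU into hardware changes both speed and resource use.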


Similar articles

1. Deep Learning Accelerators' Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study.
Sensors (Basel). 2023 Feb 21;23(5):2380. doi: 10.3390/s23052380.
2. Distributed large-scale graph processing on FPGAs.
J Big Data. 2023;10(1):95. doi: 10.1186/s40537-023-00756-x. Epub 2023 Jun 4.
3. An Accelerator Design Using a MTCA Decomposition Algorithm for CNNs.
Sensors (Basel). 2020 Sep 28;20(19):5558. doi: 10.3390/s20195558.
4. Resources and Power Efficient FPGA Accelerators for Real-Time Image Classification.
J Imaging. 2022 Apr 15;8(4):114. doi: 10.3390/jimaging8040114.
5. Power-Intent Systolic Array Using Modified Parallel Multiplier for Machine Learning Acceleration.
Sensors (Basel). 2023 Apr 26;23(9):4297. doi: 10.3390/s23094297.
6. Custom Hardware Architectures for Deep Learning on Portable Devices: A Review.
IEEE Trans Neural Netw Learn Syst. 2022 Nov;33(11):6068-6088. doi: 10.1109/TNNLS.2021.3082304. Epub 2022 Oct 27.
7. NeuroSim Simulator for Compute-in-Memory Hardware Accelerator: Validation and Benchmark.
Front Artif Intell. 2021 Jun 9;4:659060. doi: 10.3389/frai.2021.659060. eCollection 2021.
8. An OpenCL-Based FPGA Accelerator for Faster R-CNN.
Entropy (Basel). 2022 Sep 23;24(10):1346. doi: 10.3390/e24101346.
9. Research on Convolutional Neural Network Inference Acceleration and Performance Optimization for Edge Intelligence.
Sensors (Basel). 2023 Dec 31;24(1):240. doi: 10.3390/s24010240.
10. Lightweight and Energy-Efficient Deep Learning Accelerator for Real-Time Object Detection on Edge Devices.
Sensors (Basel). 2023 Jan 20;23(3):1185. doi: 10.3390/s23031185.

Cited by

1. New Systolic Array Algorithms and VLSI Architectures for 1-D MDST.
Sensors (Basel). 2023 Jul 7;23(13):6220. doi: 10.3390/s23136220.
