Deep Learning Accelerators' Configuration Space Exploration Effect on Performance and Resource Utilization: A Gemmini Case Study.

Affiliations

Electronics Division, Institute for Scientific and Technological Information, Council for Scientific and Industrial Research, Accra, Ghana.

Intelligent Image Processing Research Center, Korea Electronics Technology Institute, Seongnam-si 13488, Republic of Korea.

Publication information

Sensors (Basel). 2023 Feb 21;23(5):2380. doi: 10.3390/s23052380.

PMID:36904584

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10007457/

Abstract

Though custom deep learning (DL) hardware accelerators are attractive for inference on edge computing devices, their design and implementation remain challenging. Open-source frameworks exist for exploring DL hardware accelerators; Gemmini is an open-source systolic array generator for agile DL accelerator exploration. This paper details the hardware/software components generated using Gemmini. General matrix-to-matrix multiplication (GEMM) under different dataflow options, including output stationary (OS) and weight stationary (WS), was explored in Gemmini to estimate its performance relative to a CPU implementation. The Gemmini hardware was implemented on an FPGA device to explore how several accelerator parameters, including array size, memory capacity, and CPU versus hardware image-to-column (im2col) conversion, affect metrics such as area, frequency, and power. Regarding performance, the WS dataflow offered a 3× speedup relative to the OS dataflow, and the hardware im2col operation offered a 1.1× speedup relative to performing the operation on the CPU. Regarding hardware resources, doubling the array size increased both area and power by a factor of 3.3, and the im2col module increased area and power by factors of 1.01 and 1.06, respectively.
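
The abstract compares output-stationary (OS) and weight-stationary (WS) dataflows and a CPU versus hardware im2col step. As an illustration only, the sketch below shows in plain Python how im2col lowers a convolution to a GEMM, and how OS and WS loop orders differ in which operand is held in place. It is a generic textbook functional model under stated assumptions (one image as nested lists, stride 1, no padding), not Gemmini's hardware or software; all function names are hypothetical, and it models neither timing, tiling, nor scratchpad memory, so it says nothing about the reported 3× or 1.1× speedups.

```python
# Hypothetical functional model: NOT Gemmini's RTL or software library.
# Assumptions: one input image as nested lists [C][H][W], filters as
# [K][C][kh][kw], stride 1, no padding; timing and tiling are ignored.


def im2col(x, kh, kw):
    """Unroll every kh x kw receptive field of x into one row.

    Returns an (OH*OW) x (C*kh*kw) matrix, rows ordered by output pixel.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    return [[x[ch][i + di][j + dj]
             for ch in range(c) for di in range(kh) for dj in range(kw)]
            for i in range(oh) for j in range(ow)]


def weights_to_matrix(wts):
    """Flatten filters [K][C][kh][kw] into a (C*kh*kw) x K matrix."""
    flat = [[v for ch in f for row in ch for v in row] for f in wts]
    return [list(col) for col in zip(*flat)]


def gemm_output_stationary(a, b):
    """OS-style loop order: each output C[i][j] stays in one accumulator
    while the reduction dimension streams past it."""
    n, kdim, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for k in range(kdim):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def gemm_weight_stationary(a, b):
    """WS-style loop order: each weight B[k][j] is fetched once and reused
    for every row of A, while the partial sums move instead."""
    n, kdim, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for k in range(kdim):
        for j in range(m):
            w = b[k][j]  # the "stationary" weight
            for i in range(n):
                c[i][j] += a[i][k] * w
    return c


def conv2d_direct(x, wts):
    """Reference convolution (cross-correlation, as in DL frameworks)."""
    k_out, c = len(wts), len(wts[0])
    kh, kw = len(wts[0][0]), len(wts[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    return [[[sum(wts[f][ch][di][dj] * x[ch][i + di][j + dj]
                  for ch in range(c) for di in range(kh) for dj in range(kw))
              for j in range(ow)]
             for i in range(oh)]
            for f in range(k_out)]


def conv2d_via_im2col(x, wts, gemm):
    """Lower the convolution to im2col + GEMM, then reshape to [K][OH][OW]."""
    kh, kw = len(wts[0][0]), len(wts[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    out = gemm(im2col(x, kh, kw), weights_to_matrix(wts))  # (OH*OW) x K
    return [[[out[i * ow + j][f] for j in range(ow)] for i in range(oh)]
            for f in range(len(wts))]


# Small example: C=2 input channels (3x3 each), K=2 filters (2x2 each).
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
wts = [[[[1, 0], [0, -1]], [[1, 1], [1, 1]]],
       [[[0, 1], [1, 0]], [[0, 0], [0, 2]]]]

ref = conv2d_direct(x, wts)
assert conv2d_via_im2col(x, wts, gemm_output_stationary) == ref
assert conv2d_via_im2col(x, wts, gemm_weight_stationary) == ref
print(ref)
```

Both loop orders compute identical results; the difference that matters in a systolic array is data movement. WS loads each weight into a processing element once and reuses it across many input rows, while OS keeps each partial sum in place; which one is faster in practice depends on the layer shapes and the memory hierarchy.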

