Design of Flexible Hardware Accelerators for Image Convolutions and Transposed Convolutions

Author information

Sestito Cristian, Spagnolo Fanny, Perri Stefania

Affiliations

Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy.

Department of Mechanical, Energy and Management Engineering, University of Calabria, 87036 Rende, Italy.

Publication information

J Imaging. 2021 Oct 12;7(10):210. doi: 10.3390/jimaging7100210.

DOI:10.3390/jimaging7100210

PMID:34677296

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8538663/

Abstract

Nowadays, computer vision relies heavily on convolutional neural networks (CNNs) to perform complex tasks accurately. Super-resolution CNNs are a meaningful example, because they contain both convolutional (CONV) and transposed convolutional (TCONV) layers. While the former use multiply-and-accumulate (MAC) operations to extract features of interest from incoming feature maps (fmaps), the latter perform MACs to properly tune the spatial resolution of the received fmaps. The ever-growing real-time and low-power requirements of modern computer vision applications are a stimulus for the research community to investigate the deployment of CNNs on well-suited hardware platforms, such as field programmable gate arrays (FPGAs). FPGAs are widely recognized as valid candidates for trading off computational speed and power consumption, thanks to their flexibility and their ability to handle computationally intensive models. To reduce the number of operations to be performed, this paper presents a novel hardware-oriented algorithm that efficiently accelerates both CONVs and TCONVs. The proposed strategy was validated by employing it within a reconfigurable hardware accelerator purposely designed to adapt itself to different operating modes set at run-time. When characterized using the Xilinx XC7K410T FPGA device, the proposed accelerator achieved a throughput of up to 2022.2 GOPS and, compared with state-of-the-art competitors, an energy efficiency up to 2.3 times higher, without compromising the overall accuracy.
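To make the CONV/TCONV distinction concrete, the following minimal sketch implements both layers as plain MAC loops and counts the operations. It uses the generic textbook definitions only and is not the hardware-oriented algorithm proposed in the paper; the function names `conv2d` and `tconv2d`, the single-channel setting, and the stride value are assumptions chosen for illustration.

```python
def conv2d(fmap, kernel):
    """Stride-1, unpadded CONV (cross-correlation form, as used in CNNs).

    Hypothetical helper: returns the output fmap and the number of MACs.
    """
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out = [[0] * (w - k + 1) for _ in range(h - k + 1)]
    macs = 0
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            for u in range(k):
                for v in range(k):
                    # One multiply-and-accumulate per kernel tap.
                    out[i][j] += fmap[i + u][j + v] * kernel[u][v]
                    macs += 1
    return out, macs


def tconv2d(fmap, kernel, stride=2):
    """Unpadded TCONV in its scatter form: each input pixel adds a scaled
    copy of the kernel into the (larger) output fmap.

    Hypothetical helper: returns the output fmap and the number of MACs.
    """
    h, w, k = len(fmap), len(fmap[0]), len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0] * out_w for _ in range(out_h)]
    macs = 0
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    out[i * stride + u][j * stride + v] += fmap[i][j] * kernel[u][v]
                    macs += 1
    return out, macs


fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]

conv_out, conv_macs = conv2d(fmap, kernel)      # 3x3 input -> 2x2 output
tconv_out, tconv_macs = tconv2d(fmap, kernel)   # 3x3 input -> 6x6 output
print(conv_out, conv_macs)
print(len(tconv_out), len(tconv_out[0]), tconv_macs)
```

In this toy case the CONV shrinks the 3x3 fmap to 2x2, while the TCONV enlarges it to 6x6, which is the resolution-tuning role the abstract describes. Both reduce to the same MAC primitive, which is what makes a single reconfigurable datapath for the two layer types plausible.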
