Department of Electrical Engineering, Universidad de Concepción, Concepción 4070386, Chile.
Sensors (Basel). 2021 Apr 9;21(8):2637. doi: 10.3390/s21082637.
Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PE). Implemented on a XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes a 224×224-pixel image in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.