Department of Electrical Engineering, Universidad de Concepción, Concepción 4070386, Chile.
Sensors (Basel). 2021 Apr 9;21(8):2637. doi: 10.3390/s21082637.
Convolutional neural networks (CNN) have been extensively employed for image classification due to their high accuracy. However, inference is a computationally intensive process that often requires hardware acceleration to operate in real time. For mobile devices, the power consumption of graphics processors (GPUs) is frequently prohibitive, and field-programmable gate arrays (FPGA) become a solution to perform inference at high speed. Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicates their application on resource-constrained edge devices. In this paper, we present a scalable, low-power, low-resource-utilization accelerator architecture for inference on the MobileNet V2 CNN. The architecture uses a heterogeneous system with an embedded processor as the main controller, external memory to store network data, and dedicated hardware implemented on reconfigurable logic with a scalable number of processing elements (PE). Implemented on a XCZU7EV FPGA running at 200 MHz and using four PEs, the accelerator infers with 87% top-5 accuracy and processes a 224×224-pixel image in 220 ms. It consumes 7.35 W of power and uses less than 30% of the logic and arithmetic resources used by other MobileNet FPGA accelerators.