An FPGA-Based YOLOv5 Accelerator for Real-Time Industrial Vision Applications.

Authors

Yan Zhihong, Zhang Bingqian, Wang Dong

Affiliations

Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China.

Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China.

Publication information

Micromachines (Basel). 2024 Sep 19;15(9):1164. doi: 10.3390/mi15091164.

DOI:10.3390/mi15091164
PMID:39337824
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11434529/
Abstract

The You Only Look Once (YOLO) object detection network has garnered widespread adoption in various industries, owing to its superior inference speed and robust detection capabilities. This model has proven invaluable in automating production processes such as material processing, machining, and quality inspection. However, as market competition intensifies, there is a constant demand for higher detection speed and accuracy. Current FPGA accelerators based on 8-bit quantization have struggled to meet these increasingly stringent performance requirements. In response, we present a novel 4-bit quantization-based neural network accelerator for the YOLOv5 model, designed to enhance real-time processing capabilities while maintaining high detection accuracy. To achieve effective model compression, we introduce an optimized quantization scheme that reduces the bit-width of the entire YOLO network (including the first layer) to 4 bits, with only a 1.5% degradation in mean Average Precision (mAP). For the hardware implementation, we propose a unified Digital Signal Processor (DSP) packing scheme, coupled with a novel parity adder tree architecture that accommodates the proposed quantization strategies. This approach efficiently reduces on-chip DSP utilization by 50%, offering a significant improvement in performance and resource efficiency. Experimental results show that the industrial object detection system based on the proposed FPGA accelerator achieves a throughput of 808.6 GOPS and an efficiency of 0.49 GOPS/DSP for YOLOv5s on the ZCU102 board, which is 29% higher than a commercial FPGA accelerator design (Xilinx's Vitis AI).
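
The abstract names two ideas that a small example may help make concrete: quantizing values to 4 bits, and packing several low-bit multiplications into one wide multiplier. The sketch below is illustrative only and is not the authors' scheme. It assumes symmetric uniform quantization to signed 4-bit integers, unsigned 4-bit activations, and a generic two-weights-per-multiply packing. The names `quantize_int4`, `packed_multiply`, and the guard spacing `SHIFT` are hypothetical and chosen for this illustration.

```python
# Illustrative sketch only; not the quantization or packing scheme from the paper.

def quantize_int4(values, scale):
    """Symmetric uniform quantization to signed 4-bit integers in [-8, 7].

    `scale` is an assumed per-tensor step size.
    """
    quantized = []
    for v in values:
        q = round(v / scale)                  # nearest integer level
        quantized.append(max(-8, min(7, q)))  # clamp to the int4 range
    return quantized


SHIFT = 12  # hypothetical bit spacing between the two packed weights


def packed_multiply(w_hi, w_lo, a):
    """Compute w_hi * a and w_lo * a with a single wide multiplication.

    Assumes w_hi and w_lo are signed int4 values and `a` is an unsigned
    4-bit activation, so each partial product fits in a signed SHIFT-bit field.
    """
    packed = (w_hi << SHIFT) + w_lo   # one wide operand holding both weights
    product = packed * a              # equals w_hi*a*2**SHIFT + w_lo*a

    # Recover the low product as a signed SHIFT-bit field.
    lo = product & ((1 << SHIFT) - 1)
    if lo >= 1 << (SHIFT - 1):
        lo -= 1 << SHIFT

    # Remove the low product, then shift down to get the high product.
    hi = (product - lo) >> SHIFT
    return hi, lo


if __name__ == "__main__":
    weights = quantize_int4([0.5, -1.0, 0.26, 3.0], scale=0.25)
    print(weights)
    print(packed_multiply(weights[1], weights[3], 15))
```

Because one multiplier yields two products in this sketch, half as many multipliers are needed for the same number of multiply operations. This is one generic way a packing scheme can lower DSP usage; the paper's unified packing scheme and parity adder tree are described in the full text.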


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/1bc6fc67d12d/micromachines-15-01164-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/2dae11b82946/micromachines-15-01164-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/92584f883ed5/micromachines-15-01164-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/3bcd67b743f8/micromachines-15-01164-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/c7e2234336f9/micromachines-15-01164-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/1ca0b4e7bac5/micromachines-15-01164-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/108b1c4d8130/micromachines-15-01164-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/3891729db416/micromachines-15-01164-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/bb615f6c4c7d/micromachines-15-01164-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/9a08087125f9/micromachines-15-01164-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/8708980a2e3a/micromachines-15-01164-g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/ae9429c5862b/micromachines-15-01164-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/0f244519359e/micromachines-15-01164-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1bb9/11434529/6def0ef209c7/micromachines-15-01164-g014.jpg
