RFAG-YOLO: A Receptive Field Attention-Guided YOLO Network for Small-Object Detection in UAV Images.

Author information

Wei Chengmeng, Wang Wenhong

Affiliation

College of Computer Science, Liaocheng University, Liaocheng 252059, China.

Publication information

Sensors (Basel). 2025 Mar 30;25(7):2193. doi: 10.3390/s25072193.

Abstract

The YOLO series of object detectors has achieved significant success across a wide range of computer vision tasks owing to its efficiency and accuracy. However, detecting small objects in UAV images remains challenging because of low resolution, complex background interference, and large scale variations, which together degrade feature extraction and limit detection performance. To address these challenges, we propose the receptive field attention-guided YOLO (RFAG-YOLO), an adaptation of YOLOv8 for small-object detection in UAV imagery that focuses on improving feature representation and detection robustness. To this end, we introduce a novel network building block, the receptive field network block (RFN block), which uses dynamic kernel parameter adjustment to enhance the model's ability to capture fine-grained local details. To exploit multi-scale features effectively, we design an enhanced FasterNet module built from RFN blocks as the core component of the RFAG-YOLO backbone, enabling robust feature extraction across resolutions. By combining staged downsampling with a hierarchical arrangement of RFN blocks, this design balances semantic information and yields effective feature representations at each resolution. Additionally, we introduce a Scale-Aware Feature Amalgamation (SAF) component before the detection head of RFAG-YOLO. SAF uses a scale attention mechanism to dynamically weight features from both higher and lower layers, enriching information flow and substantially improving the model's robustness to complex backgrounds and scale variations. Experimental results on the VisDrone2019 dataset demonstrated that RFAG-YOLO outperformed state-of-the-art models, including YOLOv7, YOLOv8, YOLOv10, and YOLOv11, in both detection accuracy and efficiency.
In particular, RFAG-YOLO achieved an mAP50 of 38.9%, a substantial improvement over multiple baseline models: a 12.43% increase over YOLOv7, a 5.99% increase over YOLOv10, and a 16.12% increase over both YOLOv8n and YOLOv11. Moreover, compared with the larger YOLOv8s model, RFAG-YOLO reached 97.98% of its mAP50 while using only 53.51% of its parameters, a favorable performance-to-parameter ratio that makes it well suited to resource-constrained UAV applications. These results underscore the potential of RFAG-YOLO for real-world UAV deployments, particularly in scenarios that demand accurate small-object detection under challenging conditions such as varying lighting, complex backgrounds, and diverse scales.
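The abstract does not specify how SAF computes its scale weights. As a rough illustration only, the Python sketch below fuses a coarse higher-layer map with a fine lower-layer map using a generic softmax-over-scales attention: each scale is summarized by its global average, mapped to a logit with hypothetical learned parameters (`weights`, `biases`), and the softmax of those logits weights the sum. The function names, the single-channel 2D maps, the 2x nearest-neighbour upsampling, and the choice of descriptor are all assumptions for illustration, not the authors' SAF design.

```python
import math
from typing import List, Sequence, Tuple

# A single-channel feature map stored as rows of floats (assumption for brevity).
FeatureMap = List[List[float]]


def upsample_nearest(x: FeatureMap, factor: int = 2) -> FeatureMap:
    """Nearest-neighbour upsampling: repeat every value `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)] for row in x for _ in range(factor)]


def global_avg(x: FeatureMap) -> float:
    """Global average pooling: one scalar descriptor per map."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the scale dimension."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale_attention_fuse(
    low: FeatureMap,
    high: FeatureMap,
    weights: Tuple[float, float] = (1.0, 1.0),  # hypothetical learned per-scale weights
    biases: Tuple[float, float] = (0.0, 0.0),   # hypothetical learned per-scale biases
) -> Tuple[FeatureMap, List[float]]:
    """Fuse a fine lower-layer map with a coarse higher-layer map at half its resolution."""
    high_up = upsample_nearest(high, 2)
    if len(high_up) != len(low) or len(high_up[0]) != len(low[0]):
        raise ValueError("higher-layer map must be exactly half the lower-layer resolution")
    # One descriptor and one logit per scale.
    descriptors = [global_avg(low), global_avg(high_up)]
    logits = [w * d + b for w, d, b in zip(weights, descriptors, biases)]
    alpha = softmax(logits)  # scale attention weights; they sum to 1
    fused = [
        [alpha[0] * a + alpha[1] * b for a, b in zip(row_low, row_high)]
        for row_low, row_high in zip(low, high_up)
    ]
    return fused, alpha


if __name__ == "__main__":
    low = [[1.0] * 4 for _ in range(4)]   # 4x4 map from a lower (finer) layer
    high = [[3.0] * 2 for _ in range(2)]  # 2x2 map from a higher (coarser) layer
    fused, alpha = scale_attention_fuse(low, high)
    print([round(a, 3) for a in alpha])  # → [0.119, 0.881]
```

Because the weights come from a softmax, they always sum to 1, so the fused map is a convex combination of the two scales; whichever scale produces the larger logit dominates the output.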
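The efficiency claim can be checked with simple arithmetic. Assuming that both the 97.98% and 53.51% figures are measured relative to YOLOv8s, RFAG-YOLO delivers roughly 1.83 times as much mAP50 per parameter as YOLOv8s. The stated 38.9% mAP50 also implies a YOLOv8s mAP50 of about 39.7%; this is a derived value, not one reported in the abstract, and is subject to rounding in the quoted figures. The helper name `perf_per_param_ratio` is hypothetical.

```python
def perf_per_param_ratio(relative_map50: float, relative_params: float) -> float:
    """mAP50 per parameter of a model relative to a reference model.

    Both arguments are fractions of the reference model's value
    (here the reference is assumed to be YOLOv8s).
    """
    return relative_map50 / relative_params


# Figures quoted in the abstract
RFAG_MAP50 = 38.9               # mAP50 (%) of RFAG-YOLO on VisDrone2019
FRACTION_OF_V8S_MAP50 = 0.9798  # RFAG-YOLO reaches 97.98% of YOLOv8s mAP50
FRACTION_OF_V8S_PARAMS = 0.5351 # ...using 53.51% of YOLOv8s parameters

ratio = perf_per_param_ratio(FRACTION_OF_V8S_MAP50, FRACTION_OF_V8S_PARAMS)
# Derived, not reported: YOLOv8s mAP50 implied by 38.9 / 0.9798
implied_yolov8s_map50 = RFAG_MAP50 / FRACTION_OF_V8S_MAP50

print(f"mAP50 per parameter vs. YOLOv8s: {ratio:.2f}x")  # → mAP50 per parameter vs. YOLOv8s: 1.83x
print(f"implied YOLOv8s mAP50: {implied_yolov8s_map50:.1f}%")  # → implied YOLOv8s mAP50: 39.7%
```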

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b72/11991089/e0062f30bf44/sensors-25-02193-g001.jpg