Yang Linli, Honarvar Shakibaei Asli Barmak
College of Mechanical and Electrical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China.
Faculty of Engineering and Applied Sciences, Cranfield University, Cranfield, Bedford MK43 0AL, UK.
J Imaging. 2025 Aug 21;11(8):285. doi: 10.3390/jimaging11080285.
Small object detection in UAV aerial imagery is challenging because of scale variations, sparse feature representations, and complex backgrounds. To address these issues, this paper focuses on practical engineering improvements to the existing YOLOv8s framework rather than proposing a fundamentally new algorithm. We introduce MultiScaleConv-YOLO (MSConv-YOLO), an enhanced model that integrates well-established techniques to improve detection performance on small targets. Specifically, the proposed approach makes three key improvements: (1) a MultiScaleConv (MSConv) module that combines depthwise separable convolutions with dilated convolutions at varying dilation rates, enhancing multi-scale feature extraction while maintaining efficiency; (2) the replacement of CIoU with WIoU v3 as the bounding box regression loss, whose dynamic non-monotonic focusing mechanism improves localization of small targets; and (3) an additional high-resolution detection head in the neck-head structure, which leverages the feature pyramid network (FPN) and path aggregation network (PAN) to preserve fine-grained features and ensure full-scale coverage. Experimental results on the VisDrone2019 dataset show that MSConv-YOLO outperforms the baseline YOLOv8s, improving mAP@0.5 by 6.9% and recall by 6.3%. Ablation studies further confirm that the three enhancements are complementary. This paper thus presents practical and effective engineering enhancements for small object detection in UAV scenarios, offering an improved solution without introducing entirely new theoretical constructs. Future work will focus on lightweight deployment and adaptation to more complex environments.
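The abstract describes the MSConv module only as combining depthwise separable and dilated convolutions with varying dilation rates. The NumPy sketch below illustrates that idea under explicit assumptions; it is not the authors' implementation. It assumes three parallel 3x3 depthwise branches with hypothetical dilation rates 1, 2 and 3, fused by summation and followed by a shared 1x1 pointwise projection. All function names (`depthwise_dilated_conv`, `ms_conv`, `param_counts`, etc.) are illustrative.

```python
import numpy as np


def depthwise_dilated_conv(x, kernels, dilation):
    """Depthwise k x k cross-correlation with 'same' padding.

    x: (C, H, W) feature map; kernels: (C, k, k), one filter per channel.
    Padding of dilation * (k - 1) // 2 keeps H and W unchanged (odd k).
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w))
    for i in range(k):
        for j in range(k):
            # Shifted window of the padded input that kernel tap (i, j) sees.
            window = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + w]
            out += kernels[:, i, j][:, None, None] * window
    return out


def pointwise_conv(x, weights):
    """1x1 convolution that mixes channels; weights has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weights, x)


def ms_conv(x, dw_kernels, pw_weights, dilations=(1, 2, 3)):
    """Hypothetical MSConv-style block: parallel depthwise dilated branches,
    summed, then a shared pointwise projection (depthwise separable design)."""
    branches = [depthwise_dilated_conv(x, k, d) for k, d in zip(dw_kernels, dilations)]
    return pointwise_conv(np.sum(branches, axis=0), pw_weights)


def effective_kernel(k, dilation):
    """Span of a dilated k x k kernel along one axis."""
    return k + (k - 1) * (dilation - 1)


def param_counts(c_in, c_out, k=3, n_branches=3):
    """Weights in a standard k x k conv vs. the sketched block (biases ignored)."""
    standard = c_in * c_out * k * k
    sketch = n_branches * c_in * k * k + c_out * c_in
    return standard, sketch


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 32, 32))
dw = [rng.standard_normal((8, 3, 3)) for _ in range(3)]
pw = rng.standard_normal((16, 8))
print(ms_conv(x, dw, pw).shape)                     # (16, 32, 32)
print([effective_kernel(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
print(param_counts(8, 16))                          # (1152, 344)
```

In this sketch the branches see 3x3, 5x5 and 7x7 neighbourhoods while each keeps only nine weights per channel. For 8 input and 16 output channels, the block uses 344 weights against 1152 for one standard 3x3 convolution, which is the kind of efficiency the abstract alludes to. The real module may fuse branches differently (for example by concatenation) or use other rates.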
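The abstract names WIoU v3 and its dynamic non-monotonic focusing mechanism but gives no formulas. The pure-Python sketch below follows the published Wise-IoU formulation as we understand it. In WIoU v1, a centre-distance attention term R_WIoU = exp(((x - x_gt)^2 + (y - y_gt)^2) / (W_g^2 + H_g^2)) scales the IoU loss 1 - IoU, where W_g and H_g are the sides of the smallest enclosing box. WIoU v3 then multiplies that by r = beta / (delta * alpha^(beta - delta)), where the outlier degree beta is the box's IoU loss divided by a running mean. Several details are assumptions for illustration, not the paper's training code: the values alpha = 1.9 and delta = 3, the exponential-moving-average momentum, the per-box (rather than batched) interface, and names such as `WIoUv3`.

```python
import math


def box_area(b):
    """Area of a box given as (x1, y1, x2, y2)."""
    return (b[2] - b[0]) * (b[3] - b[1])


def iou(box, gt):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(box[2], gt[2]) - max(box[0], gt[0]))
    ih = max(0.0, min(box[3], gt[3]) - max(box[1], gt[1]))
    inter = iw * ih
    return inter / (box_area(box) + box_area(gt) - inter)


def wiou_v1(box, gt):
    """WIoU v1: centre-distance attention R_WIoU times the IoU loss.

    Returns (loss_v1, l_iou). In training the enclosing-box size is
    detached from the gradient; with plain floats that is moot here.
    """
    l_iou = 1.0 - iou(box, gt)
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(box[2], gt[2]) - min(box[0], gt[0])
    hg = max(box[3], gt[3]) - min(box[1], gt[1])
    r_wiou = math.exp(((cx - gx) ** 2 + (cy - gy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou, l_iou


def focusing_coefficient(beta, alpha=1.9, delta=3.0):
    """Non-monotonic focusing factor r = beta / (delta * alpha ** (beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


class WIoUv3:
    """Per-box WIoU v3 loss with a running mean of L_IoU (assumed EMA update)."""

    def __init__(self, alpha=1.9, delta=3.0, momentum=0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.iou_mean = 1.0  # running mean of the IoU loss, initialised to 1

    def __call__(self, box, gt):
        loss_v1, l_iou = wiou_v1(box, gt)
        # Outlier degree: this box's IoU loss relative to the running mean.
        beta = l_iou / self.iou_mean
        self.iou_mean = (1 - self.momentum) * self.iou_mean + self.momentum * l_iou
        return focusing_coefficient(beta, self.alpha, self.delta) * loss_v1


loss_fn = WIoUv3()
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
print(loss_fn((0, 0, 2, 2), (0, 0, 2, 2)))         # 0.0
print(loss_fn((0, 0, 2, 2), (1, 1, 3, 3)))
```

The factor r equals 1 when beta = delta and peaks at beta = 1/ln(alpha), about 1.56 for alpha = 1.9. As a result, both very well-fitted boxes (small beta) and extreme outliers (large beta) receive a smaller gradient gain than boxes of moderate quality. This is what makes the focusing non-monotonic, in contrast to CIoU, which applies no such quality-dependent weighting.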