Wu Yiming, Mu Xiaofang, Shi Hong, Hou Mingxing
School of Computer Science and Technology, Taiyuan Normal University, Taiyuan, 030000, China.
Shanxi Institute of Energy, Taiyuan, 030000, China.
Sci Rep. 2025 May 9;15(1):16214. doi: 10.1038/s41598-025-00239-4.
In small object detection scenarios such as UAV aerial imagery and remote sensing, feature extraction is difficult primarily because of small object size, multi-scale variation, and background interference. To overcome these challenges, this paper presents AAPW-YOLO, a small object detection model based on adaptive convolution and reconstructed feature fusion. In AAPW-YOLO, we improve the standard convolution and the CSP Bottleneck with 2 Convolutions (C2f) structure in the You Only Look Once v8 (YOLOv8) backbone network using Alterable Kernel Convolution (AKConv), which improves the network's ability to capture features across scales while considerably lowering the model's parameter count. We also introduce the Attentional Scale Sequence Fusion P2 (ASFP2) structure, which enhances the feature fusion mechanism of Attentional Scale Sequence Fusion You Only Look Once (ASF-YOLO) and incorporates a P2 detection layer; this optimizes feature fusion in the YOLOv8 neck, strengthening the network's capture of both fine details and global contextual information while further reducing model parameters. Finally, we adopt a gradient-enhancing strategy with the Wise Intersection over Union (Wise-IoU) loss function to balance the gradient contributions of anchor boxes of different qualities during training, thereby improving regression accuracy. Experimental results show that the proposed model reduces the parameter count by 30% and improves mAP@0.5 by 3.6% on the VisDrone2019 dataset, and reduces the parameter count by 30% with a 2.5% improvement in mAP@0.5 on the DOTA v1.0 dataset. The proposed model thus achieves high recognition accuracy with fewer parameters, enhancing the robustness and generalization ability of the network.
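The abstract does not give the loss formulation, but the Wise-IoU family it builds on is published. As a rough illustration of the idea, the following is a minimal sketch of the Wise-IoU v1 term: the plain IoU loss is scaled by a distance-attention factor computed from the center offset between the predicted and ground-truth boxes, normalized by the smallest enclosing box. The function names and the plain-float box representation here are illustrative assumptions, not the authors' implementation (which would operate on tensors with gradient detachment on the normalizing term).

```python
import math

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) with x1 < x2, y1 < y2.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def wise_iou_v1(pred, gt):
    # Center coordinates of each box.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Width/height of the smallest box enclosing both pred and gt.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Distance-attention factor: grows with center offset, so low-overlap
    # (hard) anchors get amplified gradients relative to plain IoU loss.
    r = math.exp(((cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, gt))
```

A perfectly aligned prediction yields a loss of 0 (the attention factor is 1 and the IoU loss is 0), while a shifted prediction is penalized both for overlap loss and for center distance. The full Wise-IoU v3 adds a dynamic non-monotonic focusing mechanism on top of this, which is presumably the "gradient-enhancing strategy" the abstract refers to.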