Wang Jingyang, Gao Jiayao, Zhang Bo
School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China.
School of Cyberspace Security, Hebei University of Engineering Science, Shijiazhuang, 050091, China.
Sci Rep. 2025 Jan 4;15(1):770. doi: 10.1038/s41598-024-84938-4.
Aerial images cover a wide area and capture rich scene information. Because they are usually taken from high altitude, they contain many small objects. These objects are difficult to detect accurately: their features are weak and easily swamped by background interference. We propose CPDD-YOLOv8 to improve small object detection. First, we propose the C2fGAM structure, which integrates the Global Attention Mechanism (GAM) into the C2f structure of the backbone so that the model can better capture the overall semantics of an image. Second, we add a detection layer, P2, to extract shallow features. Third, we propose a new DSC2f structure, which replaces the first standard Conv of the Bottleneck in the C2f structure with Dynamic Snake Convolution (DSConv), so that the model adapts more effectively to different inputs. Finally, the Dynamic Head (DyHead), which integrates multiple attention mechanisms, is used in the head to assign different weights to different feature layers. To evaluate CPDD-YOLOv8, we carry out ablation and comparison experiments on the VisDrone2019 dataset. The ablation experiments show that every improved or added module in CPDD-YOLOv8 is effective. The comparison experiments show that the mAP of CPDD-YOLOv8 is higher than that of each of the seven other models compared. The model's mAP@0.5 reaches 41%, which is 6.9% higher than that of YOLOv8, and its small object detection rate is improved by 13.1%. The generalizability of CPDD-YOLOv8 is verified on the WiderPerson, VOC_MASK and SHWD datasets.
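The benefit of the P2 detection layer for small objects can be illustrated with simple grid arithmetic. The sketch below assumes the usual pyramid naming, in which level Pk has stride 2^k, so the standard YOLOv8 heads on P3 to P5 use strides 8, 16 and 32 and P2 uses stride 4. The 640-pixel input and the 8-pixel object are illustrative values, not figures from the paper, and the function names are hypothetical.

```python
# Sketch: how many grid cells a small object spans at each detection level.
# Assumption: pyramid level Pk has stride 2**k (P2=4, P3=8, P4=16, P5=32).
# The input size and object size below are illustrative, not from the paper.

def level_stride(level: int) -> int:
    """Downsampling factor of pyramid level P<level>."""
    return 2 ** level


def grid_size(input_size: int, level: int) -> int:
    """Side length of the feature-map grid for a square input."""
    return input_size // level_stride(level)


def cells_spanned(object_size: int, level: int) -> float:
    """Number of grid cells an object of the given pixel width covers."""
    return object_size / level_stride(level)


if __name__ == "__main__":
    input_size = 640   # assumed square input resolution
    object_size = 8    # assumed width of a small object, in pixels
    for level in (2, 3, 4, 5):
        print(
            f"P{level}: stride {level_stride(level):2d}, "
            f"grid {grid_size(input_size, level):3d} x {grid_size(input_size, level):3d}, "
            f"object spans {cells_spanned(object_size, level):.2f} cells"
        )
```

Under these assumptions, an 8-pixel object covers two cells per side on P2 but only one cell on P3 and a fraction of a cell on P4 and P5, which is consistent with the abstract's motivation for adding a shallow detection layer.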