Wei Jiangshu, Liu Gang, Liu Siqi, Xiao Zeyan
College of Information Engineering, Sichuan Agricultural University, Ya'an, Sichuan, China.
PeerJ Comput Sci. 2023 Mar 22;9:e1314. doi: 10.7717/peerj-cs.1314. eCollection 2023.
Small object detection remains a difficult problem in computer vision, and its accuracy is still limited when image backgrounds are complex. In this article, we present a small object detection network based on YOLOv4. It addresses several obstacles that limit traditional methods in complex road environments, namely few effective features, image noise, and occlusion by large objects, and it improves small object detection against complex backgrounds such as drone aerial survey images. First, the improved architecture reduces computation and GPU memory consumption by incorporating the cross-stage partial network (CSPNet) structure into the spatial pyramid pooling (SPP) structure of YOLOv4 and into the convolutional layers after the concatenation operations. Second, accuracy on small objects is improved by adding a detection head better suited to small objects and removing one used for large objects. Third, a new branch extracts feature information from a shallow location in the backbone, and this information is fused in the neck to enrich the small object location information available to the model; when feature information from different backbone levels is fused, a weighting mechanism increases the fusion weight of useful information, which improves detection performance at each scale. Finally, a coordinate attention (CA) module is embedded at a suitable location in the neck, which lets the model attend to spatial location relationships and inter-channel relationships and strengthens its feature representation.
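The weighting mechanism used during feature fusion can be illustrated with a small sketch. The abstract does not say which weighting scheme the network uses, so the code below assumes one common choice, the fast normalized fusion popularized by BiFPN: each input feature map gets a scalar weight, negative weights are clipped to zero, and the weights are normalized to sum to roughly one. The function name `fuse_weighted`, the flat-list representation of a feature map, and the `eps` default are all illustrative assumptions, not details taken from the paper.

```python
def fuse_weighted(features, raw_weights, eps=1e-4):
    """Fuse same-length feature maps with normalized, non-negative weights.

    Assumption: this mirrors BiFPN-style fast normalized fusion, which may
    differ from the weighting mechanism actually used in the paper.
    features    -- list of feature maps, each a flat list of floats
    raw_weights -- one scalar per feature map (learned in a real network)
    """
    if not features or len(features) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("feature maps must have the same length")

    # ReLU on the raw weights keeps every contribution non-negative.
    clipped = [max(0.0, w) for w in raw_weights]
    total = sum(clipped) + eps
    if total == 0.0:
        raise ValueError("all weights are zero and eps is zero")

    # Normalize so the weights sum to (approximately) one.
    norm = [w / total for w in clipped]

    # Weighted element-wise sum across the input feature maps.
    fused = [
        sum(w * f[i] for w, f in zip(norm, features))
        for i in range(length)
    ]
    return fused, norm


if __name__ == "__main__":
    shallow = [1.0, 2.0]   # hypothetical shallow-branch features
    deep = [3.0, 4.0]      # hypothetical deeper-level features
    fused, norm = fuse_weighted([shallow, deep], [1.0, 3.0])
    print(norm, fused)
```

In a trained network the raw weights would be learnable parameters, so the model itself decides how much each level contributes at each fusion node; here they are fixed numbers purely for illustration.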
The proposed model was tested on detecting 10 different target objects in drone aerial images and 5 different road traffic signal signs in images taken from vehicles in a complex road environment. Its detection speed meets the criteria for real-time detection, its accuracy is better than that of existing state-of-the-art detection models, and it has only 44M parameters. On the drone aerial photography dataset, YOLOv4 and YOLOv5L reach a mean average precision (mAP) of 42.79% and 42.10%, respectively, while our model achieves 52.76%; on the urban road traffic light dataset, the proposed model achieves an mAP of 96.98%, which is also better than YOLOv4 (95.32%), YOLOv5L (94.79%), and other advanced models. This work provides an efficient method for small object detection in complex road environments and can be extended to other scenarios involving small object detection, such as drone cruising and autonomous driving.
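For readers unfamiliar with the reported metric, the sketch below shows one common way to compute average precision for a single class and then average it over classes to obtain mAP. The abstract does not state the evaluation protocol (IoU threshold, interpolation scheme), so this code assumes the all-point interpolation used by the PASCAL VOC development kit from 2010 onward, and it assumes each detection has already been matched to ground truth. The function names and the toy inputs are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class.

    Assumption: detections are already labelled as true or false positives
    (for example by IoU matching); the paper's exact protocol is not given.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    if len(scores) != len(is_true_positive):
        raise ValueError("one label per detection is required")

    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Precision envelope: at each rank, take the best precision achievable
    # at that recall or any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Area under the enveloped precision-recall curve.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    if not per_class_ap:
        raise ValueError("need at least one class")
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Toy example: four detections of one class, two ground-truth objects.
    ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], 2)
    print(ap, mean_average_precision([ap, 1.0]))
```

Under this definition, a model's mAP rises only when correct detections are ranked above incorrect ones across all classes, which is why it is a stricter summary than a plain classification accuracy.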