Khalili Boshra, Smyth Andrew W
Department of Civil Engineering and Engineering Mechanics, Columbia University, New York, NY 10027, USA.
Sensors (Basel). 2024 Sep 25;24(19):6209. doi: 10.3390/s24196209.
Object detection, a core task in computer vision, plays a vital role in traffic management, emergency response, autonomous vehicles, and smart cities. Despite significant advances in object detection, detecting small objects in images captured by high-altitude cameras remains challenging because of object size, distance from the camera, varied shapes, and cluttered backgrounds. To address these challenges, we propose small object detection YOLOv8 (SOD-YOLOv8), a novel model specifically designed for scenarios involving numerous small objects. Inspired by efficient generalized feature pyramid networks (GFPNs), we enhance multi-path fusion within YOLOv8 to integrate features across different levels, preserving details from shallower layers and improving small object detection accuracy. Additionally, we introduce a fourth detection layer to effectively utilize high-resolution spatial information. The efficient multi-scale attention (EMA) module within the C2f-EMA module further enhances feature extraction by redistributing weights and prioritizing relevant features. We introduce powerful-IoU (PIoU) as a replacement for CIoU; it focuses on moderate-quality anchor boxes and adds a penalty based on differences between the corners of the predicted and ground-truth bounding boxes. This approach simplifies calculations, speeds up convergence, and enhances detection accuracy. SOD-YOLOv8 significantly improves small object detection, surpassing widely used models across various metrics, without substantially increasing the computational cost or latency compared to YOLOv8s. Specifically, it increased recall from 40.1% to 43.9%, precision from 51.2% to 53.9%, mAP0.5 from 40.6% to 45.1%, and mAP0.5:0.95 from 24% to 26.6%. Furthermore, experiments conducted in dynamic real-world traffic scenes showed that SOD-YOLOv8 delivers significant improvements across diverse environmental conditions, highlighting its reliability and effective object detection capabilities in challenging scenarios.
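The abstract describes the PIoU loss only in words. As a rough illustration, the sketch below assumes the commonly cited form of the base PIoU loss: the IoU loss plus a penalty 1 - exp(-P^2), where P averages the edge offsets between the predicted and ground-truth boxes, normalized by the ground-truth width and height. The function name `piou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` guard are assumptions made for this example, not details taken from the paper. The focusing term that emphasizes moderate-quality anchor boxes is omitted.

```python
import math


def piou_loss(pred, target, eps=1e-9):
    """Illustrative PIoU-style loss for one pair of axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    Assumed form: loss = (1 - IoU) + (1 - exp(-P**2)).
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU between the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Penalty factor P: mean absolute offset of the four box edges,
    # normalized by the ground-truth width and height (assumed form).
    tw, th = tx2 - tx1, ty2 - ty1
    p = (
        abs(px1 - tx1) / tw
        + abs(px2 - tx2) / tw
        + abs(py1 - ty1) / th
        + abs(py2 - ty2) / th
    ) / 4.0

    # IoU loss plus the bounded penalty term, which lies in [0, 1).
    return (1.0 - iou) + (1.0 - math.exp(-p * p))
```

Because the penalty is normalized by the ground-truth box size rather than by an enclosing box, it still varies with the prediction when the two boxes do not overlap and the IoU term is constant, so the loss continues to provide a useful signal in that case.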