Tian Dan, Yan Xin, Zhou Dong, Wang Chen, Zhang Wenshuai
Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China.
Sensors (Basel). 2024 Sep 24;24(19):6181. doi: 10.3390/s24196181.
With the rapid growth in demand for security surveillance, assisted driving, and remote sensing, object detection networks with robust environmental perception and high detection accuracy have become a research focus. However, single-modality image detection is limited in environmental adaptability: lighting conditions, fog, rain, and obstacles such as vegetation cause information loss and reduce detection accuracy. To address these challenges, we propose IV-YOLO, an object detection network that integrates features from visible-light and infrared images. The network is based on YOLOv8 (You Only Look Once v8) and employs a dual-branch fusion structure that exploits the complementary features of infrared and visible-light images for target detection. We designed a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) to integrate multimodal features effectively, reducing errors from feature redundancy and extracting the fine-grained features needed for small-object detection. We also developed a Shuffle-SPP structure that combines channel and spatial attention to strengthen the focus on deep features and extract richer information through upsampling. For model optimization, we designed a loss function tailored to multi-scale object detection, which accelerates network convergence during training. Compared with the current state-of-the-art Dual-YOLO model, IV-YOLO improves mAP by 2.8%, 1.1%, and 2.2% on the Drone Vehicle, FLIR, and KAIST datasets, respectively. On the Drone Vehicle and FLIR datasets, IV-YOLO has 4.31 M parameters and achieves a frame rate of 203.2 fps, significantly outperforming YOLOv8n (5.92 M parameters, 188.6 fps on the Drone Vehicle dataset) and YOLO-FIR (7.1 M parameters, 83.3 fps on the FLIR dataset), which previously achieved the best results on these datasets. IV-YOLO thus delivers higher real-time detection performance at lower parameter complexity, making it highly promising for applications in autonomous driving, public safety, and beyond.
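To make the dual-branch fusion idea concrete, below is a minimal PyTorch sketch of a block that fuses same-scale infrared and visible-light feature maps. The class name BiFusionBlock, the BiFPN-style normalized fusion weights, and the layer sizes are illustrative assumptions; the abstract does not disclose the actual internals of the Bi-Fusion structure.

```python
# Hypothetical sketch of a dual-branch fusion block; not the authors' code.
import torch
import torch.nn as nn


class BiFusionBlock(nn.Module):
    """Fuse same-scale infrared and visible-light feature maps (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        # Learnable non-negative fusion weights, normalized BiFPN-style,
        # so the network can learn how much to trust each modality.
        self.w = nn.Parameter(torch.ones(2))
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        w = torch.relu(self.w)
        w = w / (w.sum() + 1e-4)
        # Weighted modality features are concatenated along channels and
        # reduced back to the original width with a 1x1 convolution.
        fused = torch.cat([w[0] * ir, w[1] * vis], dim=1)
        return self.reduce(fused)


if __name__ == "__main__":
    block = BiFusionBlock(channels=64)
    ir = torch.randn(1, 64, 80, 80)   # infrared feature map
    vis = torch.randn(1, 64, 80, 80)  # visible-light feature map
    print(block(ir, vis).shape)       # torch.Size([1, 64, 80, 80])
```

Applying such a block at every pyramid level, in both top-down and bottom-up passes, is one plausible reading of a "bidirectional pyramid" fusion; the learned weights are a common way to suppress redundant features from the weaker modality.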
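Similarly, the Shuffle-SPP description (channel and spatial attention over deep features) can be sketched as follows. The pooling kernel sizes, the SE-style channel attention, and the CBAM-style spatial attention are assumptions chosen to match the abstract's wording, not the paper's verified design.

```python
# Hedged sketch of an SPP head with channel shuffle and dual attention;
# the module name ShuffleSPP and all internals are illustrative assumptions.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Standard ShuffleNet-style channel shuffle: mix channels across groups.
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2)
    return x.reshape(b, c, h, w)


class ShuffleSPP(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.groups = groups
        # SPP-style parallel max pooling at three receptive-field sizes.
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13)
        )
        self.reduce = nn.Conv2d(4 * channels, channels, kernel_size=1, bias=False)
        # SE-style channel attention.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # CBAM-style spatial attention over per-pixel channel statistics.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spp = torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
        y = self.reduce(spp)
        y = channel_shuffle(y, self.groups)  # let information cross channel groups
        y = y * self.channel_att(y)          # re-weight channels
        stats = torch.cat(
            [y.mean(dim=1, keepdim=True), y.amax(dim=1, keepdim=True)], dim=1
        )
        return y * self.spatial_att(stats)   # re-weight spatial positions
```

The shuffle step is a cheap way to mix information across channel groups before the two attention stages re-weight channels and spatial positions, which is consistent with the abstract's claim of enhancing the focus on deep features.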