Jiang Yu, Wang Yuehang, Zhao Minghao, Zhang Yongji, Qi Hong
The College of Computer Science and Technology, Jilin University, Changchun 130012, China.
The Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
Fundam Res. 2023 Oct 10;5(4):1633-1644. doi: 10.1016/j.fmre.2023.08.004. eCollection 2025 Jul.
Intelligent perception is crucial in Intelligent Transportation Systems (ITS), with vision cameras as critical components. However, traditional RGB cameras suffer a significant performance decline when capturing nighttime traffic scenes, limiting their effectiveness in supporting ITS. In contrast, event cameras possess a high dynamic range (140 dB vs. 60 dB for traditional cameras), enabling them to overcome frame degradation in low-light conditions. Recently, multimodal learning paradigms have made substantial progress on various vision tasks, such as image-text retrieval. Motivated by this progress, we propose an adaptive selection and fusion detection method that leverages both the event and RGB frame domains to jointly optimize nighttime traffic object detection. To address the challenge of unbalanced multimodal data fusion, we design a learnable adaptive selection and fusion module that ranks and fuses features along the channel dimension, enabling efficient multimodal fusion. Additionally, we construct a novel multi-level feature pyramid network based on multimodal attention fusion; this network extracts latent features to improve the robustness of nighttime traffic object detection. Furthermore, we curate a dataset of nighttime traffic scenarios comprising RGB frames and corresponding event streams. Experiments demonstrate that our method outperforms current state-of-the-art event-based, frame-based, and event-frame fusion approaches, highlighting the effectiveness of integrating the event and frame domains for nighttime traffic object detection.
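The abstract describes channel-wise feature ranking and fusion but gives no implementation details. The following is a minimal PyTorch sketch of one plausible realization, assuming a squeeze-and-excitation-style channel gate followed by a hard top-k channel selection; the class name `AdaptiveSelectFuse`, the keep ratio of one half, and all layer widths are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AdaptiveSelectFuse(nn.Module):
    """Hypothetical sketch of a channel-wise adaptive selection-and-fusion
    block for two modalities (RGB frames and event representations).
    It scores each channel of the concatenated features, keeps the
    top-scoring half, and projects the result back to the output width."""

    def __init__(self, channels: int):
        super().__init__()
        # Squeeze-and-excitation style gate: global pooling -> MLP -> sigmoid
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, event_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, event_feat], dim=1)  # (B, 2C, H, W)
        scores = self.gate(x)                          # per-channel weights in (0, 1)
        # "Selection": rank channels by score and zero out the lower half
        b, c, _, _ = scores.shape
        flat = scores.view(b, c)
        topk = flat.topk(c // 2, dim=1).indices
        mask = torch.zeros_like(flat).scatter_(1, topk, 1.0).view(b, c, 1, 1)
        x = x * scores * mask                          # weight kept channels, drop the rest
        return self.proj(x)                            # fuse back to C channels


if __name__ == "__main__":
    fuse = AdaptiveSelectFuse(channels=64)
    rgb = torch.randn(2, 64, 32, 32)
    ev = torch.randn(2, 64, 32, 32)
    print(fuse(rgb, ev).shape)  # torch.Size([2, 64, 32, 32])
```

Note that the hard top-k mask is non-differentiable; a real implementation would likely use the soft gate weights alone during training or a straight-through estimator, so this sketch only illustrates the selection idea.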