Li Yubin, Zhan Weida, Jiang Yichun, Guo Jinxin
The College of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China.
Entropy (Basel). 2025 Apr 19;27(4):442. doi: 10.3390/e27040442.
RGB-thermal (RGB-T) object detection uses complementary information from the visible and thermal modalities to make detection more robust in challenging environments, particularly under low-light conditions. However, existing approaches are limited by their heavy dependence on precisely registered data and their insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework that incorporates a Cross-Modal Representation Model to address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables bidirectional information exchange, using trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. A Cross-Scale Feature Integration Module enhances multiscale detection capability, improving detection performance across object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and a semantic consistency loss, leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, supporting its superiority and practical effectiveness in real-world applications.
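The abstract describes the Cross-Modal Feature Remapping Module only as "statistical normalization and learnable correction parameters" and gives no formula. As a rough illustration of that general idea, and not the authors' implementation, the sketch below standardizes a single feature channel to zero mean and unit variance and then applies a scale and shift. The function name `remap_channel`, the parameters `gamma`, `beta`, and `eps`, and the toy activation values are all assumptions made for this example; in a real network `gamma` and `beta` would be learned during training.

```python
import math
from statistics import fmean, pvariance


def remap_channel(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature channel, then apply an affine correction.

    gamma and beta stand in for learnable correction parameters
    (hypothetical names; the abstract does not specify them).
    """
    mu = fmean(values)                    # channel mean
    var = pvariance(values, mu)           # population variance around mu
    scale = gamma / math.sqrt(var + eps)  # eps guards against zero variance
    return [scale * (v - mu) + beta for v in values]


# Toy activations for one channel (made-up numbers): the thermal branch
# has a much larger offset and spread than the RGB branch.
rgb_feat = [0.2, 0.4, 0.1, 0.5, 0.3]
thermal_feat = [12.0, 18.0, 9.0, 21.0, 15.0]

rgb_remapped = remap_channel(rgb_feat)
thermal_remapped = remap_channel(thermal_feat)

# After remapping, both channels share (approximately) the same statistics.
print("RGB     mean/var:", fmean(rgb_remapped), pvariance(rgb_remapped))
print("Thermal mean/var:", fmean(thermal_remapped), pvariance(thermal_remapped))
```

With the default `gamma=1.0` and `beta=0.0`, both toy channels end up with near-zero mean and near-unit variance, so later fusion layers see features on a comparable scale. Non-default values of `gamma` and `beta` show how a learned correction could shift the shared distribution.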