RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model

Authors

Li Yubin, Zhan Weida, Jiang Yichun, Guo Jinxin

Affiliation

The College of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China.

Publication information

Entropy (Basel). 2025 Apr 19;27(4):442. doi: 10.3390/e27040442.

DOI:10.3390/e27040442

PMID:40282677

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12027132/

Abstract

RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.
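
The abstract describes the Cross-Modal Feature Remapping Module only at a high level: statistical normalization plus learnable correction parameters. The following is a minimal sketch of one way such a step could look, not the authors' implementation. It assumes an AdaIN-style scheme in which thermal features are whitened per channel, rescaled to the RGB channel statistics, and then adjusted by a per-channel scale and shift. The function names, the remapping direction (thermal to RGB), and the `gamma` and `beta` parameters are all hypothetical.

```python
import numpy as np

def channel_stats(feat, eps=1e-5):
    """Per-channel mean and standard deviation of a (C, H, W) feature map."""
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    return mean, std

def remap_features(thermal, rgb, gamma, beta, eps=1e-5):
    """Hypothetical remapping step (an assumption, not the paper's module).

    1. Normalize thermal features to zero mean and unit variance per channel.
    2. Rescale them to the per-channel statistics of the RGB features.
    3. Apply a per-channel correction: gamma (scale) and beta (shift),
       which would be learned during training.
    """
    t_mean, t_std = channel_stats(thermal, eps)
    r_mean, r_std = channel_stats(rgb, eps)
    normalized = (thermal - t_mean) / t_std
    aligned = normalized * r_std + r_mean
    return gamma * aligned + beta

# Toy feature maps whose distributions differ strongly between modalities.
rng = np.random.default_rng(0)
rgb = rng.normal(loc=0.5, scale=2.0, size=(4, 8, 8))
thermal = rng.normal(loc=-3.0, scale=0.3, size=(4, 8, 8))

# Identity correction: in training these would be learnable parameters.
gamma = np.ones((4, 1, 1))
beta = np.zeros((4, 1, 1))

remapped = remap_features(thermal, rgb, gamma, beta)
```

With the identity correction, the remapped thermal features share the per-channel mean and (approximately) the standard deviation of the RGB features. Any residual mismatch that first- and second-order statistics cannot capture would be left to the learned `gamma` and `beta`.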

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fa20/12027132/4ab3dacf0b9a/entropy-27-00442-g001.jpg
