TCAINet: an RGB-T salient object detection model with cross-modal fusion and adaptive decoding

Author Information

Peng Hong, Hu Yunfei, Yu Baocai, Zhang Zhen

Affiliations

Ordos Institute of Liaoning Technical University, Ordos, China.

Faculty of Electronic and Information Engineering, Liaoning Technical University, Huludao 125100, Liaoning, China.

Publication Information

Sci Rep. 2025 Apr 24;15(1):14266. doi: 10.1038/s41598-025-98423-z.

Abstract

In the field of deep learning-based object detection, RGB-T salient object detection (SOD) networks show significant potential for cross-modal information fusion. However, existing methods still face considerable challenges in complex scenes. Specifically, current cross-modal feature fusion approaches fail to fully exploit the complementary information between modalities, resulting in limited robustness when handling diverse inputs. Furthermore, inadequate adaptation to multi-scale features hinders accurate recognition of salient objects at different scales. Although some feature decoding strategies attempt to mitigate noise interference, they often struggle in high-noise environments and lack flexible feature weighting, further restricting fusion capabilities. To address these limitations, this paper proposes a novel salient object detection network, TCAINet. The network integrates a Channel Attention (CA) mechanism, an enhanced cross-modal fusion module (CAF), and an adaptive decoder (AAD) to improve both the depth and breadth of feature fusion. Additionally, diverse noise-addition and augmentation methods are applied during data preprocessing to boost the model's robustness and adaptability. Specifically, the CA module enhances the model's feature selection ability, while the CAF and AAD modules optimize the integration and processing of multimodal information. Experimental results demonstrate that TCAINet outperforms existing methods across multiple evaluation metrics, proving its effectiveness and practicality in complex scenes. Notably, the proposed model achieves improvements of 0.653%, 1.384%, 1.019%, and 5.83% in the Sm, Em, Fm, and MAE metrics, respectively, confirming its efficacy in enhancing detection accuracy and optimizing feature fusion. The code and results can be found at the following link: huyunfei0219/TCAINet.
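The exact CA, CAF, and AAD designs are specified in the paper and the linked repository; as a rough illustration only, the sketch below shows one common way a channel-attention-gated cross-modal fusion step can be written in PyTorch, using SE-style squeeze-and-excitation gating over concatenated RGB and thermal features. The class names, reduction ratio, and concatenate-then-project layout are assumptions for illustration, not TCAINet's actual modules.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatial dims, excite per channel.
    (Assumed design for illustration; TCAINet's CA module may differ.)"""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pool -> (B, C, 1, 1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))  # reweight channels by learned importance

class CrossModalFusion(nn.Module):
    """Hypothetical fusion step: concatenate RGB and thermal features along the
    channel axis, gate with channel attention, then project back."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(2 * channels)
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([rgb, thermal], dim=1)  # stack the two modalities
        return self.proj(self.ca(fused))          # attention-gated fusion

# Usage: fuse one feature level of a two-stream backbone.
rgb_feat = torch.randn(2, 64, 56, 56)      # (batch, channels, H, W)
thermal_feat = torch.randn(2, 64, 56, 56)
out = CrossModalFusion(64)(rgb_feat, thermal_feat)
print(out.shape)  # torch.Size([2, 64, 56, 56])
```

The channel gating lets the network suppress the less informative modality per channel (e.g., thermal features in well-lit scenes, RGB features in darkness), which is the complementary-information behavior the abstract describes.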


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac80/12022040/a1df0282b78e/41598_2025_98423_Fig1_HTML.jpg
