Deng Biao, Liu Di, Cao Yang, Liu Hong, Yan Zhiguo, Chen Hu
Dongfang Electric Autocontrol Engineering Co., Ltd., Deyang 618000, China.
College of Computer Science, Sichuan University, Chengdu 610000, China.
Sensors (Basel). 2024 Nov 7;24(22):7146. doi: 10.3390/s24227146.
Existing deep learning-based RGB-T salient object detection methods often struggle to fuse RGB and thermal features effectively, so obtaining high-quality features and fully integrating the two modalities are central research concerns. We developed an illumination-prior-based coefficient predictor (MICP) to determine optimal interaction weights. We then designed a saliency-guided encoder (SG Encoder) to extract multi-scale thermal features that incorporate saliency information. The SG Encoder guides thermal feature extraction by leveraging the correlation between thermal and RGB features, particularly those with strong semantic relevance to the salient object detection task. Finally, we employed a Cross-attention-based Fusion and Refinement Module (CrossFRM) to refine the fused features: the robust thermal features help sharpen the spatial focus of the fused features, aligning them more closely with salient objects. Experimental results demonstrate that our approach locates salient objects more accurately and significantly outperforms 11 state-of-the-art methods.
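To make the described pipeline concrete, below is a minimal PyTorch sketch (not the authors' released code) of how an illumination-driven interaction coefficient and cross-attention-based RGB-thermal fusion might be wired together. The class names, the use of mean image brightness as the illumination prior, and all tensor shapes are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch, assuming PyTorch: an illumination-prior coefficient predictor
# plus cross-attention fusion of RGB and thermal feature maps. Illustrative only.
import torch
import torch.nn as nn


class IlluminationCoefficientPredictor(nn.Module):
    """Predicts a fusion weight in (0, 1) from a global illumination statistic."""

    def __init__(self, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1), nn.Sigmoid()
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W); mean intensity stands in for the illumination prior.
        brightness = rgb.mean(dim=(1, 2, 3)).unsqueeze(-1)  # (B, 1)
        return self.mlp(brightness)                          # (B, 1)


class CrossAttentionFusion(nn.Module):
    """Fuses RGB and thermal feature maps with multi-head cross-attention."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, f_rgb: torch.Tensor, f_th: torch.Tensor,
                alpha: torch.Tensor) -> torch.Tensor:
        # f_rgb, f_th: (B, C, H, W); alpha: (B, 1) interaction coefficient.
        b, c, h, w = f_rgb.shape
        q = f_rgb.flatten(2).transpose(1, 2)   # (B, HW, C) queries from RGB
        kv = f_th.flatten(2).transpose(1, 2)   # (B, HW, C) keys/values from thermal
        attended, _ = self.attn(q, kv, kv)     # thermal context per RGB location
        fused = self.norm(q + alpha.unsqueeze(1) * attended)
        return fused.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    rgb_img = torch.rand(2, 3, 64, 64)
    f_rgb, f_th = torch.rand(2, 32, 16, 16), torch.rand(2, 32, 16, 16)
    alpha = IlluminationCoefficientPredictor()(rgb_img)
    out = CrossAttentionFusion(32)(f_rgb, f_th, alpha)
    print(out.shape)  # torch.Size([2, 32, 16, 16])
```

In this sketch the predicted coefficient scales how strongly thermal context is injected into the RGB query features, which mirrors the abstract's idea of weighting the RGB-thermal interaction by an illumination prior; the paper's actual modules (MICP, SG Encoder, CrossFRM) are more elaborate multi-scale designs.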