

TwinsTNet: Broad-View Twins Transformer Network for Bi-Modal Salient Object Detection.

Author Information

Lyu Pengfei, Yu Xiaosheng, Chi Jianning, Wu Hao, Wu Chengdong, Rajapakse Jagath C

Publication Information

IEEE Trans Image Process. 2025;34:2796-2810. doi: 10.1109/TIP.2025.3564821. Epub 2025 May 12.

DOI: 10.1109/TIP.2025.3564821
PMID: 40315089
Abstract

Exploring complementary information between RGB and thermal/depth modalities is crucial for bi-modal salient object detection (BSOD). However, the distinct characteristics of different modalities often lead to large differences in information distributions. Existing models, which rely on convolutional operations or plug-and-play attention mechanisms, struggle to address this issue. To overcome this challenge, we rethink the relationship between information complementarity and long-range relevance, and propose a uniform broad-view Twins Transformer Network (TwinsTNet) for accurate BSOD. Specifically, to efficiently fuse bi-modal information, we first design the Cross-Modal Federated Attention (CMFA), which mines complementary cues across modalities through element-wise global dependency. Second, to ensure accurate modality fusion, we propose the Semantic Consistency Attention Loss, which supervises the co-attention feature in CMFA using the ground-truth-generated attention map. Additionally, existing BSOD models lack the exploration of inter-layer interactions, for which we propose the Cross-Scale Retracing Attention (CSRA), which retrieves query-relevant information from stacked features of all previous layers, enabling flexible cross-layer interactions. The cooperation between CMFA and CSRA mitigates inductive bias in both modality and layer dimensions, enhancing TwinsTNet's representational capability. Extensive experiments demonstrate that TwinsTNet outperforms twenty-two existing state-of-the-art models on ten BSOD benchmark datasets. The code is available at: https://github.com/JoshuaLPF/TwinsTNet.
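The fusion idea described above, where one modality's features attend to the other's through global dependencies, can be illustrated with plain scaled dot-product cross-attention. The sketch below is a minimal illustration of that general mechanism, not the paper's CMFA module; the function name, shapes, and the absence of learned projections are all simplifying assumptions.

```python
import numpy as np

def cross_modal_attention(x_rgb, x_thermal):
    """Minimal scaled dot-product cross-attention sketch (assumed API,
    not the paper's CMFA): RGB tokens query thermal tokens, so the fused
    output mixes complementary cues from the second modality.

    x_rgb, x_thermal: (num_tokens, dim) feature matrices.
    """
    d_k = x_rgb.shape[-1]
    # Queries from RGB, keys/values from thermal. A real model would
    # first apply learned per-modality linear projections.
    scores = x_rgb @ x_thermal.T / np.sqrt(d_k)   # (N_rgb, N_thermal)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over thermal tokens
    fused = attn @ x_thermal                      # (N_rgb, dim)
    # Residual connection keeps the original RGB signal.
    return x_rgb + fused

rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))
thermal = rng.standard_normal((16, 64))
out = cross_modal_attention(rgb, thermal)
print(out.shape)  # (16, 64)
```

In the paper's CMFA the dependency is element-wise and supervised by a ground-truth-generated attention map; the sketch above only conveys the underlying query/key/value pattern across modalities.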


Similar Articles

1
TwinsTNet: Broad-View Twins Transformer Network for Bi-Modal Salient Object Detection.
IEEE Trans Image Process. 2025;34:2796-2810. doi: 10.1109/TIP.2025.3564821. Epub 2025 May 12.
2
SwinCross: Cross-modal Swin transformer for head-and-neck tumor segmentation in PET/CT images.
Med Phys. 2024 Mar;51(3):2096-2107. doi: 10.1002/mp.16703. Epub 2023 Sep 30.
3
Disentangled Cross-Modal Transformer for RGB-D Salient Object Detection and Beyond.
IEEE Trans Image Process. 2024;33:1699-1709. doi: 10.1109/TIP.2024.3364022. Epub 2024 Mar 5.
4
CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection.
IEEE Trans Image Process. 2023;32:892-904. doi: 10.1109/TIP.2023.3234702. Epub 2023 Jan 23.
5
IFENet: Interaction, Fusion, and Enhancement network for V-D-T Salient Object Detection.
IEEE Trans Image Process. 2025 Jan 14;PP. doi: 10.1109/TIP.2025.3527372.
6
Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection.
Sensors (Basel). 2023 Aug 17;23(16):7221. doi: 10.3390/s23167221.
7
ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection.
IEEE Trans Cybern. 2021 Jan;51(1):88-100. doi: 10.1109/TCYB.2020.2969255. Epub 2020 Dec 22.
8
HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness.
IEEE Trans Image Process. 2023;32:2160-2173. doi: 10.1109/TIP.2023.3263111.
9
GDVIFNet: A generated depth and visible image fusion network with edge feature guidance for salient object detection.
Neural Netw. 2025 Aug;188:107445. doi: 10.1016/j.neunet.2025.107445. Epub 2025 Apr 5.
10
Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection.
Entropy (Basel). 2024 Jan 31;26(2):130. doi: 10.3390/e26020130.