CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection.

Authors

Pang Youwei, Zhao Xiaoqi, Zhang Lihe, Lu Huchuan

Publication information

IEEE Trans Image Process. 2023;32:892-904. doi: 10.1109/TIP.2023.3234702. Epub 2023 Jan 23.

DOI: 10.1109/TIP.2023.3234702
PMID: 37018701

Abstract

Most of the existing bi-modal (RGB-D and RGB-T) salient object detection methods utilize the convolution operation and construct complex interweave fusion structures to achieve cross-modal information integration. The inherent local connectivity of the convolution operation constrains the performance of the convolution-based methods to a ceiling. In this work, we rethink these tasks from the perspective of global information alignment and transformation. Specifically, the proposed cross-modal view-mixed transformer (CAVER) cascades several cross-modal integration units to construct a top-down transformer-based information propagation path. CAVER treats the multi-scale and multi-modal feature integration as a sequence-to-sequence context propagation and update process built on a novel view-mixed attention mechanism. Besides, considering the quadratic complexity w.r.t. the number of input tokens, we design a parameter-free patch-wise token re-embedding strategy to simplify operations. Extensive experimental results on RGB-D and RGB-T SOD datasets demonstrate that such a simple two-stream encoder-decoder framework can surpass recent state-of-the-art methods when it is equipped with the proposed components.
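The abstract motivates the token re-embedding step by the quadratic cost of attention in the number of tokens. The sketch below only illustrates that arithmetic and is not the authors' implementation. It assumes that "parameter-free patch-wise re-embedding" means folding each non-overlapping p x p patch of tokens into one longer token by pure reshaping, and that the re-embedded sequence attends to itself. The names `patch_reembed` and `pairwise_score_count` are hypothetical.

```python
from typing import List

Grid = List[List[List[float]]]  # H x W x C feature map


def patch_reembed(grid: Grid, p: int) -> List[List[float]]:
    """Fold each non-overlapping p x p patch of tokens into one longer token.

    Assumption for illustration: this is pure reshaping with no learned
    weights, which is what makes it parameter-free.
    Returns (H/p) * (W/p) tokens, each of length p * p * C.
    """
    h, w = len(grid), len(grid[0])
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    tokens = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            token: List[float] = []
            for dy in range(p):
                for dx in range(p):
                    token.extend(grid[top + dy][left + dx])
            tokens.append(token)
    return tokens


def pairwise_score_count(num_tokens: int) -> int:
    """Entries in the attention score matrix when a sequence attends to itself."""
    return num_tokens * num_tokens


# Toy 4 x 4 feature map with 2 channels; each token stores its (row, col).
H, W, C, P = 4, 4, 2, 2
feature_map = [[[float(r), float(c)] for c in range(W)] for r in range(H)]

tokens = patch_reembed(feature_map, P)
full_cost = pairwise_score_count(H * W)
reduced_cost = pairwise_score_count(len(tokens))
print(len(tokens), len(tokens[0]), full_cost, reduced_cost)  # 4 8 256 16
```

Under these assumptions, a patch size of p cuts the token count by a factor of p squared and the score matrix by a factor of p to the fourth power, while the information in each patch is kept in the longer token rather than discarded.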

