3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond.

Author Information

Chen Qian, Zhang Zhenxi, Lu Yanye, Fu Keren, Zhao Qijun

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4309-4323. doi: 10.1109/TNNLS.2022.3202241. Epub 2024 Feb 29.

DOI:10.1109/TNNLS.2022.3202241
PMID:36099219
Abstract

RGB-depth (RGB-D) salient object detection (SOD) recently has attracted increasing research interest, and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct explicit and controllable cross-modal feature fusion either in the single encoder or decoder stage, which hardly guarantees sufficient cross-modal fusion ability. To this end, we make the first attempt in addressing RGB-D SOD through 3-D convolutional neural networks. The proposed model, named RD3D, aims at prefusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams. Specifically, RD3D first conducts prefusion across RGB and depth modalities through a 3-D encoder obtained by inflating 2-D ResNet and later provides in-depth feature fusion by designing a 3-D decoder equipped with rich back-projection paths (RBPPs) for leveraging the extensive aggregation ability of 3-D convolutions. Toward an improved model RD3D+, we propose to disentangle the conventional 3-D convolution into successive spatial and temporal convolutions and, meanwhile, discard unnecessary zero padding. This eventually results in a 2-D convolutional equivalence that facilitates optimization and reduces parameters and computation costs. Thanks to such a progressive-fusion strategy involving both the encoder and the decoder, effective and thorough interactions between the two modalities can be exploited and boost detection accuracy. As an additional boost, we also introduce channel-modality attention and its variant after each path of RBPP to attend to important features. Extensive experiments on seven widely used benchmark datasets demonstrate that RD3D and RD3D+ perform favorably against 14 state-of-the-art RGB-D SOD approaches in terms of five key evaluation metrics. Our code will be made publicly available at https://github.com/PPOLYpubki/RD3D.
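The two construction ideas in the abstract, inflating a pretrained 2-D kernel into a 3-D one for the encoder, and disentangling a full 3-D convolution into successive spatial and temporal convolutions to cut parameters, can be sketched numerically. This is a minimal illustration under assumed shapes and channel widths; the function names and sizes are illustrative and not taken from the RD3D code.

```python
import numpy as np

def inflate_2d_kernel(w2d: np.ndarray, depth: int) -> np.ndarray:
    """Replicate a (C_out, C_in, kH, kW) kernel along a new depth axis.

    Dividing by `depth` keeps the response on a temporally constant
    input identical to that of the original 2-D convolution.
    """
    return np.repeat(w2d[:, :, None, :, :], depth, axis=2) / depth

def conv_3d_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of a full k*k*k 3-D convolution (no bias)."""
    return c_out * c_in * k * k * k

def factored_params(c_in: int, c_out: int, k: int, c_mid: int) -> int:
    """Spatial 1*k*k conv to c_mid channels, then temporal k*1*1 conv."""
    return c_mid * c_in * k * k + c_out * c_mid * k

# Inflate a hypothetical 64-channel 3x3 kernel to depth 3.
w2d = np.random.randn(64, 64, 3, 3)
w3d = inflate_2d_kernel(w2d, depth=3)
print(w3d.shape)                          # (64, 64, 3, 3, 3)

# A constant-in-time input sees the same effective weights:
print(np.allclose(w3d.sum(axis=2), w2d))  # True

# Factoring shrinks the parameter count at typical widths:
full = conv_3d_params(64, 64, 3)
fact = factored_params(64, 64, 3, c_mid=64)
print(full, fact, fact < full)            # 110592 49152 True
```

The parameter comparison shows why the paper's disentangled variant (RD3D+) is cheaper: a 3x3x3 kernel costs 27*C_in*C_out weights, while the spatial-plus-temporal pair costs only 9*C_in*C_mid + 3*C_mid*C_out, less than half when C_mid equals C_out.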

Similar Articles

1. 3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond.
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4309-4323. doi: 10.1109/TNNLS.2022.3202241. Epub 2024 Feb 29.
2. CIR-Net: Cross-Modality Interaction and Refinement for RGB-D Salient Object Detection.
IEEE Trans Image Process. 2022;31:6800-6815. doi: 10.1109/TIP.2022.3216198. Epub 2022 Oct 28.
3. CDNet: Complementary Depth Network for RGB-D Salient Object Detection.
IEEE Trans Image Process. 2021;30:3376-3390. doi: 10.1109/TIP.2021.3060167. Epub 2021 Mar 9.
4. UTDNet: A unified triplet decoder network for multimodal salient object detection.
Neural Netw. 2024 Feb;170:521-534. doi: 10.1016/j.neunet.2023.11.051. Epub 2023 Nov 24.
5. HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness.
IEEE Trans Image Process. 2023;32:2160-2173. doi: 10.1109/TIP.2023.3263111.
6. ICNet: Information Conversion Network for RGB-D Based Salient Object Detection.
IEEE Trans Image Process. 2020 Mar 4. doi: 10.1109/TIP.2020.2976689.
7. Hierarchical Alternate Interaction Network for RGB-D Salient Object Detection.
IEEE Trans Image Process. 2021;30:3528-3542. doi: 10.1109/TIP.2021.3062689. Epub 2021 Mar 11.
8. ASIF-Net: Attention Steered Interweave Fusion Network for RGB-D Salient Object Detection.
IEEE Trans Cybern. 2021 Jan;51(1):88-100. doi: 10.1109/TCYB.2020.2969255. Epub 2020 Dec 22.
9. IRFR-Net: Interactive Recursive Feature-Reshaping Network for Detecting Salient Objects in RGB-D Images.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):4132-4144. doi: 10.1109/TNNLS.2021.3105484. Epub 2025 Feb 28.
10. CAVER: Cross-Modal View-Mixed Transformer for Bi-Modal Salient Object Detection.
IEEE Trans Image Process. 2023;32:892-904. doi: 10.1109/TIP.2023.3234702. Epub 2023 Jan 23.

Cited By

1. Cross-modal interactive and global awareness fusion network for RGB-D salient object detection.
PLoS One. 2025 Jun 12;20(6):e0325301. doi: 10.1371/journal.pone.0325301. eCollection 2025.
2. MAF-Net: A multimodal data fusion approach for human action recognition.
PLoS One. 2025 Apr 9;20(4):e0319656. doi: 10.1371/journal.pone.0319656. eCollection 2025.
3. Wavelet-Driven Multi-Band Feature Fusion for RGB-T Salient Object Detection.
Sensors (Basel). 2024 Dec 20;24(24):8159. doi: 10.3390/s24248159.
4. Masked Generative Light Field Prompting for Pixel-Level Structure Segmentations.
Research (Wash D C). 2024 Mar 26;7:0328. doi: 10.34133/research.0328. eCollection 2024.