
3-D Convolutional Neural Networks for RGB-D Salient Object Detection and Beyond.

Author Information

Chen Qian, Zhang Zhenxi, Lu Yanye, Fu Keren, Zhao Qijun

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4309-4323. doi: 10.1109/TNNLS.2022.3202241. Epub 2024 Feb 29.

Abstract

RGB-depth (RGB-D) salient object detection (SOD) has recently attracted increasing research interest, and many deep learning methods based on encoder-decoder architectures have emerged. However, most existing RGB-D SOD models conduct explicit and controllable cross-modal feature fusion in either the encoder or the decoder stage alone, which hardly guarantees sufficient cross-modal fusion ability. To this end, we make the first attempt at addressing RGB-D SOD with 3-D convolutional neural networks. The proposed model, named RD3D, performs prefusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of the RGB and depth streams. Specifically, RD3D first conducts prefusion across the RGB and depth modalities through a 3-D encoder obtained by inflating a 2-D ResNet, and later provides in-depth feature fusion through a 3-D decoder equipped with rich back-projection paths (RBPPs), leveraging the extensive aggregation ability of 3-D convolutions. Toward an improved model, RD3D+, we propose to disentangle the conventional 3-D convolution into successive spatial and temporal convolutions and, meanwhile, discard unnecessary zero padding. This eventually yields a 2-D convolutional equivalence that facilitates optimization and reduces parameters and computation costs. Thanks to such a progressive-fusion strategy involving both the encoder and the decoder, effective and thorough interactions between the two modalities can be exploited to boost detection accuracy. As an additional boost, we also introduce channel-modality attention and its variant after each path of the RBPP to attend to important features. Extensive experiments on seven widely used benchmark datasets demonstrate that RD3D and RD3D+ perform favorably against 14 state-of-the-art RGB-D SOD approaches in terms of five key evaluation metrics. Our code will be made publicly available at https://github.com/PPOLYpubki/RD3D.
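The following is a minimal PyTorch sketch of two ideas named in the abstract: inflating a pre-trained 2-D convolution into a 3-D one so that RGB and depth can be stacked along a modality axis, and factorizing a 3-D convolution into a spatial convolution followed by a modality-wise convolution with no zero padding along that axis. The function and class names (inflate_conv2d, SpatioModalConv) and the exact kernel, stride, and padding choices are illustrative assumptions, not the authors' released implementation; refer to the linked repository for the actual RD3D/RD3D+ code.

```python
import torch
import torch.nn as nn


def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Inflate a pre-trained 2-D convolution into a 3-D one (I3D-style).

    The 2-D kernel is replicated time_dim times along the new (modality)
    axis and rescaled so the inflated filter initially gives the same
    response on a stack of identical frames.
    """
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        w2d = conv2d.weight.data                        # (out, in, kH, kW)
        w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1) / time_dim
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias.data)
    return conv3d


class SpatioModalConv(nn.Module):
    """A 3-D convolution factorized into a spatial (1 x k x k) convolution
    followed by a modality-wise (t x 1 x 1) convolution, loosely in the
    spirit of the RD3D+ decomposition. Zero padding along the modality
    axis is dropped, so a two-frame RGB/depth stack collapses toward a
    single fused representation."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, t: int = 2):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, (1, k, k),
                                 padding=(0, k // 2, k // 2))
        # No padding on the modality dimension: output length = T - t + 1.
        self.modal = nn.Conv3d(out_ch, out_ch, (t, 1, 1), padding=0)

    def forward(self, x):                               # x: (B, C, T, H, W)
        return self.modal(self.spatial(x))


if __name__ == "__main__":
    # Toy check: RGB and depth feature maps stacked along the third axis (T=2).
    rgb = torch.randn(1, 64, 1, 56, 56)
    depth = torch.randn(1, 64, 1, 56, 56)
    x = torch.cat([rgb, depth], dim=2)                  # (1, 64, 2, 56, 56)
    fused = SpatioModalConv(64, 64, k=3, t=2)(x)
    print(fused.shape)                                  # (1, 64, 1, 56, 56)
```

Because the modality-wise kernel is applied without padding, the stacked RGB/depth dimension shrinks to length one, which is what makes the factorized operator reducible to an ordinary 2-D convolution pipeline after fusion.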

