IEEE Trans Image Process. 2021;30:9179-9192. doi: 10.1109/TIP.2021.3123548. Epub 2021 Nov 10.
RGB-D saliency detection has received increasing attention in recent years. Many efforts have been devoted to this area, most of which try to integrate multi-modal information, i.e., RGB images and depth maps, via various fusion strategies. However, some of them ignore the inherent difference between the two modalities, leading to performance degradation on challenging scenes. Therefore, in this paper, we propose a novel RGB-D saliency model, the Dynamic Selective Network (DSNet), which performs salient object detection (SOD) in RGB-D images by taking full advantage of the complementarity between the two modalities. Specifically, we first deploy a cross-modal global context module (CGCM) to acquire high-level semantic information, which is used to roughly locate salient objects. Then, we design a dynamic selective module (DSM) to dynamically mine the cross-modal complementary information between RGB images and depth maps, and to further optimize the multi-level and multi-scale information by executing gated and pooling-based selection, respectively. Moreover, we conduct boundary refinement to obtain high-quality saliency maps with clear boundary details. Extensive experiments on eight public RGB-D datasets show that the proposed DSNet achieves competitive or superior performance against 17 state-of-the-art RGB-D SOD models.
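The gated selection in the DSM can be illustrated, at a high level, as a learned per-location gate that decides how much each modality contributes to the fused feature. The sketch below is a minimal, hypothetical illustration of such gated cross-modal fusion (the scalar weights `w_rgb`, `w_depth`, and `bias` stand in for the learned convolution the paper would use; they are not from the paper):

```python
import numpy as np

def sigmoid(x):
    # numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, w_rgb, w_depth, bias):
    """Fuse two modality feature maps with a per-location gate.

    The gate is computed from both modalities (here via a toy linear
    combination; a real model would use a learned convolution), then
    used as a convex combination weight between the two features.
    """
    gate = sigmoid(w_rgb * rgb_feat + w_depth * depth_feat + bias)
    # gate -> 1 favors the RGB feature, gate -> 0 favors the depth feature
    return gate * rgb_feat + (1.0 - gate) * depth_feat

# Tiny 2x2 "feature maps" for illustration
rgb = np.array([[0.9, 0.1],
                [0.5, 0.7]])
depth = np.array([[0.2, 0.8],
                  [0.4, 0.3]])
fused = gated_fusion(rgb, depth, w_rgb=2.0, w_depth=-1.0, bias=0.0)
```

Because the gate lies in (0, 1), each fused value is a convex combination of the two modality features at that location, which is the "selective" behavior the abstract describes.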