Li Gongyang, Liu Zhi, Ling Haibin
IEEE Trans Image Process. 2020 Mar 4. doi: 10.1109/TIP.2020.2976689.
RGB-D based salient object detection (SOD) methods leverage the depth map as valuable complementary information for better SOD performance. Previous methods mainly exploit the correlation between the RGB image and the depth map in three fusion domains: input images, extracted features, and output results. However, these fusion strategies cannot fully capture the complex correlation between the RGB image and the depth map. Moreover, these methods neither fully explore the cross-modal complementarity and the cross-level continuity of information, nor discriminate between information from different sources. In this paper, to address these problems, we propose a novel Information Conversion Network (ICNet) for RGB-D based SOD, employing a Siamese structure with an encoder-decoder architecture. To fuse high-level RGB and depth features in an interactive and adaptive way, we propose a novel Information Conversion Module (ICM), which contains concatenation operations and correlation layers. Furthermore, we design a Cross-modal Depth-weighted Combination (CDC) block to discriminate cross-modal features from different sources and to enhance RGB features with depth features at each level. Extensive experiments on five commonly tested datasets demonstrate the superiority of our ICNet over 15 state-of-the-art RGB-D based SOD methods, and validate the effectiveness of the proposed ICM and CDC block.
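The depth-weighted enhancement idea behind the CDC block can be sketched in a few lines. Note this is a minimal illustrative sketch, not the paper's implementation: the function name `cdc_block`, the sigmoid gating of depth features, and the residual combination form are all assumptions introduced here for clarity.

```python
import numpy as np

def cdc_block(rgb_feat, depth_feat):
    """Illustrative sketch of cross-modal depth-weighted combination
    (hypothetical; not the paper's actual CDC implementation).

    A weight map is derived from the depth features and used to
    enhance the RGB features at the same level, residual-style.
    """
    # Gate derived from depth features, squashed to (0, 1)
    weight = 1.0 / (1.0 + np.exp(-depth_feat))
    # Depth-weighted enhancement of the RGB features
    return rgb_feat + weight * rgb_feat

# Toy example: same-shape RGB and depth feature maps at one level
rgb = np.ones((4, 4))
depth = np.zeros((4, 4))
enhanced = cdc_block(rgb, depth)  # sigmoid(0) = 0.5, so each value becomes 1.5
```

In a real network such a block would operate per level on multi-channel feature tensors, with learned parameters in place of the fixed sigmoid gate.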