Wang Fengyun, Pan Jinshan, Xu Shoukun, Tang Jinhui
IEEE Trans Image Process. 2022;31:1285-1297. doi: 10.1109/TIP.2022.3140606. Epub 2022 Jan 25.
Exploiting useful information from depth is key to the success of RGB-D saliency detection methods. Because RGB and depth images come from different domains, the modality gap between them causes simple feature concatenation to yield unsatisfactory results. To improve performance, most methods focus on bridging this gap by designing various cross-modal feature-fusion modules, while neglecting to explicitly extract the useful consistent information shared by the two modalities. To overcome this problem, we develop a simple yet effective RGB-D saliency detection method that learns discriminative cross-modality features with a deep neural network. The proposed method first learns modality-specific features for the RGB and depth inputs. It then separately computes the correlation of every pixel pair in a cross-modality consistent way, i.e., the correlations computed from RGB features (RGB correlations) and from depth features (depth correlations) share the same distribution range. Although derived from different cues, color and spatial geometry respectively, the RGB and depth correlations thus describe the same quantity: how tightly each pixel pair is related. Second, to gather complementary RGB and depth information, we propose a novel correlation-fusion module that fuses the RGB and depth correlations into a single cross-modality correlation. Finally, the features are refined with both long-range cross-modality correlations and local depth correlations to predict saliency maps: the long-range cross-modality correlation provides context information for accurate localization, while the local depth correlation preserves subtle structures for fine segmentation. In addition, we design a lightweight DepthNet for efficient depth feature extraction and train the whole network in an end-to-end manner. Both quantitative and qualitative experimental results demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
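To make the correlation idea concrete, below is a minimal PyTorch sketch of one plausible reading of the pipeline: range-consistent pixel-pair correlations from each modality, a fusion of the two correlation maps, and a long-range refinement of the features. The module name `CrossModalityCorrelation`, the use of cosine similarity as the consistency mechanism, and the learnable convex-combination fusion are all illustrative assumptions; they stand in for the paper's actual modules, which are not specified in this abstract.

```python
# Minimal sketch (not the authors' released code) of cross-modality
# consistent pixel-pair correlations, correlation fusion, and
# long-range feature refinement. Names, shapes, and the fusion rule
# are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalityCorrelation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Learnable scalar balancing RGB vs. depth correlations (an
        # assumed stand-in for the paper's correlation-fusion module).
        self.alpha = nn.Parameter(torch.tensor(0.5))
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    @staticmethod
    def pairwise_correlation(feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> (B, HW, HW) affinity matrix.
        # L2-normalizing the features of both modalities keeps every
        # correlation in [-1, 1], i.e. a consistent distribution range.
        f = F.normalize(feat.flatten(2), dim=1)    # (B, C, HW)
        return torch.bmm(f.transpose(1, 2), f)     # (B, HW, HW)

    def forward(self, rgb_feat: torch.Tensor,
                depth_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = rgb_feat.shape
        corr_rgb = self.pairwise_correlation(rgb_feat)    # RGB correlation
        corr_d = self.pairwise_correlation(depth_feat)    # depth correlation
        # Correlation fusion: convex combination of the two affinities.
        a = torch.sigmoid(self.alpha)
        corr = a * corr_rgb + (1.0 - a) * corr_d          # (B, HW, HW)
        attn = F.softmax(corr, dim=-1)
        # Long-range refinement: aggregate features using the fused
        # cross-modality correlation as non-local attention weights.
        v = rgb_feat.flatten(2)                           # (B, C, HW)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return rgb_feat + self.proj(out)

if __name__ == "__main__":
    m = CrossModalityCorrelation(channels=64)
    rgb = torch.randn(2, 64, 16, 16)
    dep = torch.randn(2, 64, 16, 16)
    print(m(rgb, dep).shape)  # torch.Size([2, 64, 16, 16])
```

Cosine similarity over L2-normalized features is one natural way to put RGB and depth correlations on the same [-1, 1] scale; the paper's actual consistency constraint, fusion design, and the local depth-correlation branch may differ from this sketch.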