Huang Kengda, Zhou Wujie, Fang Meixin
School of Information and Electronic Engineering, Zhejiang University of Science & Technology, Hangzhou 310023, China.
Institute of Information and Communication Engineering, Zhejiang University, Hangzhou 310027, China.
Comput Intell Neurosci. 2021 May 5;2021:6610997. doi: 10.1155/2021/6610997. eCollection 2021.
In recent years, the prediction of salient regions in RGB-D images has become a focus of research. Compared to its RGB counterpart, saliency prediction for RGB-D images is more challenging. In this study, we propose a novel deep multimodal fusion autoencoder for the saliency prediction of RGB-D images. The core trainable autoencoder of the RGB-D saliency prediction model takes two raw modalities (RGB and depth/disparity information) as inputs and their corresponding eye-fixation attributes as labels. The autoencoder comprises four main networks: a color channel network, a disparity channel network, a feature-concatenation network, and a feature-learning network. The autoencoder can mine the complex relationship between the color and disparity cues and make the most of their complementary characteristics. Finally, the saliency map is predicted via a feature-combination subnetwork, which combines the deep features extracted from the prior-learning and convolutional feature-learning subnetworks. We compare the proposed autoencoder with other saliency prediction models on two publicly available benchmark datasets. The results demonstrate that the proposed autoencoder outperforms these models by a significant margin.
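To make the described two-stream fusion architecture concrete, below is a minimal PyTorch sketch of an autoencoder of this kind. The abstract names the four subnetworks (color channel, disparity channel, feature concatenation, and feature learning) but gives no layer configurations, so every channel width, kernel size, and depth here is an illustrative assumption, as are the class and variable names; this is a sketch of the general technique, not the authors' implementation.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """One encoder stage: convolution, nonlinearity, 2x downsampling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class RGBDSaliencyAutoencoder(nn.Module):
    """Hypothetical two-stream fusion autoencoder; all sizes are assumptions."""

    def __init__(self):
        super().__init__()
        # Color channel network: encodes the 3-channel RGB input.
        self.color_net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        # Disparity channel network: encodes the 1-channel depth/disparity map.
        self.disp_net = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        # Feature-learning network: mines cross-modal structure in the
        # fused representation produced by concatenation.
        self.feature_net = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder head: upsamples back to input resolution and predicts a
        # single-channel saliency map, supervised by eye-fixation labels.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, kernel_size=2, stride=2),
            nn.Sigmoid(),
        )

    def forward(self, rgb, disparity):
        f_color = self.color_net(rgb)
        f_disp = self.disp_net(disparity)
        # Feature-concatenation network: fuse the two modality streams
        # along the channel dimension.
        fused = torch.cat([f_color, f_disp], dim=1)
        return self.decoder(self.feature_net(fused))

if __name__ == "__main__":
    model = RGBDSaliencyAutoencoder()
    rgb = torch.randn(1, 3, 128, 128)       # batch of one RGB image
    disp = torch.randn(1, 1, 128, 128)      # matching disparity map
    print(model(rgb, disp).shape)           # torch.Size([1, 1, 128, 128])
```

Under these assumptions, the two encoders keep the modalities separate until mid-level features are extracted, which is what lets the concatenation stage exploit the complementary characteristics of the color and disparity cues rather than mixing raw pixels directly.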