Nguyen Anh-Duc, Kim Jongyoo, Oh Heeseok, Kim Haksub, Lin Weisi, Lee Sanghoon
IEEE Trans Image Process. 2018 Nov 2. doi: 10.1109/TIP.2018.2879408.
Visual saliency on stereoscopic 3D (S3D) images has been shown to be heavily influenced by image quality. This dependency is therefore an important factor in image quality prediction, image restoration, and discomfort reduction, yet such a nonlinear relation remains very difficult to predict in images. In addition, most algorithms specialized in detecting visual saliency on pristine images unsurprisingly fail when facing distorted images. In this paper, we investigate a deep learning scheme named Deep Visual Saliency (DeepVS) to achieve a more accurate and reliable saliency predictor even in the presence of distortions. Since, from a psychophysical point of view, visual saliency is influenced by low-level features (contrast, luminance, and depth information), we propose seven low-level features derived from S3D image pairs and utilize them in the context of deep learning to detect visual attention in a manner adaptive to human perception. Our analysis shows that these low-level features play a role in extracting both distortion and saliency information. To construct saliency predictors, we weight and model human visual saliency through two different network architectures: a regression network and a fully convolutional neural network (CNN). Results from thorough experiments confirm that the predicted saliency maps are up to 70% correlated with human gaze patterns, which emphasizes the need for hand-crafted features as input to deep neural networks in S3D saliency detection.
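The following is a minimal sketch, not the authors' implementation, of the fully convolutional variant described above: a stack of the seven hand-crafted low-level feature maps extracted from an S3D image pair is fed to a small fully convolutional network that outputs a dense saliency map. The channel count of 7 follows the abstract; all layer widths, kernel sizes, and the input resolution are illustrative assumptions.

```python
# Sketch under assumptions: a small fully convolutional network mapping a
# 7-channel stack of hand-crafted low-level features (contrast, luminance,
# depth, etc.) to a single-channel saliency map. Layer widths and kernel
# sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class SaliencyFCN(nn.Module):
    def __init__(self, in_channels: int = 7):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # 1x1 convolution collapses the features to one saliency channel;
            # sigmoid keeps the predicted saliency values in [0, 1].
            nn.Conv2d(32, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, 7, H, W) stack of low-level feature maps
        return self.body(features)

if __name__ == "__main__":
    model = SaliencyFCN()
    feats = torch.rand(1, 7, 120, 160)   # dummy 7-channel feature stack
    saliency = model(feats)              # (1, 1, 120, 160) predicted saliency map
    print(saliency.shape)
```

The regression variant mentioned in the abstract could analogously pool such feature maps into a vector and predict patch-wise saliency scores; the fully convolutional form is shown here only because it maps directly from feature maps to a full-resolution saliency map.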