IEEE Trans Cybern. 2016 Feb;46(2):487-98. doi: 10.1109/TCYB.2015.2404432. Epub 2015 Feb 27.
Saliency detection models, which aim to quantitatively predict human eye-attended locations in the visual field, have received increasing research interest in recent years. Unlike traditional methods that rely on hand-designed features and contrast inference mechanisms, this paper proposes a novel framework that learns saliency detection models from raw image data using deep networks. The proposed framework consists of two learning stages. In the first stage, we develop a stacked denoising autoencoder (SDAE) model to learn robust, representative features from raw image data in an unsupervised manner. The second stage jointly learns optimal mechanisms to capture the intrinsic mutual patterns as the feature contrast and to integrate them for final saliency prediction. Given input pairs consisting of a center patch and its surrounding patches, each represented by the features learned in the first stage, an SDAE network is trained under the supervision of eye fixation labels, achieving contrast inference and contrast integration simultaneously. Experiments on three publicly available eye tracking benchmarks, together with comparisons against 16 state-of-the-art approaches, demonstrate the effectiveness of the proposed framework.
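To make the first-stage building block concrete, the following is a minimal sketch of a single denoising-autoencoder layer of the kind stacked in an SDAE: the input is corrupted with masking noise, encoded, decoded with tied weights, and trained to reconstruct the clean input. This is an illustrative numpy implementation, not the authors' code; the layer sizes, corruption rate, and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One layer of a stacked denoising autoencoder (SDAE).

    Illustrative sketch: corrupt the input with masking noise, encode,
    decode with tied weights, and minimise the squared reconstruction
    error against the *clean* input. Hyperparameters are assumptions.
    """

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.5):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))  # tied weights
        self.b = np.zeros(n_hidden)   # encoder bias
        self.c = np.zeros(n_in)       # decoder bias
        self.corruption = corruption  # fraction of inputs zeroed
        self.lr = lr

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.c)

    def train_step(self, x):
        # Masking noise: randomly zero a fraction of the input units.
        mask = rng.random(x.shape) > self.corruption
        x_tilde = x * mask
        h = self.encode(x_tilde)
        z = self.decode(h)
        err = z - x  # reconstruct the clean input, not the corrupted one
        # Backpropagation through the sigmoid decoder and encoder;
        # the tied weight matrix receives gradients from both paths.
        dz = err * z * (1.0 - z)
        dh = (dz @ self.W) * h * (1.0 - h)
        n = len(x)
        self.W -= self.lr * (x_tilde.T @ dh + dz.T @ h) / n
        self.b -= self.lr * dh.sum(axis=0) / n
        self.c -= self.lr * dz.sum(axis=0) / n
        return float((err ** 2).mean())

# Layer-wise pretraining on toy "patch" data: the reconstruction
# loss should fall, and encode() then yields the learned features
# that the next layer in the stack (or the supervised second stage)
# would consume.
patches = rng.random((64, 16))
dae = DenoisingAutoencoder(n_in=16, n_hidden=8)
first_loss = dae.train_step(patches)
for _ in range(500):
    last_loss = dae.train_step(patches)
features = dae.encode(patches)  # shape (64, 8)
```

In a full SDAE, several such layers are trained greedily, each on the hidden codes of the previous one; the paper's second stage then feeds the learned center/surround patch features into a further SDAE supervised by eye fixation labels.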