IEEE Trans Image Process. 2017 Jan;26(1):369-385. doi: 10.1109/TIP.2016.2628583. Epub 2016 Nov 14.
Saliency detection has been widely studied as a way to predict human fixations, with many applications in computer vision and image processing. In this paper, we argue that the state-of-the-art High Efficiency Video Coding (HEVC) standard can supply useful saliency features directly in the compressed domain, and we therefore propose to learn a video saliency model from HEVC features. First, we establish an eye-tracking database for video saliency detection, which can be downloaded from https://github.com/remega/video_database. Statistical analysis of this database shows that human fixations tend to fall in regions with large-valued HEVC features for splitting depth, bit allocation, and motion vector (MV). Further analysis of the database yields three additional observations. Accordingly, we propose several HEVC-domain features based on splitting depth, bit allocation, and MV. Next, a form of support vector machine is learned to integrate these HEVC features for video saliency detection. Because almost all video data are stored in compressed form, our method avoids both the computational cost of decoding and the storage cost of raw data. More importantly, experimental results show that the proposed method outperforms other state-of-the-art saliency detection methods in both the compressed and uncompressed domains.
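The idea of integrating the three HEVC feature types with a support vector machine can be illustrated with a minimal sketch. It is not the model from the paper: it assumes one feature vector per block of the form [splitting depth, bit allocation, MV magnitude], each scaled to [0, 1], and it swaps in a plain linear SVM trained by batch subgradient descent on the regularized hinge loss. The function names, hyperparameters, and toy training values are all hypothetical and are not taken from the eye-tracking database.

```python
# Hypothetical sketch: a linear SVM over three per-block HEVC features.
# Names, hyperparameters, and toy data are illustrative assumptions only.

def train_linear_svm(samples, labels, lam=0.01, lr=0.02, steps=10000):
    """Batch subgradient descent on the L2-regularized hinge loss.

    samples: list of feature vectors; labels: +1 (fixated) or -1 (not fixated).
    Returns the weight vector w and the bias b.
    """
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # margin violated, so the hinge term is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def saliency_score(w, b, features):
    """Linear decision value; larger means more likely to attract fixations."""
    return sum(wj * xj for wj, xj in zip(w, features)) + b


# Made-up toy blocks: [splitting depth, bit allocation, MV magnitude] in [0, 1].
TRAIN_X = [
    [0.9, 0.8, 0.9], [0.8, 0.9, 0.8], [1.0, 0.7, 0.9],  # assumed fixated
    [0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.0, 0.3, 0.1],  # assumed not fixated
]
TRAIN_Y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(TRAIN_X, TRAIN_Y)
print(saliency_score(w, b, [0.95, 0.9, 0.85]))  # block with large feature values
print(saliency_score(w, b, [0.05, 0.1, 0.15]))  # block with small feature values
```

Evaluating `saliency_score` on every block of a frame would give a coarse block-level saliency map. A linear decision function is used here only because it makes the contribution of each feature easy to inspect.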