
Learning to Explore Saliency for Stereoscopic Videos via Component-Based Interaction

Authors

Zhang Qiudan, Wang Xu, Wang Shiqi, Sun Zhenhao, Kwong Sam, Jiang Jianmin

Publication

IEEE Trans Image Process. 2020 Apr 13. doi: 10.1109/TIP.2020.2985531.

Abstract

In this paper, we devise a saliency prediction model for stereoscopic videos that learns to explore saliency through component-based interactions spanning spatial, temporal, and depth cues. The model first exploits the structure of a 3D residual network (3D-ResNet) to model the saliency driven by spatio-temporal coherence across consecutive frames. Subsequently, the implicit depth-induced saliency is automatically derived from the displacement correlation between the left and right views using a deep convolutional network (ConvNet). Finally, a component-wise refinement network produces the final saliency maps over time by aggregating the saliency distributions obtained from the individual components. To further facilitate research on stereoscopic video saliency, we create a new dataset comprising 175 stereoscopic video sequences with diverse content, together with dense eye fixation annotations. Extensive experiments show that the proposed model achieves superior performance compared with state-of-the-art methods on all publicly available eye fixation datasets.
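The abstract describes a three-component architecture: a 3D-ResNet branch for spatio-temporal saliency, a ConvNet branch that infers depth-driven saliency from the displacement between left and right views, and a refinement network that aggregates the component outputs. Below is a minimal PyTorch-style sketch of how such a component-based pipeline could be wired together; all module names, layer choices, channel widths, and the concatenation-based fusion are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption-heavy) of a component-based stereoscopic saliency
# pipeline in the spirit of the abstract. Layer choices, channel widths, and
# the concatenation-based fusion are illustrative, not the authors' design.
import torch
import torch.nn as nn


class SpatioTemporalBranch(nn.Module):
    """Stand-in for the 3D-ResNet branch: 3D convolutions over a clip of
    consecutive frames capture spatio-temporal coherence."""
    def __init__(self, in_ch=3, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Conv2d(feat, 1, kernel_size=1)

    def forward(self, clip):            # clip: (B, 3, T, H, W)
        f = self.net(clip).mean(dim=2)  # pool over time -> (B, feat, H, W)
        return torch.sigmoid(self.head(f))


class ImplicitDepthBranch(nn.Module):
    """Stand-in for the implicit-depth ConvNet: takes the left and right views
    of the current frame and infers depth-driven saliency from their
    displacement (disparity) cues."""
    def __init__(self, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=1),
        )

    def forward(self, left, right):     # each view: (B, 3, H, W)
        return torch.sigmoid(self.net(torch.cat([left, right], dim=1)))


class ComponentRefinement(nn.Module):
    """Stand-in for the component-wise refinement network: aggregates the
    per-component saliency maps into the final prediction."""
    def __init__(self, feat=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, feat, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 1, kernel_size=1),
        )

    def forward(self, s_spatiotemporal, s_depth):
        return torch.sigmoid(self.net(torch.cat([s_spatiotemporal, s_depth], dim=1)))


if __name__ == "__main__":
    B, T, H, W = 2, 8, 64, 64
    clip = torch.rand(B, 3, T, H, W)          # left-view clip of T frames
    left = clip[:, :, -1]                     # current left view
    right = torch.rand(B, 3, H, W)            # corresponding right view
    st, depth, fuse = SpatioTemporalBranch(), ImplicitDepthBranch(), ComponentRefinement()
    saliency = fuse(st(clip), depth(left, right))
    print(saliency.shape)                     # torch.Size([2, 1, 64, 64])
```

In this sketch the two branches each output a single-channel saliency map, and the refinement module simply learns a fused map from their concatenation; the paper's actual refinement operates on richer component features, so this only conveys the overall data flow.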

