Mathe S, Sminchisescu C. Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition. IEEE Trans Pattern Anal Mach Intell. 2015 Jul;37(7):1408-24. doi: 10.1109/TPAMI.2014.2366154.
Systems based on bag-of-words models, built from image features collected at the maxima of sparse interest point operators, have been used successfully for both visual object and action recognition tasks. While the sparse, interest-point based approach to recognition is not inconsistent with visual processing in biological systems that operate in 'saccade and fixate' regimes, the methodology and emphasis in the human and computer vision communities remain sharply distinct. Here, we make three contributions aimed at bridging this gap. First, we complement existing state-of-the-art, large-scale, annotated dynamic computer vision datasets like Hollywood-2 [1] and UCF Sports [2] with human eye movements collected under the ecological constraints of visual action and scene context recognition tasks. To our knowledge, these are the first large human eye tracking datasets for video to be collected and made publicly available (vision.imar.ro/eyetracking; 497,107 frames, each viewed by 19 subjects), unique in terms of their (a) large scale and computer vision relevance, (b) dynamic, video stimuli, and (c) task control as well as free viewing. Second, we introduce novel dynamic consistency and alignment measures, which underline the remarkable stability of patterns of visual search among subjects. Third, we leverage the significant amount of collected data to pursue studies and build automatic, end-to-end trainable computer vision systems based on human eye movements. Our studies not only shed light on the differences between computer vision spatio-temporal interest point sampling strategies and human fixations, and on their impact on visual recognition performance, but also demonstrate that human fixations can be accurately predicted and that, when used in an end-to-end automatic system that leverages advanced computer vision practice, they can lead to state-of-the-art results.
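As a concrete illustration of the recognition pipeline the abstract opens with, the following is a minimal sketch of bag-of-words encoding over descriptors extracted at sparse interest points. This is not the authors' implementation: the synthetic random features stand in for real spatio-temporal descriptors (e.g., gradient/flow histograms around detected points), and the vocabulary size is an arbitrary assumption.

```python
# Minimal bag-of-words sketch over interest-point descriptors.
# Synthetic data replaces real spatio-temporal descriptors (assumption).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for descriptors pooled from many training videos (n_points x dim).
train_descriptors = rng.normal(size=(5000, 64))

# Build a visual vocabulary by clustering the pooled descriptors.
n_words = 100  # assumed vocabulary size
vocab = KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(train_descriptors)

def bow_histogram(descriptors, vocab):
    """Quantize each descriptor to its nearest visual word and return an
    L1-normalized word-count histogram describing the whole video."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Encode one "video" worth of interest-point descriptors.
video_descriptors = rng.normal(size=(300, 64))
h = bow_histogram(video_descriptors, vocab)
print(h.shape, h.sum())  # (100,) 1.0
```

The resulting histograms are what a classifier (e.g., an SVM) would consume for object or action recognition in pipelines of this family.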
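The paper's dynamic consistency and alignment measures are defined in its body; as rough intuition, a generic way to quantify inter-subject fixation agreement is a leave-one-subject-out AUC against a blurred fixation density map built from the remaining subjects. The sketch below implements that generic measure, not the paper's exact metric; the synthetic fixations, blur width, and uniform negative sampling are all assumptions for illustration.

```python
# Generic leave-one-subject-out fixation consistency (assumption, not the
# paper's exact metric): score each subject's fixations against a
# Gaussian-blurred density map of the other subjects' fixations via AUC.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
H, W, n_subjects = 120, 160, 19  # frame size and subject count (19 as in the dataset)

# Synthetic fixations: each subject fixates near a shared region of interest.
fixations = [np.clip(rng.normal([60, 80], 10, size=(5, 2)).astype(int),
                     0, [H - 1, W - 1])
             for _ in range(n_subjects)]

def consistency_auc(fixations, held_out, sigma=8.0, n_neg=200):
    """Blur the fixations of all subjects except `held_out` into a density
    map, then measure how well that map separates the held-out subject's
    fixated pixels from uniformly sampled pixels (ROC AUC)."""
    density = np.zeros((H, W))
    for s, pts in enumerate(fixations):
        if s == held_out:
            continue
        np.add.at(density, (pts[:, 0], pts[:, 1]), 1.0)
    density = gaussian_filter(density, sigma)
    pos = density[fixations[held_out][:, 0], fixations[held_out][:, 1]]
    neg = density[rng.integers(0, H, n_neg), rng.integers(0, W, n_neg)]
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return roc_auc_score(labels, np.concatenate([pos, neg]))

# Average agreement across subjects; values near 1.0 indicate that
# subjects fixate highly consistent locations.
print(np.mean([consistency_auc(fixations, s) for s in range(n_subjects)]))
```

Applied per frame and aggregated over time, a measure of this kind captures the inter-subject stability of visual search that the abstract highlights.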