School of Sport, Exercise and Rehabilitation Sciences, The University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK.
School of Psychology, The University of Birmingham, Birmingham, UK.
Behav Res Methods. 2023 Apr;55(3):1372-1391. doi: 10.3758/s13428-022-01833-4. Epub 2022 Jun 1.
Continued advances in portable eye-tracking technology have freed experimenters from the constraints of artificial laboratory designs, so researchers can now collect gaze data during natural, real-world navigation. However, the field lacks a robust method for doing so: past approaches relied on time-consuming manual annotation of eye-tracking data, while previous attempts at automation lack the versatility needed for in-the-wild navigation trials involving complex and dynamic scenes. Here, we propose a system that informs researchers where a user's gaze is directed, and what it is directed at, at any given time. The system first runs footage recorded on a head-mounted camera through a deep-learning-based object detection algorithm, the Mask Region-based Convolutional Neural Network (Mask R-CNN). The algorithm's output is then combined with frame-by-frame gaze coordinates, measured by an eye-tracking device synchronized with the head-mounted camera, to detect and annotate what the user looked at in each frame of the footage, without any manual intervention. The methodology was validated by comparing the system's output with that of manual coders. High levels of agreement between the two, together with the system's significantly faster processing rate compared with manual coding, support it as a preferable data collection technique. The system's practicality was further demonstrated in a case study exploring the mediating effects of gaze behaviors on an environment-driven attentional bias.
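The step of combining segmentation output with gaze coordinates can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each frame's detections have already been reduced to a class label, a confidence score, and a boolean pixel mask, and that gaze is given in pixel coordinates of the same frame. The names `Detection`, `label_at_gaze`, `annotate_frames`, and the `min_score` threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class Detection:
    """One segmented object in a frame (hypothetical container)."""
    label: str
    score: float
    mask: np.ndarray  # boolean array of shape (height, width)


def label_at_gaze(
    detections: List[Detection],
    gaze_xy: Optional[Tuple[float, float]],
    min_score: float = 0.5,  # assumed confidence threshold
) -> Optional[str]:
    """Return the label of the highest-scoring mask containing the gaze point.

    Returns None when gaze is missing, falls outside the frame,
    or does not land on any sufficiently confident mask.
    """
    if gaze_xy is None:
        return None
    col, row = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    best: Optional[Detection] = None
    for det in detections:
        height, width = det.mask.shape
        if not (0 <= row < height and 0 <= col < width):
            continue  # gaze point lies outside this frame
        if det.score < min_score or not det.mask[row, col]:
            continue  # low-confidence detection, or gaze not on this object
        if best is None or det.score > best.score:
            best = det
    return best.label if best is not None else None


def annotate_frames(
    detections_per_frame: List[List[Detection]],
    gaze_per_frame: List[Optional[Tuple[float, float]]],
) -> List[Optional[str]]:
    """Annotate each frame with the label of the object under the gaze point."""
    return [
        label_at_gaze(dets, gaze)
        for dets, gaze in zip(detections_per_frame, gaze_per_frame)
    ]
```

Resolving overlapping masks by taking the highest-scoring detection is one possible design choice; preferring the smallest mask, or the one nearest the camera, would be equally plausible alternatives under different assumptions.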