RGBE-Gaze: A Large-Scale Event-Based Multimodal Dataset for High Frequency Remote Gaze Tracking.

Author Information

Zhao Guangrong, Shen Yiran, Zhang Chenlong, Shen Zhaoxin, Zhou Yuanfeng, Wen Hongkai

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2025 Jan;47(1):601-615. doi: 10.1109/TPAMI.2024.3474858. Epub 2024 Dec 4.

Abstract

High-frequency gaze tracking demonstrates significant potential in various critical applications, such as foveated rendering, gaze-based identity verification, and the diagnosis of mental disorders. However, existing eye-tracking systems based on CCD/CMOS cameras either provide tracking frequencies below 200 Hz or employ high-speed cameras, causing high power consumption and bulky devices. While there have been some high-speed eye-tracking datasets and methods based on event cameras, they are primarily tailored for near-eye camera scenarios. They lack the advantages associated with remote camera scenarios, such as the absence of the need for direct contact, improved user comfort, and head pose freedom. In this work, we present RGBE-Gaze, the first large-scale multimodal dataset for high-frequency remote gaze tracking, built by synchronizing RGB and event cameras. This dataset is collected from 66 participants of diverse genders and age groups. Our setup captures 3.6 million RGB images and 26.3 billion event samples. Additionally, the dataset includes 10.7 million gaze references from the Gazepoint GP3 HD eye tracker and 15,972 sparse points of gaze (PoG) ground truth obtained through manual stimuli clicks by participants. We present dataset characteristics such as head pose, gaze direction, and pupil size. Furthermore, we introduce a hybrid frame-event based gaze estimation method specifically designed for the collected dataset. Moreover, we perform extensive evaluations of different benchmarking methods under various gaze-related factors. The evaluation results illustrate that introducing the event stream as a new modality improves gaze tracking frequency and demonstrates greater estimation robustness across diverse gaze-related factors.
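A core idea behind hybrid frame-event gaze tracking is aligning the asynchronous event stream with the lower-rate RGB frames by timestamp. The sketch below is purely illustrative of that alignment step, not the authors' actual pipeline: the timestamps, rates, and function names are assumptions for the sake of the example.

```python
import numpy as np

def bin_events_by_frame(event_ts, frame_ts):
    """Assign each event to the RGB frame interval it falls in.

    event_ts: sorted 1-D array of event timestamps (microseconds)
    frame_ts: sorted 1-D array of frame timestamps (microseconds)
    Returns a list of event-index arrays, one per frame interval.
    """
    # For each event, find the index of the frame interval containing it:
    # searchsorted gives the insertion point, so subtract 1 to get the
    # interval whose start timestamp precedes the event.
    idx = np.searchsorted(frame_ts, event_ts, side="right") - 1
    return [np.nonzero(idx == i)[0] for i in range(len(frame_ts) - 1)]

# Toy data: ~30 Hz frames with a handful of events in between
frame_ts = np.array([0, 33_333, 66_666], dtype=np.int64)
event_ts = np.array([100, 5_000, 40_000, 50_000, 65_000], dtype=np.int64)
bins = bin_events_by_frame(event_ts, frame_ts)
print([b.tolist() for b in bins])  # → [[0, 1], [2, 3, 4]]
```

In a real hybrid estimator, each per-interval event slice would typically be converted into a dense representation (e.g. an event count image or voxel grid) before being fused with the preceding RGB frame.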

