Zhang Xucong, Sugano Yusuke, Fritz Mario, Bulling Andreas
IEEE Trans Pattern Anal Mach Intell. 2019 Jan;41(1):162-175. doi: 10.1109/TPAMI.2017.2778103. Epub 2017 Nov 28.
Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22 percent (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.
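The reported errors are mean angular errors between predicted and ground-truth gaze directions, and the 22 percent figure follows directly from the quoted numbers. Below is a minimal illustrative sketch (not code from the paper) that assumes gaze directions are represented as 3D unit vectors, a common convention that the abstract itself does not specify:

```python
import numpy as np

def mean_angular_error_deg(pred, gt):
    """Mean angular error in degrees between predicted and ground-truth
    gaze direction vectors, both of shape (N, 3).
    Assumption: gaze is encoded as 3D direction vectors (not stated in the abstract)."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos_sim = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_sim)).mean()

# Relative improvement quoted in the abstract:
# (13.9 - 10.8) / 13.9 ≈ 0.223, i.e. roughly 22 percent.
improvement = (13.9 - 10.8) / 13.9
print(f"relative improvement: {improvement:.1%}")  # -> 22.3%
```

The 22 percent improvement is thus a relative reduction of the mean angular error in the cross-dataset setting, not an absolute accuracy gain.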