IEEE Trans Vis Comput Graph. 2020 May;26(5):1902-1911. doi: 10.1109/TVCG.2020.2973473. Epub 2020 Feb 13.
We conduct novel analyses of users' gaze behavior in dynamic virtual scenes and, based on our analyses, present a novel CNN-based model called DGaze for gaze prediction in HMD-based applications. We first collect eye-tracking data from 43 users across 5 dynamic scenes under free-viewing conditions. Next, we perform a statistical analysis of our data and observe that dynamic object positions, head rotation velocities, and salient regions are correlated with users' gaze positions. Based on this analysis, we present a CNN-based model (DGaze) that combines an object position sequence, a head velocity sequence, and saliency features to predict users' gaze positions. Our model can predict not only real-time gaze positions but also gaze positions in the near future, and achieves better performance than the prior method. For real-time prediction, DGaze improves on the prior method by 22.0% in dynamic scenes and by 9.5% in static scenes, using angular distance as the evaluation metric. We also propose a variant of our model, called DGaze_ET, that predicts future gaze positions with higher precision by incorporating accurate past gaze data gathered with an eye tracker. We further analyze our CNN architecture and verify the effectiveness of each component of our model. We apply DGaze to gaze-contingent rendering and to a game, and present evaluation results from a user study.
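The abstract reports DGaze's improvements using angular distance between predicted and ground-truth gaze as the evaluation metric. A minimal sketch of that metric, assuming gaze is represented as 3D direction vectors (the function name and vector convention are illustrative, not from the paper):

```python
import numpy as np

def angular_distance_deg(pred_dir, true_dir):
    """Angular distance in degrees between two gaze direction vectors.

    Both inputs are 3D direction vectors; they are normalized here,
    so they need not be unit length on entry.
    """
    pred = np.asarray(pred_dir, dtype=float)
    true = np.asarray(true_dir, dtype=float)
    cos_angle = np.dot(pred, true) / (np.linalg.norm(pred) * np.linalg.norm(true))
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))

# Example: orthogonal gaze directions are 90 degrees apart.
print(angular_distance_deg([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 90.0
```

Averaging this quantity over all frames of a session would yield the kind of mean angular error on which the reported 22.0% and 9.5% improvements are computed.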