Bolin Lai, Miao Liu, Fiona Ryan, James M. Rehg
Georgia Institute of Technology, Atlanta, GA 30308 USA.
Meta AI, Menlo Park, CA 94025 USA.
Int J Comput Vis. 2024;132(3):854-871. doi: 10.1007/s11263-023-01879-7. Epub 2023 Oct 18.
Predicting human gaze from egocentric videos plays a critical role in understanding human intention during daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze estimation. We observe that the connection between the global scene context and local visual information is vital for localizing the gaze fixation in egocentric video frames. To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel global-local correlation module to explicitly model the correlation between the global token and each local token. We validate our model on two egocentric video datasets - EGTEA Gaze+ and Ego4D. Our detailed ablation studies demonstrate the benefits of our method, and our approach exceeds the previous state-of-the-art model by a large margin. We also apply our model to a novel gaze saccade/fixation prediction task and to the traditional action recognition problem. The consistent gains suggest the strong generalization capability of our model. We further provide visualizations to support our claim that the global-local correlation serves as a key representation for predicting gaze fixation from egocentric videos. More details can be found on our website (https://bolinlai.github.io/GLC-EgoGazeEst).