Dual Focus-3D: A Hybrid Deep Learning Approach for Robust 3D Gaze Estimation.

Authors

Bendimered Abderrahmen, Iguernaissi Rabah, Nawaf Mohamad Motasem, Cherif Rim, Dubuisson Séverine, Merad Djamal

Affiliation

Laboratoire d'Informatique et des Systèmes, CNRS UMR 7020, Aix-Marseille University, 13009 Marseille, France.

Publication

Sensors (Basel). 2025 Jun 30;25(13):4086. doi: 10.3390/s25134086.

DOI:10.3390/s25134086

PMID:40648341

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12251888/

Abstract

Estimating gaze direction is a key task in computer vision, especially for understanding where a person is focusing their attention. It is essential for applications in assistive technology, medical diagnostics, virtual environments, and human-computer interaction. In this work, we introduce Dual Focus-3D, a novel hybrid deep learning architecture that combines appearance-based features from eye images with 3D head orientation data. This fusion enhances the model's prediction accuracy and robustness, particularly in challenging natural environments. To support training and evaluation, we present EyeLis, a new dataset containing 5206 annotated samples with corresponding 3D gaze and head pose information. Our model achieves state-of-the-art performance, with a mean absolute error (MAE) of 1.64° on EyeLis, demonstrating its ability to generalize effectively across both synthetic and real datasets. Key innovations include a multimodal feature fusion strategy, an angular loss function optimized for 3D gaze prediction, and regularization techniques to mitigate overfitting. Our results show that including 3D spatial information directly in the learning process significantly improves accuracy.
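The abstract reports gaze error in degrees and mentions an angular loss, but does not give the exact formulation. As a rough illustration only, the sketch below computes the angle between a predicted and a ground-truth 3D gaze vector, which is a common way to express such an error. The function names `angular_error_deg` and `mean_angular_error_deg` are hypothetical, and nothing here should be read as the authors' actual loss.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (illustrative only).

    Assumption: gaze is represented as a 3D direction vector; the vectors
    need not be unit length, since they are normalized here.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    if norm_pred == 0.0 or norm_target == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds, targets):
    """Average the per-sample angular error over a batch of vector pairs."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

For example, `angular_error_deg((1, 0, 0), (0, 1, 0))` evaluates to approximately 90, since the two directions are orthogonal. In a training setting, a differentiable version of the same quantity would typically be written in a deep learning framework; that step is not shown here.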
