Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation.

Author Information

Chen Zhaokang, Shi Bertram E

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Jan;45(1):1174-1188. doi: 10.1109/TPAMI.2022.3148386. Epub 2022 Dec 5.

Abstract

Appearance-based gaze estimation from RGB images provides relatively unconstrained gaze tracking from commonly available hardware. The accuracy of subject-independent models is limited partly by small intra-subject and large inter-subject variations in appearance, and partly by a latent subject-dependent bias. To improve estimation accuracy, we have previously proposed a gaze decomposition method that decomposes the gaze angle into the sum of a subject-independent gaze estimate from the image and a subject-dependent bias. Estimating the bias from images outperforms previously proposed calibration algorithms, unless the amount of calibration data is prohibitively large. This paper extends that work with a more complete characterization of the interplay between the complexity of the calibration dataset and estimation accuracy. In particular, we analyze the effect of the number of gaze targets, the number of images used per gaze target and the number of head positions in calibration data using a new NISLGaze dataset, which is well suited for analyzing these effects as it includes more diversity in head positions and orientations for each subject than other datasets. A better understanding of these factors enables low complexity high performance calibration. Our results indicate that using only a single gaze target and single head position is sufficient to achieve high quality calibration. However, it is useful to include variability in head orientation as the subject is gazing at the target. Our proposed estimator based on these studies (GEDDNet) outperforms state-of-the-art methods by more than 6.3%. One of the surprising findings of our work is that the same estimator yields the best performance both with and without calibration. This is convenient, as the estimator works well "straight out of the box," but can be improved if needed by calibration. However, this seems to violate the conventional wisdom that train and test conditions must be matched. To better understand the reasons, we provide a new theoretical analysis that specifies the conditions under which this can be expected. The dataset is available at http://nislgaze.ust.hk. Source code is available at https://github.com/HKUST-NISL/GEDDnet.
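The decomposition described in the abstract can be written as g = f(I) + b, where f(I) is the subject-independent gaze estimate the network produces from image I and b is the latent subject-dependent bias. Calibration then reduces to estimating b, for example as the mean residual over a handful of frames collected while the subject fixates a single on-screen target. The following is a minimal Python sketch of that idea under assumed names: the model object and its predict method are hypothetical stand-ins for a trained subject-independent estimator such as GEDDnet, not the authors' actual API.

    import numpy as np

    # Minimal sketch of gaze-decomposition calibration. The "model" object
    # and its predict() method are hypothetical stand-ins for a trained
    # subject-independent gaze estimator; they are not the authors' API.

    def estimate_bias(model, calib_images, calib_gaze):
        # calib_images: frames recorded while the subject fixates one
        # on-screen target (the paper finds a single target and a single
        # head position suffice, ideally with varied head orientation).
        # calib_gaze: (N, 2) array of ground-truth (yaw, pitch) angles.
        preds = np.stack([model.predict(img) for img in calib_images])
        # Subject-dependent bias = mean residual between ground truth
        # and the subject-independent estimates.
        return (calib_gaze - preds).mean(axis=0)

    def calibrated_gaze(model, image, bias):
        # Final estimate: subject-independent prediction plus bias.
        return model.predict(image) + bias

Setting bias to zero recovers the uncalibrated estimator, which is consistent with the paper's observation that the same estimator performs well both with and without calibration.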
