Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany.
PLoS One. 2011;6(9):e24038. doi: 10.1371/journal.pone.0024038. Epub 2011 Sep 12.
Models of fixation selection are a central tool in the quest to understand how the human mind selects relevant information. Using this tool in the evaluation of competing claims often requires comparing different models' relative performance in predicting eye movements. However, studies use a wide variety of performance measures with markedly different properties, which makes comparison difficult. We make three main contributions to this line of research. First, we argue for a set of desirable properties, review commonly used measures, and conclude that no single measure unites all desirable properties. However, the area under the ROC curve (a classification measure) and the KL-divergence (a distance measure between probability distributions) combine many desirable properties and allow a meaningful comparison of critical model performance. We give an analytical proof of the linearity of the ROC measure with respect to averaging over subjects and demonstrate an appropriate correction of entropy-based measures such as the KL-divergence for the small sample sizes typical of eye-tracking data. Second, we provide a lower bound and an upper bound for these measures, based on image-independent properties of fixation data and on between-subject consistency, respectively. These bounds establish a reference frame against which the predictive power of a model of fixation selection can be judged. We provide open-source Python code to compute this reference frame. Third, we show that the upper bound, derived from between-subject consistency, holds only for models that predict averages over subject populations. Departing from this, we show that incorporating subject-specific viewing behavior can generate predictions that surpass this upper bound. Taken together, these findings lay out the information required for a well-founded judgment of the quality of any model of fixation selection, and this information should therefore be reported whenever a new model is introduced.
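For illustration, a minimal Python sketch of the two recommended measures follows. It assumes a model prediction given as a 2-D array over image pixels and fixations given as pixel coordinates; the function names, the coarse 26x36 binning, and the Miller-Madow bias correction are assumptions of this sketch, not the paper's published code nor necessarily the exact small-sample correction it demonstrates.

import numpy as np
from scipy.stats import rankdata

def auc(model_map, fix_rows, fix_cols):
    # Fixated pixels form the positive class, all remaining pixels the
    # negative class; the rank-sum (Mann-Whitney U) form of the AUC
    # handles ties in the map via average ranks.
    values = model_map.ravel()
    positive = np.zeros(values.size, dtype=bool)
    positive[np.ravel_multi_index((fix_rows, fix_cols), model_map.shape)] = True
    ranks = rankdata(values)
    n_pos = positive.sum()
    n_neg = values.size - n_pos
    u = ranks[positive].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def corrected_kl(fix_rows, fix_cols, model_density, shape, bins=(26, 36)):
    # Empirical fixation distribution on a coarse grid versus a model
    # density given on the same grid. The plug-in KL estimate is biased
    # upward for small fixation samples because the entropy of the
    # empirical distribution is underestimated; the Miller-Madow term
    # (m - 1) / (2 N ln 2) bits corrects this bias to first order.
    counts, _, _ = np.histogram2d(fix_rows, fix_cols, bins=bins,
                                  range=[[0, shape[0]], [0, shape[1]]])
    n = counts.sum()
    p = counts / n
    q = model_density / model_density.sum()
    q = np.clip(q, np.finfo(float).tiny, None)  # guard against log(0)
    nz = p > 0
    kl_plugin = np.sum(p[nz] * np.log2(p[nz] / q[nz]))
    bias = (nz.sum() - 1) / (2.0 * n * np.log(2))
    return kl_plugin - bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (768, 1024)
    model_map = rng.random(shape)              # random model map
    rows = rng.integers(0, shape[0], 200)      # 200 random fixations
    cols = rng.integers(0, shape[1], 200)
    print("AUC:", auc(model_map, rows, cols))  # approx. 0.5 for random data
    uniform = np.ones((26, 36))                # uniform model density
    print("KL:", corrected_kl(rows, cols, uniform, shape))  # in bits

The same AUC routine also yields the proposed reference frame: a lower bound from a map built only from image-independent fixation statistics (such as the spatial bias estimated on other images), and an upper bound from a map built from the remaining subjects' fixations on the same image.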