Ehinger Krista A, Hidalgo-Sotelo Barbara, Torralba Antonio, Oliva Aude
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology.
Vis cogn. 2009 Aug 1;17(6-7):945-978. doi: 10.1080/13506280902834720.
How predictable are human eye movements during search in real-world scenes? We recorded 14 observers' eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture mechanisms underlying the selectivity of observers' fixations during search.