Borji Ali, Feng Mengyang, Lu Huchuan
Center for Research in Computer Vision, Department of Computer Science, University of Central Florida, Orlando,
Department of Electrical Engineering, Dalian University of Technology, Dalian,
J Vis. 2016 Nov 1;16(14):18. doi: 10.1167/16.14.18.
Several structural scene cues such as gist, layout, horizontal line, openness, and depth have been shown to guide scene perception (e.g., Oliva & Torralba, 2001; Ross & Oliva, 2009). Here, to investigate whether the vanishing point (VP) plays a significant role in gaze guidance, we ran two experiments. In the first, we recorded fixations of 10 observers (six male, four female; mean age 22; SD = 0.84) freely viewing 532 images, of which 319 had a VP (shuffled presentation; each image shown for 4 s). We found that the average number of fixations in a local region (80 × 80 pixels) centered at the VP was significantly higher than the average number of fixations at random locations (t test; n = 319; p < 0.001). To address the confounding factor of saliency, we learned a combined model of bottom-up saliency and VP. The AUC (area under the curve) score of our model (0.85; SD = 0.01) was significantly higher than that of the base saliency model (e.g., 0.8 using the attention for information maximization (AIM) model by Bruce & Tsotsos, 2005; t test; p = 3.14e-16) and that of the VP-only model (0.64; t test; p < 0.001). In the second experiment, we asked 14 subjects (10 male, four female; mean age 23.07, SD = 1.26) to search for a target character (T or L) placed randomly on an imaginary 3 × 3 grid overlaid on an image. Subjects reported their answers by pressing one of two keys. Stimuli consisted of 270 color images (180 with a single VP, 90 without). The target appeared equally often in each cell (15 times as L, 15 times as T). We found that subjects were significantly faster (and more accurate) when the target appeared inside the cell containing the VP than when it appeared in cells without the VP (median across 14 subjects: 1.34 s vs. 1.96 s; Wilcoxon rank-sum test; p = 0.0014). These findings support the hypothesis that the vanishing point, similar to faces, text (Cerf, Frady, & Koch, 2009), and gaze direction (Borji, Parks, & Itti, 2014), guides attention in free-viewing and visual search tasks.
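The first analysis (fixations in an 80 × 80 pixel window at the VP versus windows at random locations) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the uniform sampling of random window centers, the number of random windows, and the toy fixation data are all assumptions made for illustration. Only the 80 × 80 pixel window size comes from the abstract.

```python
import numpy as np


def count_in_window(fixations, center, size=80):
    """Count fixations inside a size x size pixel window centered at `center`.

    `fixations` is an (N, 2) array of (x, y) pixel coordinates.
    """
    half = size / 2.0
    dx = np.abs(fixations[:, 0] - center[0])
    dy = np.abs(fixations[:, 1] - center[1])
    return int(np.sum((dx <= half) & (dy <= half)))


def vp_vs_random(fixations, vp, image_size, n_random=100, size=80, seed=0):
    """Return (count at the VP window, mean count over random windows).

    Random window centers are drawn uniformly over the image; this sampling
    scheme is an assumption, not something the abstract specifies.
    """
    rng = np.random.default_rng(seed)
    width, height = image_size
    centers = np.column_stack([
        rng.uniform(0, width, n_random),
        rng.uniform(0, height, n_random),
    ])
    random_counts = [count_in_window(fixations, c, size) for c in centers]
    return count_in_window(fixations, vp, size), float(np.mean(random_counts))


# Toy data for one hypothetical 640 x 480 image with a VP at (320, 200).
toy_fixations = np.array([
    [318.0, 205.0],
    [330.0, 190.0],
    [300.0, 210.0],
    [100.0, 400.0],
    [600.0, 50.0],
])
vp_count, random_mean = vp_vs_random(toy_fixations, (320.0, 200.0), (640, 480))
print(vp_count, random_mean)
```

Repeating this per image gives one VP count and one random-location count for each of the images with a VP, and the two sets of counts can then be compared with a t test as described above. The abstract does not say whether the test was paired or unpaired.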