Ackermann John F, Landy Michael S
Department of Psychology, New York University, New York, NY, USA.
J Vis. 2014 Mar 13;14(3):18. doi: 10.1167/14.3.18.
How do we find a target embedded in a scene? Within the framework of signal detection theory, this task is carried out by comparing each region of the scene with a "template," i.e., an internal representation of the search target. Here we ask what form this representation takes when the search target is a complex image with uncertain orientation. We examine three possible representations. The first is the matched filter. Such a representation cannot account for the ease with which humans find a complex search target that is rotated relative to the template. A second representation attempts to deal with this by estimating the relative orientation of target and match and then rotating the intensity-based template. No intensity-based template, however, can account for the ability to easily locate targets that are defined categorically rather than by a specific arrangement of pixels. Thus, we define a third template that represents the target in terms of image statistics rather than pixel intensities. Subjects performed a two-alternative, forced-choice search task in which they had to localize an image that matched a previously viewed target. Target images were texture patches. In one condition, the match image was the same image as the target and the distractor was a different image of the same textured material. In the second condition, the match image was of the same texture as the target (but different pixels) and the distractor was an image of a different texture. Match and distractor stimuli were randomly rotated relative to the target. We compared human performance to pixel-based, pixel-based with rotation, and statistic-based search models. The statistic-based search model was most successful at matching human performance. We conclude that humans use summary statistics to search for complex visual targets.
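The three template types and the two-alternative decision rule can be illustrated with a toy sketch. It is not the authors' implementation, and every name in it is hypothetical. It makes several simplifying assumptions. Images are small square grids with intensities in [0, 1). The "pixel-based" model is normalized cross-correlation. The "pixel-based with rotation" model estimates orientation by trying only the four 90-degree rotations and keeping the best match. The "statistic-based" model compares nothing more than an intensity histogram, which is a deliberately crude stand-in for a real set of texture statistics. It is rotation-invariant and ignores pixel arrangement.

```python
import math
import random

Image = list[list[float]]  # square grid of intensities in [0, 1) (assumption)


def rotate90(img: Image) -> Image:
    """Rotate a square image by 90 degrees (pure index permutation)."""
    n = len(img)
    return [[img[n - 1 - c][r] for c in range(n)] for r in range(n)]


def ncc(a: Image, b: Image) -> float:
    """Pixel-based (matched-filter) score: normalized cross-correlation."""
    xa = [v for row in a for v in row]
    xb = [v for row in b for v in row]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    da = [v - ma for v in xa]
    db = [v - mb for v in xb]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0


def rotated_template_score(template: Image, candidate: Image) -> float:
    """Pixel-based with rotation: best NCC over the four 90-degree
    rotations of the template (a simplification of orientation estimation)."""
    best, t = -math.inf, template
    for _ in range(4):
        best = max(best, ncc(t, candidate))
        t = rotate90(t)
    return best


def histogram(img: Image, bins: int = 8) -> list[float]:
    """Normalized intensity histogram; assumes intensities lie in [0, 1)."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[min(int(v * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def statistic_score(template: Image, candidate: Image) -> float:
    """Statistic-based score: negative L1 distance between histograms
    (a hypothetical stand-in for richer texture summary statistics)."""
    ht, hc = histogram(template), histogram(candidate)
    return -sum(abs(p - q) for p, q in zip(ht, hc))


def two_afc(score, template: Image, first: Image, second: Image) -> int:
    """Two-alternative forced choice: pick the interval (0 or 1) whose
    stimulus scores higher against the template; ties go to interval 0."""
    return 0 if score(template, first) >= score(template, second) else 1


def make_texture(rng: random.Random, lo: float, hi: float, n: int = 8) -> Image:
    """Hypothetical 'texture': i.i.d. intensities drawn from [lo, hi)."""
    return [[lo + (hi - lo) * rng.random() for _ in range(n)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    target = make_texture(rng, 0.0, 0.5)
    # Condition 1: match is the target itself, rotated; distractor is
    # another sample of the same texture.
    match, distractor = rotate90(target), make_texture(rng, 0.0, 0.5)
    print("rotation model picks:", two_afc(rotated_template_score, target, match, distractor))
    # Condition 2: match is a new sample of the same texture; distractor
    # comes from a different texture.
    match2, other = make_texture(rng, 0.0, 0.5), make_texture(rng, 0.5, 1.0)
    print("statistic model picks:", two_afc(statistic_score, target, match2, other))
```

The sketch shows why the abstract's argument runs as it does. Trying rotations lets a pixel-based template recover a rotated copy of the target. Only a score built on image statistics can favor a new sample of the same texture whose pixels differ from the target's.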