Elazary Lior, Itti Laurent
Department of Computer Science, University of Southern California, Los Angeles, CA, USA.
J Vis. 2008 Mar 7;8(3):3.1-15. doi: 10.1167/8.3.3.
How do we decide which objects in a visual scene are more interesting? While intuition may point toward high-level object recognition and cognitive processes, here we investigate the contribution of a much simpler process, low-level visual saliency. We used the LabelMe database (24,863 photographs with 74,454 manually outlined objects) to evaluate how often interesting objects were among the few most salient locations predicted by a computational model of bottom-up attention. In 43% of all images, the model's predicted most salient location fell within a labeled region (chance 21%). Furthermore, in 76% of the images (chance 43%), one or more of the top three salient locations fell on an outlined object, with performance leveling off after six predicted locations. The bottom-up attention model has no notion of objects and no notion of semantic relevance. Hence, our results indicate that selecting interesting objects in a scene is largely constrained by low-level visual properties rather than solely determined by higher cognitive processes.
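The evaluation described above (does any of the top N salient locations land inside an outlined object, and how does that compare with chance?) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the saliency map is taken as a given 2-D array, the successive locations are picked by a simple "take the maximum, then suppress a disk around it" loop standing in for the model's own attention-shifting mechanism, and chance for a single location is approximated by the fraction of image area covered by labels. The function names, the `radius` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def top_salient_locations(saliency, n, radius):
    """Return the n most salient (row, col) points.

    Assumption: after each pick, a disk of the given radius is suppressed
    so the next pick lands elsewhere (a crude stand-in for inhibition of return).
    """
    s = saliency.astype(float).copy()
    rows, cols = np.ogrid[:s.shape[0], :s.shape[1]]
    picks = []
    for _ in range(n):
        r, c = np.unravel_index(np.argmax(s), s.shape)
        picks.append((int(r), int(c)))
        s[(rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2] = -np.inf
    return picks

def hit_within_n(saliency, object_mask, n, radius=1):
    """True if any of the top-n salient locations falls on a labeled pixel."""
    return any(object_mask[r, c]
               for r, c in top_salient_locations(saliency, n, radius))

def chance_single_location(object_mask):
    """Assumed chance level for one random location: labeled area fraction."""
    return float(object_mask.mean())

# Toy 6x6 image: three salient peaks, one labeled 3x3 object.
saliency = np.zeros((6, 6))
saliency[0, 0], saliency[3, 3], saliency[5, 0] = 0.9, 0.8, 0.7
object_mask = np.zeros((6, 6), dtype=bool)
object_mask[2:5, 2:5] = True

print(hit_within_n(saliency, object_mask, n=1))  # most salient point misses
print(hit_within_n(saliency, object_mask, n=2))  # second point hits the object
print(chance_single_location(object_mask))       # labeled fraction of the image
```

Averaging `hit_within_n` over a whole image set for a fixed N would give a hit rate of the kind reported in the abstract, to be compared against the corresponding chance rate.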