Estimating perception of scene layout properties from global image features.

Author information

Ross Michael G, Oliva Aude

Affiliation

Department of Brain & Cognitive Sciences, Massachusetts Institute of Technology, USA.

Publication information

J Vis. 2010 Jan 8;10(1):2.1-25. doi: 10.1167/10.1.2.

DOI:10.1167/10.1.2

PMID:20143895

Abstract

The relationship between image features and scene structure is central to the study of human visual perception and computer vision, but many of the specifics of real-world layout perception remain unknown. We do not know which image features are relevant to perceiving layout properties, or whether those features provide the same information for every type of image. Furthermore, we do not know the spatial resolutions required for perceiving different properties. This paper describes an experiment and a computational model that provide new insights into these issues. Humans perceive global spatial layout properties, such as dominant depth, openness, and perspective, from a single image. This work describes an algorithm that reliably predicts human layout judgments. The model's predictions are general, not specific to the observers it was trained on. Analysis reveals that the optimal spatial resolutions for determining layout vary with the content of the space and the property being estimated. Openness is best estimated at high resolution, depth is best estimated at medium resolution, and perspective is best estimated at low resolution. Given the reliability and simplicity of estimating the global layout of real-world environments, this model could help resolve perceptual ambiguities encountered by more detailed scene reconstruction schemas.
