Surface formation and depth in monocular scene perception

Vision Sciences Laboratory, Harvard University, Cambridge, MA 02138, USA.

Perception. 1999;28(11):1347-60. doi: 10.1068/p2987.

The visual perception of monocular stimuli perceived as 3-D objects has received considerable attention from researchers in human and machine vision. However, most previous research has focused on how individual 3-D objects are perceived. Here this is extended to a study of how the structure of 3-D scenes containing multiple, possibly disconnected objects and features is perceived. Da Vinci stereopsis, stereo capture, and other surface formation and interpolation phenomena in stereopsis and structure-from-motion suggest that small features having ambiguous depth may be assigned depth by interpolation with features having unambiguous depth. I investigated whether vision may use similar mechanisms to assign relative depth to multiple objects and features in sparse monocular images, such as line drawings, especially when other depth cues are absent. I propose that vision tends to organize disconnected objects and features into common surfaces to construct 3-D-scene interpretations. Interpolations that are too weak to generate a visible surface percept may still be strong enough to assign relative depth to objects within a scene. When there exists more than one possible surface interpolation in a scene, the visual system's preference for one interpolation over another seems to be influenced by a number of factors, including: (i) proximity, (ii) smoothness, (iii) a preference for roughly frontoparallel surfaces and 'ground' surfaces, (iv) attention and fixation, and (v) higher-level factors. I present a variety of demonstrations and an experiment to support this surface-formation hypothesis.
