Buhmann J M, Malik J, Perona P
Institut für Informatik, Universität Bonn, 53117 Bonn, Germany.
Proc Natl Acad Sci U S A. 1999 Dec 7;96(25):14203-4. doi: 10.1073/pnas.96.25.14203.
Vision extracts useful information from images. Reconstructing the three-dimensional structure of our environment and recognizing the objects that populate it are among the most important functions of our visual system. Computer vision researchers study the computational principles of vision and aim to design algorithms that reproduce these functions. Vision is difficult: the same scene may give rise to very different images depending on illumination and viewpoint. Typically, an astronomical number of hypotheses must, in principle, be analyzed to infer a correct scene description. Moreover, image information may be extracted at different levels of spatial and logical resolution, depending on the image processing task. Knowledge of the world allows the visual system to limit ambiguity and to greatly simplify visual computations. We discuss how simple properties of the world are captured by the Gestalt rules of grouping, how the visual system may learn and organize models of objects for recognition, and how one may control the complexity of the description that the visual system computes.