Department of Psychology, University of Stirling, Scotland, UK.
Br J Psychol. 2010 Feb;101(Pt 1):1-26. doi: 10.1348/000712608X379070. Epub 2009 Feb 14.
Vision research has made very substantial progress towards understanding how we see. It is one area of psychology where the three-way thrust of behavioural measurements (psychophysics), brain imaging, and computational studies have been combined quite routinely for some years. The purpose of this paper is to demonstrate a relatively unusual form of computational modelling that we characterise as involving image descriptions. Image descriptions are statements about structures in images and relationships between structures. Most modelling in vision is either conceived in fairly abstract terms, or is done at the level of images. Neither is entirely satisfactory, and image descriptions are a simple formulation of age-old ideas about a Vocabulary of image features that are detected and parameterized from actual digital images. For our example, we use the domain of the visual perception of printed text. This is an area that has been characterized by thorough, robust psychophysical experiments. The fundamental requirements of visual processing in this domain are: grouping of some parts if the image into words; at the same time segmenting words from each other. We show how these are readily understood in terms of our model of image descriptions, and show quantitatively that typographical practice, refined over centuries, is about optimum for the visual system at least as represented by our model. In addition, we show that the same notion of image descriptions could, in principle, support word recognition in certain circumstances.
视觉研究在理解我们如何看方面已经取得了非常大的进展。它是心理学的一个领域,行为测量(心理物理学)、大脑成像和计算研究的三管齐下已经结合了好几年。本文的目的是展示一种相对不常见的计算建模形式,我们称之为涉及图像描述。图像描述是关于图像中的结构和结构之间关系的陈述。大多数视觉建模要么是在相当抽象的术语中构想的,要么是在图像层面上进行的。两者都不完全令人满意,而图像描述是对从实际数字图像中检测和参数化的图像特征词汇的古老思想的简单表述。对于我们的例子,我们使用印刷文本视觉感知的领域。这是一个经过彻底、稳健的心理物理实验所描述的领域。该领域视觉处理的基本要求是:将图像的某些部分组合成单词;同时将单词彼此分割。我们展示了如何根据我们的图像描述模型来理解这些要求,并且定量地表明,经过数百年的精炼,印刷实践对于我们的模型所代表的视觉系统至少是最优的。此外,我们还表明,在某些情况下,相同的图像描述概念可以支持单词识别。