Taylor C J, Cootes T F, Lanitis A, Edwards G, Smyth P, Kotcheff A C
Department of Medical Biophysics, University of Manchester, UK.
Philos Trans R Soc Lond B Biol Sci. 1997 Aug 29;352(1358):1267-74. doi: 10.1098/rstb.1997.0109.
The ultimate goal of machine vision is image understanding: the ability not only to recover image structure but also to know what it represents. By definition, this involves the use of models which describe and label the expected structure of the world. Over the past decade, model-based vision has been applied successfully to images of man-made objects. It has proved much more difficult to develop model-based approaches to the interpretation of images of complex and variable structures such as faces or the internal organs of the human body (as visualized in medical images). In such cases it has been problematic even to recover image structure reliably, without a model to organize the often noisy and incomplete image evidence. The key problem is that of variability. To be useful, a model needs to be specific, that is, to be capable of representing only 'legal' examples of the modelled object(s). It has proved difficult to achieve this whilst allowing for natural variability. Recent developments have overcome this problem; it has been shown that specific patterns of variability in shape and grey-level appearance can be captured by statistical models that can be used directly in image interpretation. The details of the approach are outlined and practical examples from medical image interpretation and face recognition are used to illustrate how previously intractable problems can now be tackled successfully. It is also interesting to ask whether these results provide any possible insights into natural vision; for example, we show that the apparent changes in shape which result from viewing three-dimensional objects from different viewpoints can be modelled quite well in two dimensions; this may lend some support to the 'characteristic views' model of natural vision.
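The idea of a model that is specific yet allows natural variability can be illustrated with a minimal sketch. The abstract does not state how the statistical models are built, so everything here is an assumption: the sketch uses principal component analysis over pre-aligned landmark coordinates, and the names `fit_shape_model`, `constrain`, `var_kept` and `limit` are hypothetical. A candidate shape is made 'legal' by clipping its parameters to a few standard deviations of what was seen in training.

```python
import numpy as np

def fit_shape_model(shapes, var_kept=0.98):
    """Fit a linear shape model (assumed here: PCA on aligned landmarks).

    shapes: (n_samples, 2 * n_points) array, one flattened shape per row.
    Returns the mean shape, the retained modes (as columns), and their variances.
    """
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # SVD of the centered data yields the principal modes of variation.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / (shapes.shape[0] - 1)
    # Keep the fewest modes that explain the requested share of variance.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cum, var_kept)) + 1, len(eigvals))
    return mean, vt[:k].T, eigvals[:k]

def constrain(shape, mean, modes, eigvals, limit=3.0):
    """Project a shape onto the model and clip it to plausible parameters."""
    b = modes.T @ (shape - mean)          # shape parameters
    bound = limit * np.sqrt(eigvals)      # +/- limit standard deviations
    b = np.clip(b, -bound, bound)
    return mean + modes @ b

# Synthetic training set (illustrative only): a unit square whose width varies.
rng = np.random.default_rng(0)
base = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
direction = np.array([-1, 0, 1, 0, 1, 0, -1, 0], dtype=float) / 2.0
t = rng.normal(0.0, 0.1, size=50)
shapes = base + t[:, None] * direction

mean, modes, eigvals = fit_shape_model(shapes)

# An implausibly wide square is pulled back into the learned range.
wild = base + 100.0 * direction
legal = constrain(wild, mean, modes, eigvals)
```

The clipping step is what makes the model specific in this sketch: any candidate, however distorted by noisy image evidence, is mapped to a shape within the range of variation observed in the training set.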