Daugman J G, Downing C J
Computer Laboratory, University of Cambridge, England.
J Opt Soc Am A Opt Image Sci Vis. 1995 Apr;12(4):641-60. doi: 10.1364/josaa.12.000641.
We argue that some aspects of human spatial vision, particularly for textured patterns and scenes, can be described in terms of demodulation and predictive coding. Such nonlinear processes encode a pattern into local phasors that represent it completely as a modulation, in phase and amplitude, of a prediction associated with the image structure in some region by its predominant undulation(s). The demodulation representation of a pattern is an anisotropic, second-order form of predictive coding, and it offers a particularly efficient way to analyze and encode textures, as it identifies and exploits their underlying redundancies. In addition, self-consistent domains of redundancy in image structure provide a basis for image segmentation. We first provide an algorithm for computing the three elements of a complete demodulation transform of any image, and we illustrate such decompositions for both natural and synthetic images. We then present psychophysical evidence from spatial masking experiments, as well as illustrations of perceptual organization, that suggest a possible role for such underlying representations in human vision. In psychophysical experiments employing masks with more than two oriented Fourier components, we find that peaks of threshold elevation occur at locations in the Fourier plane remote from the orientations and frequencies of the actual mask components. Rather, as would occur from demodulation, these peaks in the frequency plane are related to the vector difference frequencies between the actual masking components and their spectral centers of mass. We offer a neural interpretation of demodulation coding, and finally we demonstrate a practical application of this process in a system for automatic visual recognition of personal identity by demodulation of a facial feature.
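The abstract does not spell out the demodulation algorithm, so what follows is only a minimal sketch of one plausible reading, not the authors' method. It assumes the three elements are a single predominant frequency (the "prediction"), a local amplitude map, and a local phase map. The function name `demodulate`, the parameter `sigma`, and the Gaussian low-pass filter are all illustrative assumptions. Because of the low-pass step this sketch is lossy, whereas the paper describes a complete representation.

```python
import numpy as np

def demodulate(image, sigma=0.05):
    """Hypothetical sketch: split a 2-D real image into a dominant carrier
    frequency plus local amplitude and phase maps (phasors).
    sigma is the low-pass bandwidth in cycles/pixel (an assumed parameter)."""
    img = image - image.mean()                  # drop the DC term
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]             # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]             # horizontal frequencies, cycles/pixel

    # Find the predominant undulation: the strongest spectral peak,
    # searched over one half-plane because a real image has a symmetric spectrum.
    power = np.abs(np.fft.fft2(img)) ** 2
    half_plane = (fx > 0) | ((fx == 0) & (fy > 0))
    power = np.where(half_plane, power, 0.0)
    ky, kx = np.unravel_index(np.argmax(power), power.shape)
    f0 = (float(fy[ky, 0]), float(fx[0, kx]))

    # Shift the carrier down to zero frequency.
    y, x = np.mgrid[0:h, 0:w]
    baseband = img * np.exp(-2j * np.pi * (f0[0] * y + f0[1] * x))

    # Low-pass filter to keep only the slowly varying modulation.
    lowpass = np.exp(-(fx ** 2 + fy ** 2) / (2 * sigma ** 2))
    phasors = 2 * np.fft.ifft2(np.fft.fft2(baseband) * lowpass)
    return f0, np.abs(phasors), np.angle(phasors)

# Synthetic check: a pure grating with a known phase offset.
_, xx = np.mgrid[0:64, 0:64]
grating = np.cos(2 * np.pi * 0.125 * xx + 0.7)
f0, amplitude, phase = demodulate(grating)
```

For this grating the recovered carrier should be the grating frequency, with near-constant amplitude and phase maps. A textured image with slowly varying contrast or local distortion would instead yield spatially varying maps, which is the kind of redundancy the abstract proposes to exploit.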