Olshausen B A, Field D J
Department of Psychology, Cornell University, Ithaca, New York 14853, USA.
Nature. 1996 Jun 13;381(6583):607-9. doi: 10.1038/381607a0.
The receptive fields of simple cells in mammalian primary visual cortex can be characterized as being spatially localized, oriented and bandpass (selective to structure at different spatial scales), comparable to the basis functions of wavelet transforms. One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding. Along these lines, a number of studies have attempted to train unsupervised learning algorithms on natural images in the hope of developing receptive fields with similar properties, but none has succeeded in producing a full set that spans the image space and contains all three of the above properties. Here we investigate the proposal that a coding strategy that maximizes sparseness is sufficient to account for these properties. We show that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex. The resulting sparse image code provides a more efficient representation for later stages of processing because it possesses a higher degree of statistical independence among its outputs.
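The abstract describes a learning algorithm that searches for sparse linear codes but gives no implementation details. As a rough illustration only, the sketch below shows one common way to set up such a problem: image patches (columns of `X`) are approximated as `D @ A`, where `D` holds the basis functions and `A` the coefficients, and an L1 penalty on `A` encourages sparseness. The L1 penalty, the ISTA inference loop, the gradient update on `D`, and every function and parameter name (`infer_codes`, `learn_dictionary`, `lam`, `lr`) are assumptions made for this sketch, not the authors' method.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink each entry of x toward zero by t (proximal step for the L1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def objective(X, D, A, lam):
    """Reconstruction error plus an L1 sparseness penalty (an assumed cost)."""
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.sum(np.abs(A))


def infer_codes(X, D, lam, n_steps=50):
    """Find sparse coefficients A for fixed basis D using ISTA."""
    # Squared spectral norm of D bounds the curvature of the quadratic term.
    L = np.linalg.norm(D, 2) ** 2
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_steps):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A


def learn_dictionary(X, n_basis, lam=0.1, n_iters=30, lr=0.1, seed=0):
    """Alternate between sparse inference and a gradient step on the basis D."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_basis))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iters):
        A = infer_codes(X, D, lam)
        residual = X - D @ A
        # Move each basis function to reduce the average reconstruction error.
        D += lr * (residual @ A.T) / X.shape[1]
        # Renormalise columns so the penalty cannot be evaded by scaling D up.
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D
```

To use it, stack flattened image patches as the columns of `X` (for example, 8×8 patches give 64 rows) and call `learn_dictionary(X, n_basis=64)`; each column of the returned matrix can then be reshaped to 8×8 and viewed as a candidate receptive field. Whether the result resembles the localized, oriented, bandpass fields reported in the abstract depends on training with real natural image patches and on preprocessing choices that this sketch does not cover.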