Computational Neurobiology Laboratory and Crick-Jacobs Center for Theoretical and Computational Biology, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
Neural Comput. 2012 Sep;24(9):2384-421. doi: 10.1162/NECO_a_00330. Epub 2012 Jun 26.
The human visual system is capable of recognizing complex objects even when their appearances change drastically under various viewing conditions. Especially in the higher cortical areas, the sensory neurons reflect such functional capacity in their selectivity for complex visual features and invariance to certain object transformations, such as image translation. Due to the strong nonlinearities necessary to achieve both the selectivity and invariance, characterizing and predicting the response patterns of these neurons represents a formidable computational challenge. A related problem is that such neurons are poorly driven by randomized inputs, such as white noise, and respond strongly only to stimuli with complex high-order correlations, such as natural stimuli. Here we describe a novel two-step optimization technique that can characterize both the shape selectivity and the range and coarseness of position invariance from neural responses to natural stimuli. One step in the optimization is finding the template as the maximally informative dimension given the estimated spatial location where the response could have been triggered within each image. The estimates of the locations that triggered the response are updated in the next step. Under the assumption of a monotonic relationship between the firing rate and stimulus projections on the template at a given position, the most likely location is the one that has the largest projection on the estimate of the template. The algorithm shows quick convergence during optimization, and the estimation results are reliable even in the regime of small signal-to-noise ratios. When we apply the algorithm to responses of complex cells in the primary visual cortex (V1) to natural movies, we find that responses of the majority of cells were significantly better described by translation-invariant models based on one template compared with position-specific models with several relevant features.
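The alternation between the two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the template step here uses a mean-subtracted, response-weighted average of the selected patches as a simple stand-in for the maximally informative dimension search, the stimuli are assumed to be one-dimensional arrays, and all function names (`extract_patches`, `best_locations`, `update_template`, `fit_invariant_template`) are hypothetical. Only the location step follows the abstract directly: for each stimulus, the most likely location is the one with the largest projection onto the current template estimate.

```python
import numpy as np


def extract_patches(stimuli, width):
    """All shifted windows of each stimulus: (n_trials, n_positions, width)."""
    return np.lib.stride_tricks.sliding_window_view(stimuli, width, axis=1)


def best_locations(patches, template):
    """Location step: pick, per trial, the position whose patch has the
    largest projection onto the current template estimate."""
    projections = patches @ template  # (n_trials, n_positions)
    return np.argmax(projections, axis=1)


def update_template(patches, locations, responses):
    """Template step (assumed stand-in for the MID search): response-weighted
    average of the patches at the estimated locations, normalized to unit length."""
    chosen = patches[np.arange(len(locations)), locations]  # (n_trials, width)
    weights = responses - responses.mean()  # remove the response-independent mean
    template = weights @ chosen
    norm = np.linalg.norm(template)
    return template / norm if norm > 0 else template


def fit_invariant_template(stimuli, responses, width, n_iter=20, seed=0):
    """Alternate the two steps until the location estimates stop changing
    or n_iter iterations have run (n_iter is assumed to be at least 1)."""
    rng = np.random.default_rng(seed)
    patches = extract_patches(stimuli, width)
    n_trials, n_positions, _ = patches.shape
    # Start from random location guesses, one per trial.
    locations = rng.integers(0, n_positions, size=n_trials)
    for _ in range(n_iter):
        template = update_template(patches, locations, responses)
        new_locations = best_locations(patches, template)
        if np.array_equal(new_locations, locations):
            break
        locations = new_locations
    return template, locations
```

Calling `fit_invariant_template(stimuli, responses, width)` with a `(n_trials, length)` stimulus array and a vector of spike counts returns a unit-norm template estimate and one estimated location per trial. The spread of those locations gives a rough picture of the range of position invariance. Because the weighted average is only unbiased for uncorrelated stimuli, this sketch would need the information-based template step described in the abstract before it could be applied to natural stimuli.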