Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus
Neurosciences PhD Program, Stanford University.
Department of Applied Physics, Stanford University.
Adv Neural Inf Process Syst. 2016;29:1369-1377.
A central challenge in sensory neuroscience is to understand neural computations and circuit mechanisms that underlie the encoding of ethologically relevant, natural stimuli. In multilayered neural circuits, nonlinear processes such as synaptic transmission and spiking dynamics present a significant obstacle to the creation of accurate computational models of responses to natural stimuli. Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs). Moreover, we find two additional surprising properties of CNNs: they are less susceptible to overfitting than their LN counterparts when trained on small amounts of data, and generalize better when tested on stimuli drawn from a different distribution (e.g. between natural scenes and white noise). An examination of the learned CNNs reveals several properties. First, a richer set of feature maps is necessary for predicting the responses to natural scenes compared to white noise. Second, temporally precise responses to slowly varying inputs originate from feedforward inhibition, similar to known retinal mechanisms. Third, the injection of latent noise sources in intermediate layers enables our model to capture the sub-Poisson spiking variability observed in retinal ganglion cells. Fourth, augmenting our CNNs with recurrent lateral connections enables them to capture contrast adaptation as an emergent property of accurately describing retinal responses to natural scenes. These methods can be readily generalized to other sensory modalities and stimulus ensembles. Overall, this work demonstrates that CNNs not only accurately capture sensory circuit responses to natural scenes, but also can yield information about the circuit's internal structure and function.
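The model family described in the abstract can be sketched in miniature: rectified convolutional feature maps, Gaussian noise injected at an intermediate stage (a stand-in for the latent noise sources mentioned above), and a softplus readout that keeps the predicted firing rate non-negative. This is a hedged toy with a single convolutional layer, not the paper's architecture, which stacks multiple convolutional layers and augments them with recurrent lateral connections; the function names (`conv2d_valid`, `retina_cnn_rate`) and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(frame, kernels):
    """Naive 'valid' 2-D convolution: frame is (H, W), kernels is (C, kH, kW);
    returns C feature maps, each of shape (H - kH + 1, W - kW + 1)."""
    C, kH, kW = kernels.shape
    H, W = frame.shape
    out = np.empty((C, H - kH + 1, W - kW + 1))
    for c in range(C):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(frame[i:i + kH, j:j + kW] * kernels[c])
    return out

def retina_cnn_rate(frame, conv_kernels, readout, noise_sd=0.1):
    """One rectified convolutional layer, noise injected after the
    nonlinearity, then a dense readout through a softplus to produce a
    non-negative firing rate for one model ganglion cell."""
    maps = np.maximum(conv2d_valid(frame, conv_kernels), 0.0)   # ReLU feature maps
    maps = maps + noise_sd * rng.standard_normal(maps.shape)    # latent noise injection
    drive = readout @ maps.ravel()                              # dense readout
    return np.logaddexp(0.0, drive)                             # softplus: log(1 + e^drive)

# Toy usage: an 8x8 stimulus frame filtered by four 3x3 subunit kernels.
frame = rng.standard_normal((8, 8))
kernels = 0.5 * rng.standard_normal((4, 3, 3))
readout = 0.1 * rng.standard_normal(4 * 6 * 6)
rate = retina_cnn_rate(frame, kernels, readout)
```

Because the injected noise acts inside the rectifying nonlinearity rather than on the output spike count, repeated calls with the same frame produce rate fluctuations that can fall below Poisson variability, which is the qualitative behavior the abstract attributes to latent noise sources in intermediate layers.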