Research Center Neurosensory Science, Cluster of Excellence Hearing4all, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany.
Zalando Research, Zalando SE, Berlin, Germany.
PLoS Comput Biol. 2019 Jan 17;15(1):e1006595. doi: 10.1371/journal.pcbi.1006595. eCollection 2019 Jan.
We investigate how the neural processing in auditory cortex is shaped by the statistics of natural sounds. Hypothesising that auditory cortex (A1) represents the structural primitives out of which sounds are composed, we employ a statistical model to extract such components. The inputs to the model are cochleagrams, which approximate the non-linear transformations a sound undergoes from the outer ear, through the cochlea, to the auditory nerve. Cochleagram components do not superimpose linearly, but rather according to a rule which can be approximated using the max function. This is a consequence of the compression inherent in the cochleagram and the sparsity of natural sounds. Furthermore, cochleagrams do not have negative values. Cochleagrams are therefore not matched well by the assumptions of standard linear approaches such as sparse coding or ICA. We therefore consider a new encoding approach for natural sounds, which combines a model of early auditory processing with maximal causes analysis (MCA), a sparse coding model which captures both the non-linear combination rule and non-negativity of the data. An efficient truncated EM algorithm is used to fit the MCA model to cochleagram data. We characterize the generative fields (GFs) inferred by MCA with respect to in vivo neural responses in A1 by applying reverse correlation to estimate spectro-temporal receptive fields (STRFs) implied by the learned GFs. Despite the GFs being non-negative, the STRF estimates are found to contain both positive and negative subfields, where the negative subfields can be attributed to explaining-away effects as captured by the applied inference method. A direct comparison with ferret A1 shows many similar forms, and the spectral and temporal modulation tuning of both ferret and model STRFs shows similar ranges over the population.
In summary, our model represents an alternative to linear approaches for biological auditory encoding, one that captures salient data properties and links inhibitory subfields to explaining-away effects.
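The claim that compressed, sparse components combine approximately through a max rule rather than by summation can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the cochleagram model used in the paper: the compression `log1p(energy / ref)`, the reference level `ref`, the exponential energy distribution, and all function names are hypothetical choices made only for illustration. Two sparse non-negative energy patterns are mixed, the mixture is compressed, and the result is compared against the element-wise max and the element-wise sum of the individually compressed patterns.

```python
import numpy as np


def compress(energy, ref=1e-3):
    """Non-negative log compression (an assumed stand-in for cochlear compression)."""
    return np.log1p(energy / ref)


def sparse_source(rng, n_bins, sparsity):
    """Random non-negative energies in which only a fraction `sparsity` of bins is active."""
    active = rng.random(n_bins) < sparsity
    return np.where(active, rng.exponential(1.0, n_bins), 0.0)


def compare_combination_rules(n_bins=5000, sparsity=0.1, seed=0):
    """Compare the max rule and the linear (sum) rule against the compressed mixture.

    Returns the mean absolute error of the max rule, the mean absolute error
    of the sum rule, and the largest single-bin error of the max rule.
    """
    rng = np.random.default_rng(seed)
    a = sparse_source(rng, n_bins, sparsity)
    b = sparse_source(rng, n_bins, sparsity)

    mixture = compress(a + b)                       # energies add before compression
    max_rule = np.maximum(compress(a), compress(b))  # max combination of components
    sum_rule = compress(a) + compress(b)             # linear superposition of components

    err_max = np.abs(mixture - max_rule)
    err_sum = np.abs(mixture - sum_rule)
    return err_max.mean(), err_sum.mean(), err_max.max()


if __name__ == "__main__":
    mean_max, mean_sum, worst_max = compare_combination_rules()
    print(f"mean |error|, max rule: {mean_max:.4f}")
    print(f"mean |error|, sum rule: {mean_sum:.4f}")
    print(f"worst-case error, max rule: {worst_max:.4f} (bounded by log 2)")
```

In this sketch both rules are exact wherever at most one component is active, which is the common case for sparse data. They differ only where the components overlap: there the max rule is off by less than log 2 per bin, whereas the sum rule's error grows with the strength of the compression.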