Młynarski Wiktor, McDermott Josh H
Department of Brain and Cognitive Sciences, MIT, Cambridge, MA
Neural Comput. 2018 Mar;30(3):631-669. doi: 10.1162/neco_a_01048. Epub 2017 Dec 8.
Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. To gain insight into such midlevel representations for sound, we designed a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer, the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first-layer coefficients. When trained on corpora of speech and environmental sounds, some second-layer units learned to group similar spectrotemporal features, while others instantiated opponency between distinct sets of features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for midlevel neuronal computation.
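The two-layer structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the inference procedure (ISTA with an L1 penalty), the moving-average magnitude envelope, the log transform, and the linear second-layer readout are all simplifying assumptions chosen for illustration, and every function name and parameter below (`reconstruct`, `infer_coeffs`, `second_layer`, `lam`, `width`, `weights`) is hypothetical. Learning of the kernels and second-layer weights is omitted.

```python
import numpy as np


def reconstruct(kernels, coeffs):
    """Synthesize a spectrogram from a convolutional code.

    kernels: (K, F, L) spectrotemporal kernels (F frequency bins, L frames).
    coeffs:  (K, T) activation of each kernel over time.
    Returns an (F, T + L - 1) spectrogram: the sum over kernels of each
    kernel convolved along time with its coefficient trace.
    """
    K, F, L = kernels.shape
    T = coeffs.shape[1]
    out = np.zeros((F, T + L - 1))
    for k in range(K):
        for f in range(F):
            out[f] += np.convolve(coeffs[k], kernels[k, f])
    return out


def soft_threshold(x, thresh):
    """Proximal operator of the L1 norm (shrinks values toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)


def infer_coeffs(spec, kernels, lam=0.1, n_iter=200):
    """Assumed first-layer inference: ISTA on
    0.5 * ||spec - reconstruct(kernels, coeffs)||^2 + lam * ||coeffs||_1.
    """
    K, F, L = kernels.shape
    T = spec.shape[1] - L + 1
    # Conservative step size from an upper bound on the Lipschitz constant.
    step = 1.0 / np.sum(np.abs(kernels).sum(axis=2) ** 2)
    coeffs = np.zeros((K, T))
    for _ in range(n_iter):
        resid = spec - reconstruct(kernels, coeffs)
        grad = np.zeros_like(coeffs)
        for k in range(K):
            for f in range(F):
                # Correlation is the adjoint of the convolution above.
                grad[k] -= np.correlate(resid[f], kernels[k, f], mode="valid")
        coeffs = soft_threshold(coeffs - step * grad, step * lam)
    return coeffs


def magnitude_envelope(coeffs, width=5):
    """Time-varying magnitude of each coefficient trace, taken here as a
    moving average of absolute values (assumes T >= width)."""
    window = np.ones(width) / width
    return np.stack(
        [np.convolve(np.abs(c), window, mode="same") for c in coeffs]
    )


def second_layer(coeffs, weights, width=5, eps=1e-6):
    """Assumed second-layer readout: a linear combination of log magnitudes.

    weights: (J, K). A row with same-sign entries groups features; a row
    with mixed signs expresses opponency between sets of features.
    """
    return weights @ np.log(magnitude_envelope(coeffs, width) + eps)
```

In this sketch, a second-layer weight row such as `[1, 1]` responds when either of two first-layer features is strongly active, while `[1, -1]` responds to one feature and is suppressed by the other, which loosely mirrors the grouping and opponency described above.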