Department of Computer Science and Technology, Tsinghua University, Beijing, China.
Center for Brain-Inspired Computing Research (CBICR), Tsinghua University, Beijing, China.
PLoS Comput Biol. 2019 Feb 11;15(2):e1006766. doi: 10.1371/journal.pcbi.1006766. eCollection 2019 Feb.
The auditory pathway consists of multiple stages, from the cochlear nucleus to the auditory cortex. Neurons at different stages have different functions and exhibit different response properties, and it is unclear whether these stages share a common encoding mechanism. We trained an unsupervised deep learning model consisting of alternating sparse coding and max pooling layers on cochleogram-filtered human speech. Evaluation of the response properties revealed that computing units in lower layers exhibited spectro-temporal receptive fields (STRFs) similar to those of inferior colliculus neurons measured in physiological experiments, including properties such as sound onset and termination, checkerboard patterns, and spectral motion. Units in upper layers tended to be tuned to phonetic features such as plosivity and nasality, resembling results from field recordings in human auditory cortex. Varying the sparseness level of the units in each higher layer revealed a positive correlation between sparseness level and the strength of phonetic feature encoding. The activities of units in the top layer, but not in other layers, correlated with the dynamics of the first two formants (F1, F2) of all phonemes, indicating that these units encode phoneme dynamics. These results suggest that the principles of sparse coding and max pooling may be universal in the human auditory pathway.
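The abstract gives no implementation details, so the following is only a minimal sketch of what one "sparse coding followed by max pooling" stage might look like. Everything specific here is an assumption for illustration and is not taken from the paper: the ISTA solver, the fixed random dictionary `D` (a trained model would learn it), the penalty `lam`, pooling along the time axis over absolute code values, and the random matrix standing in for cochleogram frames.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage operator used by ISTA."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code_ista(X, D, lam=0.1, n_iter=100):
    """Approximately minimize 0.5*||X - D A||_F^2 + lam*||A||_1 over A.

    X: (n_features, n_frames) input, D: (n_features, n_atoms) dictionary.
    Returns A with shape (n_atoms, n_frames).
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)           # gradient of the quadratic term
        A = soft_threshold(A - grad / L, lam / L)
    return A

def max_pool_time(A, width=2):
    """Non-overlapping max pooling of |A| along the time (column) axis."""
    n_atoms, n_frames = A.shape
    n_out = n_frames // width              # trailing frames that do not fit are dropped
    blocks = np.abs(A[:, :n_out * width]).reshape(n_atoms, n_out, width)
    return blocks.max(axis=2)

# Hypothetical input: 8 frames of a 16-channel "cochleogram" (random stand-in).
rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm dictionary atoms

A = sparse_code_ista(X, D)      # sparse codes, shape (32, 8)
P = max_pool_time(A, width=2)   # pooled responses, shape (32, 4)
```

In a stacked model of the kind the abstract describes, the pooled output `P` would serve as the input to the next sparse coding layer, and raising `lam` would be one way to increase the sparseness of a layer's units.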