Department of Electrical and Computer Engineering, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, Maryland, USA.
PLoS Comput Biol. 2012;8(11):e1002759. doi: 10.1371/journal.pcbi.1002759. Epub 2012 Nov 1.
Timbre is the attribute of sound that allows humans and other animals to distinguish among different sound sources. Studies based on psychophysical judgments of musical timbre, ecological analyses of the physical characteristics of sound, and machine learning approaches have all suggested that timbre is a multifaceted attribute that involves both spectral and temporal sound features. Here, we explored the neural underpinnings of musical timbre. We used a neuro-computational framework based on spectro-temporal receptive fields, recorded from over a thousand neurons in the mammalian primary auditory cortex as well as from simulated cortical neurons, augmented with a nonlinear classifier. The model performed robust instrument classification irrespective of pitch and playing style, with an accuracy of 98.7%. Using the same front end, the model also reproduced the perceptual distances between timbres as judged by human listeners. The study demonstrates that joint spectro-temporal features, such as those observed in the mammalian primary auditory cortex, are critical for providing a representation rich enough to account for both human perceptual judgments of timbre and the recognition of musical instruments.
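To make the idea of joint spectro-temporal features concrete, the following is a minimal sketch, not the authors' model. It assumes that a simulated receptive field can be approximated by a 2D Gabor patch tuned to one temporal modulation rate (Hz) and one spectral modulation scale (cycles/octave), and that the input is a log-frequency spectrogram. The function names, kernel sizes, frame rate, and channel density are all hypothetical choices made for illustration.

```python
import numpy as np

def gabor_strf(rate_hz, scale_cyc_per_oct, n_time=32, n_freq=24,
               frame_rate=100.0, chan_per_oct=12.0):
    """Hypothetical STRF: a 2D Gabor tuned to one rate and one scale.

    Returns an array of shape (n_freq, n_time). All defaults are
    illustrative assumptions, not values taken from the study.
    """
    t = (np.arange(n_time) - n_time / 2) / frame_rate    # seconds
    f = (np.arange(n_freq) - n_freq / 2) / chan_per_oct  # octaves
    T, F = np.meshgrid(t, f)                             # (n_freq, n_time)
    # Gaussian envelope localizes the filter in time and log-frequency.
    envelope = np.exp(-0.5 * ((T / t.std()) ** 2 + (F / f.std()) ** 2))
    # Joint carrier: a ripple drifting along both axes at once.
    carrier = np.cos(2 * np.pi * (rate_hz * T + scale_cyc_per_oct * F))
    kernel = envelope * carrier
    return kernel - kernel.mean()  # zero mean: no response to a flat input

def strf_features(spectrogram, kernels):
    """RMS response of each kernel to a (channels x frames) spectrogram.

    Uses FFT-based circular convolution for brevity; the spectrogram must
    be at least as large as each kernel along both axes.
    """
    shape = spectrogram.shape
    spec_fft = np.fft.rfft2(spectrogram, shape)
    feats = []
    for k in kernels:
        resp = np.fft.irfft2(spec_fft * np.fft.rfft2(k, shape), shape)
        feats.append(np.sqrt(np.mean(resp ** 2)))
    return np.array(feats)

if __name__ == "__main__":
    # Synthetic ripple: 4 Hz temporal rate, 1 cycle/octave spectral scale.
    t = np.arange(200) / 100.0   # 2 s at 100 frames/s
    f = np.arange(48) / 12.0     # 4 octaves at 12 channels/octave
    T, F = np.meshgrid(t, f)
    ripple = np.cos(2 * np.pi * (4.0 * T + 1.0 * F))
    bank = [gabor_strf(4, 1), gabor_strf(16, 1), gabor_strf(4, -1)]
    print(strf_features(ripple, bank))
```

In this sketch the kernel tuned to the ripple's own rate and scale gives the strongest response, while kernels that differ in rate or in drift direction respond weakly. A purely spectral or purely temporal filter could not separate the last two cases, which is the sense in which the features are joint. The resulting feature vector could then be passed to any nonlinear classifier.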