Departments of Neurological Surgery and Physiology, UCSF Center for Integrative Neuroscience, University of California, San Francisco, California 94143, USA.
Nature. 2012 May 10;485(7397):233-6. doi: 10.1038/nature11020.
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
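The abstract does not state how the spectrograms were reconstructed. One common approach to this kind of stimulus reconstruction is a regularized linear map from time-lagged neural responses back to the spectrogram, and the sketch below illustrates that idea on synthetic data only. Everything in it is an assumption made for illustration: the function names, the ridge penalty, the response delay, the electrode and frequency counts, and above all the attentional gain of 0.3 on the ignored speaker, which is built into the simulation and therefore demonstrates nothing about real recordings. The decoder is fit on single-speaker data and then applied to responses to a two-speaker mixture, mirroring the train-on-single, test-on-mixture logic described in the abstract.

```python
import numpy as np


def lagged_design(responses, n_lags):
    """Row t holds the responses at times t, t+1, ..., t+n_lags-1 (zero padded)."""
    n_times, n_elec = responses.shape
    padded = np.vstack([responses, np.zeros((n_lags - 1, n_elec))])
    return np.hstack([padded[lag:lag + n_times] for lag in range(n_lags)])


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: solve (X'X + alpha*I) W = X'Y for W."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def simulate_responses(spec, mixing, delay, noise_sd, rng):
    """Toy electrodes: a delayed linear mix of the spectrogram plus noise."""
    delayed = np.vstack([np.zeros((delay, spec.shape[1])), spec[:-delay]])
    noise = noise_sd * rng.standard_normal((spec.shape[0], mixing.shape[1]))
    return delayed @ mixing + noise


def mean_channel_correlation(a, b):
    """Pearson correlation per frequency channel, averaged over channels."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
n_freq, n_elec, n_lags, delay = 8, 32, 4, 2  # hypothetical sizes
mixing = rng.standard_normal((n_freq, n_elec))

# Fit the decoder on single-speaker data only.
train_spec = rng.standard_normal((2000, n_freq))
train_resp = simulate_responses(train_spec, mixing, delay, 0.5, rng)
weights = fit_ridge(lagged_design(train_resp, n_lags), train_spec, alpha=10.0)

# Apply it to responses to a two-speaker mixture.
# ASSUMPTION: the ignored speaker drives the electrodes with gain 0.3.
attended = rng.standard_normal((500, n_freq))
ignored = rng.standard_normal((500, n_freq))
mixture_resp = simulate_responses(attended + 0.3 * ignored, mixing, delay, 0.5, rng)
reconstruction = lagged_design(mixture_resp, n_lags) @ weights

corr_attended = mean_channel_correlation(reconstruction, attended)
corr_ignored = mean_channel_correlation(reconstruction, ignored)
print(f"correlation with attended speaker: {corr_attended:.2f}")
print(f"correlation with ignored speaker:  {corr_ignored:.2f}")
```

The design matrix uses responses at and after each stimulus time point because, in this toy model, neural activity follows the sound by a fixed delay. With real recordings, the lag window and the ridge penalty would typically be chosen by cross-validation.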