How the human brain recognizes speech in the context of changing speakers.

Affiliation

Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom.

Publication information

J Neurosci. 2010 Jan 13;30(2):629-38. doi: 10.1523/JNEUROSCI.2742-09.2010.

Abstract

We understand speech from different speakers with ease, whereas artificial speech recognition systems struggle with this task. It is unclear how the human brain solves this problem. The conventional view is that speech message recognition and speaker identification are two separate functions and that message processing takes place predominantly in the left hemisphere, whereas processing of speaker-specific information is located in the right hemisphere. Here, we distinguish the contribution of specific cortical regions, to speech recognition and speaker information processing, by controlled manipulation of task and resynthesized speaker parameters. Two functional magnetic resonance imaging studies provide evidence for a dynamic speech-processing network that questions the conventional view. We found that speech recognition regions in left posterior superior temporal gyrus/superior temporal sulcus (STG/STS) also encode speaker-related vocal tract parameters, which are reflected in the amplitude peaks of the speech spectrum, along with the speech message. Right posterior STG/STS activated specifically more to a speaker-related vocal tract parameter change during a speech recognition task compared with a voice recognition task. Left and right posterior STG/STS were functionally connected. Additionally, we found that speaker-related glottal fold parameters (e.g., pitch), which are not reflected in the amplitude peaks of the speech spectrum, are processed in areas immediately adjacent to primary auditory cortex, i.e., in areas in the auditory hierarchy earlier than STG/STS. Our results point to a network account of speech recognition, in which information about the speech message and the speaker's vocal tract are combined to solve the difficult task of understanding speech from different speakers.
