How the human brain recognizes speech in the context of changing speakers.

Affiliation

Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, United Kingdom.

Publication information

J Neurosci. 2010 Jan 13;30(2):629-38. doi: 10.1523/JNEUROSCI.2742-09.2010.

Abstract

We understand speech from different speakers with ease, whereas artificial speech recognition systems struggle with this task. It is unclear how the human brain solves this problem. The conventional view is that speech message recognition and speaker identification are two separate functions and that message processing takes place predominantly in the left hemisphere, whereas processing of speaker-specific information is located in the right hemisphere. Here, we distinguish the contribution of specific cortical regions, to speech recognition and speaker information processing, by controlled manipulation of task and resynthesized speaker parameters. Two functional magnetic resonance imaging studies provide evidence for a dynamic speech-processing network that questions the conventional view. We found that speech recognition regions in left posterior superior temporal gyrus/superior temporal sulcus (STG/STS) also encode speaker-related vocal tract parameters, which are reflected in the amplitude peaks of the speech spectrum, along with the speech message. Right posterior STG/STS activated specifically more to a speaker-related vocal tract parameter change during a speech recognition task compared with a voice recognition task. Left and right posterior STG/STS were functionally connected. Additionally, we found that speaker-related glottal fold parameters (e.g., pitch), which are not reflected in the amplitude peaks of the speech spectrum, are processed in areas immediately adjacent to primary auditory cortex, i.e., in areas in the auditory hierarchy earlier than STG/STS. Our results point to a network account of speech recognition, in which information about the speech message and the speaker's vocal tract are combined to solve the difficult task of understanding speech from different speakers.
