Simulation of talking faces in the human brain improves auditory speech recognition.

Author Information

von Kriegstein Katharina, Dogan Ozgür, Grüter Martina, Giraud Anne-Lise, Kell Christian A, Grüter Thomas, Kleinschmidt Andreas, Kiebel Stefan J

Affiliations

Wellcome Trust Centre for Neuroimaging, University College London, Queen Square, London WC1N 3BG, United Kingdom.

Publication Information

Proc Natl Acad Sci U S A. 2008 May 6;105(18):6747-52. doi: 10.1073/pnas.0710826105. Epub 2008 Apr 24.

Abstract

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.

Cited By

The Benefit of Bimodal Training in Voice Learning.
Brain Sci. 2023 Aug 30;13(9):1260. doi: 10.3390/brainsci13091260.

Representation of Expression and Identity by Ventral Prefrontal Neurons.
Neuroscience. 2022 Aug 1;496:243-260. doi: 10.1016/j.neuroscience.2022.05.033. Epub 2022 May 30.

References

Hearing facial identities.
Q J Exp Psychol (Hove). 2007 Oct;60(10):1446-56. doi: 10.1080/17470210601063589.

Hereditary prosopagnosia: the first case series.
Cortex. 2007 Aug;43(6):734-49. doi: 10.1016/s0010-9452(08)70502-1.

Exploring the role of characteristic motion when learning new faces.
Q J Exp Psychol (Hove). 2007 Apr;60(4):519-26. doi: 10.1080/17470210601117559.

The cortical organization of speech processing.
Nat Rev Neurosci. 2007 May;8(5):393-402. doi: 10.1038/nrn2113. Epub 2007 Apr 13.
