


The interrelationship between the face and vocal tract configuration during audiovisual speech.

Affiliations

Visual Neuroscience Group, School of Psychology, University of Nottingham, NG7 2RD Nottingham, United Kingdom;

Experimental Psychology, University College London, WC1H 0AP London, United Kingdom.

Publication Information

Proc Natl Acad Sci U S A. 2020 Dec 22;117(51):32791-32798. doi: 10.1073/pnas.2006192117. Epub 2020 Dec 8.

DOI: 10.1073/pnas.2006192117
PMID: 33293422
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7768679/
Abstract

It is well established that speech perception is improved when we are able to see the speaker talking along with hearing their voice, especially when the speech is noisy. While we have a good understanding of where speech integration occurs in the brain, it is unclear how visual and auditory cues are combined to improve speech perception. One suggestion is that integration can occur as both visual and auditory cues arise from a common generator: the vocal tract. Here, we investigate whether facial and vocal tract movements are linked during speech production by comparing videos of the face and fast magnetic resonance (MR) image sequences of the vocal tract. The joint variation in the face and vocal tract was extracted using an application of principal components analysis (PCA), and we demonstrate that MR image sequences can be reconstructed with high fidelity using only the facial video and PCA. Reconstruction fidelity was significantly higher when images from the two sequences corresponded in time, and including implicit temporal information by combining contiguous frames also led to a significant increase in fidelity. A "Bubbles" technique was used to identify which areas of the face were important for recovering information about the vocal tract, and vice versa on a frame-by-frame basis. Our data reveal that there is sufficient information in the face to recover vocal tract shape during speech. In addition, the facial and vocal tract regions that are important for reconstruction are those that are used to generate the acoustic speech signal.
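The reconstruction described in the abstract — recovering vocal tract (MR) frames from facial video via a joint principal components analysis — can be sketched on synthetic data. This is a minimal illustration of the general technique, not the paper's actual pipeline; all array names, dimensions, and the noise model are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 time-aligned frames, face frames flattened to
# 60 features and MR (vocal tract) frames to 40, both driven by a shared
# low-dimensional source (the "common generator" of the abstract).
n, d_face, d_mr, k = 200, 60, 40, 5
latent = rng.normal(size=(n, k))
face = latent @ rng.normal(size=(k, d_face)) + 0.05 * rng.normal(size=(n, d_face))
mr = latent @ rng.normal(size=(k, d_mr)) + 0.05 * rng.normal(size=(n, d_mr))

# Joint PCA: concatenate the two modalities frame by frame and keep the
# top principal components of the stacked data.
joint = np.hstack([face, mr])
mean = joint.mean(axis=0)
U, S, Vt = np.linalg.svd(joint - mean, full_matrices=False)
components = Vt[:k]                              # (k, d_face + d_mr)

# Reconstruct MR frames from face frames alone: estimate each frame's PCA
# coefficients by least squares against the face block of the loadings,
# then emit the MR block of the reconstruction.
P_face, P_mr = components[:, :d_face], components[:, d_face:]
coeffs, *_ = np.linalg.lstsq(P_face.T, (face - mean[:d_face]).T, rcond=None)
mr_hat = coeffs.T @ P_mr + mean[d_face:]

err = np.linalg.norm(mr_hat - mr) / np.linalg.norm(mr)
print(f"relative reconstruction error: {err:.3f}")
```

Because the two modalities share a common low-rank source, the face block alone pins down the joint coefficients and the MR frames are recovered with low error — the same logic by which the study reconstructs MR sequences from facial video.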


Figure images (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9125/7768679/a610115feb14/pnas.2006192117fig01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9125/7768679/450fee06f955/pnas.2006192117fig02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9125/7768679/047b2323f130/pnas.2006192117fig03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9125/7768679/1c42a597cbd5/pnas.2006192117fig04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9125/7768679/40b960e0016c/pnas.2006192117fig05.jpg

Similar Articles

1. The interrelationship between the face and vocal tract configuration during audiovisual speech.
   Proc Natl Acad Sci U S A. 2020 Dec 22;117(51):32791-32798. doi: 10.1073/pnas.2006192117. Epub 2020 Dec 8.
2. Processing communicative facial and vocal cues in the superior temporal sulcus.
   Neuroimage. 2020 Nov 1;221:117191. doi: 10.1016/j.neuroimage.2020.117191. Epub 2020 Jul 23.
3. Mouth and Voice: A Relationship between Visual and Auditory Preference in the Human Superior Temporal Sulcus.
   J Neurosci. 2017 Mar 8;37(10):2697-2708. doi: 10.1523/JNEUROSCI.2914-16.2017. Epub 2017 Feb 8.
4. Can you McGurk yourself? Self-face and self-voice in audiovisual speech.
   Psychon Bull Rev. 2012 Feb;19(1):66-72. doi: 10.3758/s13423-011-0176-8.
5. Listening to talking faces: motor cortical activation during speech perception.
   Neuroimage. 2005 Mar;25(1):76-89. doi: 10.1016/j.neuroimage.2004.11.006. Epub 2005 Jan 8.
6. Individual differences in vocal size exaggeration.
   Sci Rep. 2022 Feb 16;12(1):2611. doi: 10.1038/s41598-022-05170-6.
7. Inter-speaker speech variability assessment using statistical deformable models from 3.0 tesla magnetic resonance images.
   Proc Inst Mech Eng H. 2012 Mar;226(3):185-96. doi: 10.1177/0954411911431664.
8. Simulation of talking faces in the human brain improves auditory speech recognition.
   Proc Natl Acad Sci U S A. 2008 May 6;105(18):6747-52. doi: 10.1073/pnas.0710826105. Epub 2008 Apr 24.
9. Discrimination of speaker size from syllable phrases.
   J Acoust Soc Am. 2005 Dec;118(6):3816-22. doi: 10.1121/1.2118427.
10. Some behavioral and neurobiological constraints on theories of audiovisual speech integration: a review and suggestions for new directions.
   Seeing Perceiving. 2011;24(6):513-39. doi: 10.1163/187847611X595864. Epub 2011 Sep 29.

Cited By

1. Prior multisensory learning can facilitate auditory-only voice-identity and speech recognition in noise.
   Q J Exp Psychol (Hove). 2024 Sep 20;78(7):17470218241278649. doi: 10.1177/17470218241278649.
2. Modulation transfer functions for audiovisual speech.
   PLoS Comput Biol. 2022 Jul 19;18(7):e1010273. doi: 10.1371/journal.pcbi.1010273. eCollection 2022 Jul.
3. A PCA-Based Active Appearance Model for Characterising Modes of Spatiotemporal Variation in Dynamic Facial Behaviours.
   Front Psychol. 2022 May 26;13:880548. doi: 10.3389/fpsyg.2022.880548. eCollection 2022.
4. Faces and Voices Processing in Human and Primate Brains: Rhythmic and Multimodal Mechanisms Underlying the Evolution and Development of Speech.
   Front Psychol. 2022 Mar 30;13:829083. doi: 10.3389/fpsyg.2022.829083. eCollection 2022.
5. Neural indicators of articulator-specific sensorimotor influences on infant speech perception.
   Proc Natl Acad Sci U S A. 2021 May 18;118(20). doi: 10.1073/pnas.2025043118.

References

1. Eye Movements During Visual Speech Perception in Deaf and Hearing Children.
   Lang Learn. 2018 Jun;68(Suppl 1):159-179. doi: 10.1111/lang.12264. Epub 2017 Sep 26.
2. The hearing ear is always found close to the speaking tongue: Review of the role of the motor system in speech perception.
   Brain Lang. 2017 Jan;164:77-105. doi: 10.1016/j.bandl.2016.10.004. Epub 2016 Nov 5.
3. High visual resolution matters in audiovisual speech perception, but only for some.
   Atten Percept Psychophys. 2016 Jul;78(5):1472-87. doi: 10.3758/s13414-016-1109-4.
4. Language familiarity modulates relative attention to the eyes and mouth of a talker.
   Cognition. 2016 Feb;147:100-5. doi: 10.1016/j.cognition.2015.11.013. Epub 2015 Nov 30.
5. Prediction and constraint in audiovisual speech perception.
   Cortex. 2015 Jul;68:169-81. doi: 10.1016/j.cortex.2015.03.006. Epub 2015 Mar 20.
6. Identity From Variation: Representations of Faces Derived From Multiple Instances.
   Cogn Sci. 2016 Jan;40(1):202-23. doi: 10.1111/cogs.12231. Epub 2015 Mar 30.
7. Neural pathways for visual speech perception.
   Front Neurosci. 2014 Dec 1;8:386. doi: 10.3389/fnins.2014.00386. eCollection 2014.
8. Hearing impairment and audiovisual speech integration ability: a case study report.
   Front Psychol. 2014 Jul 1;5:678. doi: 10.3389/fpsyg.2014.00678. eCollection 2014.
9. Eigenfaces for recognition.
   J Cogn Neurosci. 1991 Winter;3(1):71-86. doi: 10.1162/jocn.1991.3.1.71.
10. Identifying regions that carry the best information about global facial configurations.
   J Vis. 2010 Sep 28;10(11):27. doi: 10.1167/10.11.27.