Lachs, Lorin; Pisoni, David B.
Department of Psychology, California State University, Fresno.
Ecol Psychol. 2004;16(3):159-187. doi: 10.1207/s15326969eco1603_1.
Four experiments examined the nature of multisensory speech information. In Experiment 1, participants were asked to match heard voices with dynamic visual-alone video clips of speakers' articulating faces. This cross-modal matching task was used to examine whether vocal source matching can be accomplished across sensory modalities. The results showed that observers could match speaking faces and voices, indicating that information about the speaker was available for cross-modal comparisons. In a series of follow-up experiments, several stimulus manipulations were used to determine some of the critical acoustic and optic patterns necessary for specifying cross-modal source information. The results showed that cross-modal source information was not available in static visual displays of faces and was not contingent on a prominent acoustic cue to vocal identity (f0). Furthermore, cross-modal matching was not possible when the acoustic signal was temporally reversed.