

Cross-modal source information and spoken word recognition.

Author Information

Lachs Lorin, Pisoni David B

Affiliations

Department of Psychology, California State University, Fresno, CA, USA.

Publication Information

J Exp Psychol Hum Percept Perform. 2004 Apr;30(2):378-96. doi: 10.1037/0096-1523.30.2.378.

Abstract

In a cross-modal matching task, participants were asked to match visual and auditory displays of speech based on the identity of the speaker. The present investigation used this task with acoustically transformed speech to examine the properties of sound that can convey cross-modal information. Word recognition performance was also measured under the same transformations. The authors found that cross-modal matching was only possible under transformations that preserved the relative spectral and temporal patterns of formant frequencies. In addition, cross-modal matching was only possible under the same conditions that yielded robust word recognition performance. The results are consistent with the hypothesis that acoustic and optical displays of speech simultaneously carry articulatory information about both the underlying linguistic message and indexical properties of the talker.


Similar Articles

1. Cross-modal source information and spoken word recognition. J Exp Psychol Hum Percept Perform. 2004 Apr;30(2):378-96. doi: 10.1037/0096-1523.30.2.378.
2. Specification of cross-modal source information in isolated kinematic displays of speech. J Acoust Soc Am. 2004 Jul;116(1):507-18. doi: 10.1121/1.1757454.
3. Examining the time course of indexical specificity effects in spoken word recognition. J Exp Psychol Learn Mem Cogn. 2005 Mar;31(2):306-21. doi: 10.1037/0278-7393.31.2.306.
4. Hemispheric differences in indexical specificity effects in spoken word recognition. J Exp Psychol Hum Percept Perform. 2007 Apr;33(2):410-24. doi: 10.1037/0096-1523.33.2.410.
5. Lip-read me now, hear me better later: cross-modal transfer of talker-familiarity effects. Psychol Sci. 2007 May;18(5):392-6. doi: 10.1111/j.1467-9280.2007.01911.x.
7. Does knowing speaker sex facilitate vowel recognition at short durations? Acta Psychol (Amst). 2014 May;148:81-90. doi: 10.1016/j.actpsy.2014.01.010. Epub 2014 Feb 1.
8. Suprasegmental lexical stress cues in visual speech can guide spoken-word recognition. Q J Exp Psychol (Hove). 2014;67(4):793-808. doi: 10.1080/17470218.2013.834371. Epub 2013 Oct 18.
9. Intelligibility of emotional speech in younger and older adults. Ear Hear. 2014 Nov-Dec;35(6):695-707. doi: 10.1097/AUD.0000000000000082.

Cited By

1. Something in the way they move: characteristics of identity present in faces, voices, body movements, and actions. Front Psychol. 2025 Jul 15;16:1645218. doi: 10.3389/fpsyg.2025.1645218. eCollection 2025.
2. Understanding discourse in face-to-face settings: The impact of multimodal cues and listening conditions. J Exp Psychol Learn Mem Cogn. 2025 May;51(5):837-854. doi: 10.1037/xlm0001399. Epub 2024 Oct 14.
3. Prior multisensory learning can facilitate auditory-only voice-identity and speech recognition in noise. Q J Exp Psychol (Hove). 2024 Sep 20;78(7):17470218241278649. doi: 10.1177/17470218241278649.
4. The Benefit of Bimodal Training in Voice Learning. Brain Sci. 2023 Aug 30;13(9):1260. doi: 10.3390/brainsci13091260.
5. The role of iconic gestures and mouth movements in face-to-face communication. Psychon Bull Rev. 2022 Apr;29(2):600-612. doi: 10.3758/s13423-021-02009-5. Epub 2021 Oct 20.
6. Visual mechanisms for voice-identity recognition flexibly adjust to auditory noise level. Hum Brain Mapp. 2021 Aug 15;42(12):3963-3982. doi: 10.1002/hbm.25532. Epub 2021 May 27.
7. Cross-modal transfer of talker-identity learning. Atten Percept Psychophys. 2021 Jan;83(1):415-434. doi: 10.3758/s13414-020-02141-9. Epub 2020 Oct 20.
9. Experience with a talker can transfer across modalities to facilitate lipreading. Atten Percept Psychophys. 2013 Oct;75(7):1359-65. doi: 10.3758/s13414-013-0534-x.
10. Visual influences on interactive speech alignment. Perception. 2011;40(12):1457-66. doi: 10.1068/p7071.

References

1. Crossmodal Source Identification in Speech Perception. Ecol Psychol. 2004;16(3):159-187. doi: 10.1207/s15326969eco1603_1.
3. Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics. Speech Commun. 1996 Dec 1;20(3):255-272. doi: 10.1016/S0167-6393(96)00063-5.
4. Visual speech information for face recognition. Percept Psychophys. 2002 Feb;64(2):220-9. doi: 10.3758/bf03195788.
5. Chimaeric sounds reveal dichotomies in auditory perception. Nature. 2002 Mar 7;416(6876):87-90. doi: 10.1038/416087a.
6. The effect of speechreading on masked detection thresholds for filtered speech. J Acoust Soc Am. 2001 May;109(5 Pt 1):2272-5. doi: 10.1121/1.1362687.
8. A model of facial biomechanics for speech production. J Acoust Soc Am. 1999 Nov;106(5):2834-42. doi: 10.1121/1.428108.
9. Echoes of echoes? An episodic theory of lexical access. Psychol Rev. 1998 Apr;105(2):251-79. doi: 10.1037/0033-295x.105.2.251.
10. Lip and jaw kinematics in bilabial stop consonant production. J Speech Lang Hear Res. 1997 Aug;40(4):877-93. doi: 10.1044/jslhr.4004.877.
