Talker variability in audio-visual speech perception.

Author information

Department of Psychology, The University of Chicago, Chicago, IL, USA.

Publication information

Front Psychol. 2014 Jul 16;5:698. doi: 10.3389/fpsyg.2014.00698. eCollection 2014.

DOI:10.3389/fpsyg.2014.00698

PMID:25076919

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4100456/

Abstract

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories, and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts has shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening conditions (e.g., noise or distortion) that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to the audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener that a change in talker has occurred.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e6e/4100456/b08e22390377/fpsyg-05-00698-g001.jpg

Similar articles

Talker familiarity and the accommodation of talker variability.

Atten Percept Psychophys. 2021 May;83(4):1842-1860. doi: 10.3758/s13414-020-02203-y. Epub 2021 Jan 4.

Listener sensitivity to individual talker differences in voice-onset-time.

J Acoust Soc Am. 2004 Jun;115(6):3171-83. doi: 10.1121/1.1701898.

Word Learning in Deaf Adults Who Use Cochlear Implants: The Role of Talker Variability and Attention to the Mouth.

Ear Hear. 2024;45(2):337-350. doi: 10.1097/AUD.0000000000001432. Epub 2023 Sep 11.

Speech perception in children with cochlear implants: effects of lexical difficulty, talker variability, and word length.

Ann Otol Rhinol Laryngol Suppl. 2000 Dec;185:79-81. doi: 10.1177/0003489400109s1234.

Psychobiological Responses Reveal Audiovisual Noise Differentially Challenges Speech Recognition.

Ear Hear. 2020 Mar/Apr;41(2):268-277. doi: 10.1097/AUD.0000000000000755.

Multiple sources of acoustic variation affect speech processing efficiency.

J Acoust Soc Am. 2023 Jan;153(1):209. doi: 10.1121/10.0016611.

The advantage of knowing the talker.

J Am Acad Audiol. 2013 Sep;24(8):689-700. doi: 10.3766/jaaa.24.8.6.

Listening Effort by Native and Nonnative Listeners Due to Noise, Reverberation, and Talker Foreign Accent During English Speech Perception.

J Speech Lang Hear Res. 2019 Apr 15;62(4):1068-1081. doi: 10.1044/2018_JSLHR-H-17-0423.

Tuned with a Tune: Talker Normalization via General Auditory Processes.

Front Psychol. 2012 Jun 22;3:203. doi: 10.3389/fpsyg.2012.00203. eCollection 2012.

Cited by

Learning to recognize unfamiliar faces from fine-phonetic detail in visual speech.

Atten Percept Psychophys. 2025 Apr;87(3):936-951. doi: 10.3758/s13414-025-03049-y. Epub 2025 Mar 20.

Multiple talker processing in autistic adult listeners.

Sci Rep. 2024 Jun 26;14(1):14698. doi: 10.1038/s41598-024-62429-w.

Sequence effects and speech processing: cognitive load for speaker-switching within and across accents.

Psychon Bull Rev. 2024 Feb;31(1):176-186. doi: 10.3758/s13423-023-02322-1. Epub 2023 Jul 13.

Multiple sources of acoustic variation affect speech processing efficiency.

J Acoust Soc Am. 2023 Jan;153(1):209. doi: 10.1121/10.0016611.

Attention, task demands, and multitalker processing costs in speech perception.

J Exp Psychol Hum Percept Perform. 2021 Dec;47(12):1673-1680. doi: 10.1037/xhp0000963.

Cortical mechanisms of talker normalization in fluent sentences.

Brain Lang. 2020 Feb;201:104722. doi: 10.1016/j.bandl.2019.104722. Epub 2019 Dec 10.

Limits of Perceived Audio-Visual Spatial Coherence as Defined by Reaction Time Measurements.

Front Neurosci. 2019 May 22;13:451. doi: 10.3389/fnins.2019.00451. eCollection 2019.

Understanding environmental sounds in sentence context.

Cognition. 2018 Mar;172:134-143. doi: 10.1016/j.cognition.2017.12.009. Epub 2017 Dec 19.

Effects of Looking Behavior on Listening and Understanding in a Simulated Classroom.

J Educ Audiol. 2014 Jan 1;20:24-33.

Multisensory and sensorimotor interactions in speech perception.

Front Psychol. 2015 Apr 20;6:458. doi: 10.3389/fpsyg.2015.00458. eCollection 2015.

References

Listening for the norm: adaptive coding in speech categorization.

Front Psychol. 2012 Feb 1;3:10. doi: 10.3389/fpsyg.2012.00010. eCollection 2012.

The direct and indirect roles of fundamental frequency in vowel perception.

J Acoust Soc Am. 2012 Jan;131(1):466-77. doi: 10.1121/1.3662068.

When less is heard than meets the ear: change deafness in a telephone conversation.

Q J Exp Psychol (Hove). 2011 Jul;64(7):1442-56. doi: 10.1080/17470218.2011.570353.

SPEECH PERCEPTION AS A TALKER-CONTINGENT PROCESS.

Psychol Sci. 1994 Jan 1;5(1):42-46. doi: 10.1111/j.1467-9280.1994.tb00612.x.

Long-term memory in speech perception: Some new findings on talker variability, speaking rate and perceptual learning.

Speech Commun. 1993 Oct;13(1-2):109-125. doi: 10.1016/0167-6393(93)90063-q.

Neural signatures of phonetic learning in adulthood: a magnetoencephalography study.

Neuroimage. 2009 May 15;46(1):226-40. doi: 10.1016/j.neuroimage.2009.01.028. Epub 2009 Jan 29.

Abstract coding of audiovisual speech: beyond sensory representation.

Neuron. 2007 Dec 20;56(6):1116-26. doi: 10.1016/j.neuron.2007.09.037.

Acoustic differences, listener expectations, and the perceptual accommodation of talker variability.

J Exp Psychol Hum Percept Perform. 2007 Apr;33(2):391-409. doi: 10.1037/0096-1523.33.2.391.

Hearing lips and seeing voices: how cortical areas supporting speech production mediate audiovisual speech perception.

Cereb Cortex. 2007 Oct;17(10):2387-99. doi: 10.1093/cercor/bhl147. Epub 2007 Jan 11.

Repetition and the brain: neural models of stimulus-specific effects.

Trends Cogn Sci. 2006 Jan;10(1):14-23. doi: 10.1016/j.tics.2005.11.006.
