
Psychobiological Responses Reveal Audiovisual Noise Differentially Challenges Speech Recognition.

Author Information

Bidelman Gavin M, Brown Bonnie, Mankel Kelsey, Nelms Price Caitlin

Affiliations

School of Communication Sciences and Disorders, University of Memphis, Memphis, Tennessee, USA.

Institute for Intelligent Systems, University of Memphis, Memphis, Tennessee, USA.

Publication Information

Ear Hear. 2020 Mar/Apr;41(2):268-277. doi: 10.1097/AUD.0000000000000755.

Abstract

OBJECTIVES

In noisy environments, listeners benefit from both hearing and seeing a talker, demonstrating that audiovisual (AV) cues enhance speech-in-noise (SIN) recognition. Here, we examined the relative contributions of auditory and visual cues to SIN perception and the strategies listeners use to decipher speech amid noise interference.

DESIGN

Normal-hearing listeners (n = 22) performed an open-set speech recognition task while viewing audiovisual TIMIT sentences presented under different combinations of signal degradation: noise in the visual (AVn), auditory (AnV), or both (AnVn) channels. Acoustic and visual noise were matched in physical signal-to-noise ratio. Eye tracking monitored participants' gaze to different parts of the talker's face during SIN perception.
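The design matched acoustic and visual noise in physical signal-to-noise ratio (SNR). As an illustrative sketch only (this is not the authors' stimulus pipeline; the function and variable names here are invented), scaling a noise track so that its power yields a target SNR in dB follows directly from SNR(dB) = 10·log10(P_signal / P_noise):

```python
import numpy as np

def scale_noise_to_snr(signal, noise, target_snr_db):
    """Scale `noise` so that mixing it with `signal` gives the target SNR.

    Uses SNR(dB) = 10 * log10(P_signal / P_noise), where P is mean power.
    """
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Noise power required to hit the target SNR
    target_p_noise = p_signal / (10 ** (target_snr_db / 10))
    return noise * np.sqrt(target_p_noise / p_noise)

# Example: scale white noise to 0 dB SNR against a unit-power 440 Hz tone
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000, endpoint=False)
signal = np.sqrt(2) * np.sin(2 * np.pi * 440 * t)  # mean power ~1
noise = rng.standard_normal(16000)
scaled = scale_noise_to_snr(signal, noise, 0.0)
snr_db = 10 * np.log10(np.mean(signal ** 2) / np.mean(scaled ** 2))
```

The same power-ratio definition can be applied to a visual noise mask (e.g., pixel-intensity power of the noise overlay relative to the video frame), which is what "physically matched" SNR across modalities implies.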

RESULTS

As expected, behavioral performance for clean sentence recognition was better for audio-only (A) and audiovisual (AV) speech than for visual-only (V) speech. Similarly, with noise in the auditory channel (AnV and AnVn speech), performance was aided by the addition of visual cues from the talker regardless of whether the visual channel contained noise, confirming a multimodal benefit to SIN recognition. By itself, visual noise (AVn) obscuring the talker's face had little effect on speech recognition. Listeners' gaze fixations were biased toward the eyes (and decreased at the mouth) whenever the auditory channel was compromised. Fixating on the eyes was negatively associated with SIN recognition performance. Gaze to the mouth versus the eyes also depended on the talker's gender.

CONCLUSIONS

Collectively, these results suggest that listeners (1) rely heavily on the auditory over the visual channel when seeing and hearing speech and (2) shift their visual strategy from the talker's mouth to the eyes under signal degradation, which negatively affects speech perception.


Similar Articles

1
Gaze patterns and audiovisual speech enhancement.
J Speech Lang Hear Res. 2013 Apr;56(2):471-80. doi: 10.1044/1092-4388(2012/10-0288). Epub 2012 Dec 28.
2
Word Learning in Deaf Adults Who Use Cochlear Implants: The Role of Talker Variability and Attention to the Mouth.
Ear Hear. 2024;45(2):337-350. doi: 10.1097/AUD.0000000000001432. Epub 2023 Sep 11.
3
Audiovisual Enhancement of Speech Perception in Noise by School-Age Children Who Are Hard of Hearing.
Ear Hear. 2020 Jul/Aug;41(4):705-719. doi: 10.1097/AUD.0000000000000830.
4
Non-native listeners' recognition of high-variability speech using PRESTO.
J Am Acad Audiol. 2014 Oct;25(9):869-92. doi: 10.3766/jaaa.25.9.9.
5
Face Masks Impact Auditory and Audiovisual Consonant Recognition in Children With and Without Hearing Loss.
Front Psychol. 2022 May 13;13:874345. doi: 10.3389/fpsyg.2022.874345. eCollection 2022.
6
Eye Gaze and Perceptual Adaptation to Audiovisual Degraded Speech.
J Speech Lang Hear Res. 2021 Sep 14;64(9):3432-3445. doi: 10.1044/2021_JSLHR-21-00106. Epub 2021 Aug 31.

Cited By

1
Autonomic Nervous System Correlates of Speech Categorization Revealed Through Pupillometry.
Front Neurosci. 2020 Jan 10;13:1418. doi: 10.3389/fnins.2019.01418. eCollection 2019.
2
Acoustic noise and vision differentially warp the auditory categorization of speech.
J Acoust Soc Am. 2019 Jul;146(1):60. doi: 10.1121/1.5114822.
3
Neural Correlates of Enhanced Audiovisual Processing in the Bilingual Brain.
Neuroscience. 2019 Mar 1;401:11-20. doi: 10.1016/j.neuroscience.2019.01.003. Epub 2019 Jan 9.

