Gavin M. Bidelman, Bonnie Brown, Kelsey Mankel, Caitlin Nelms Price
School of Communication Sciences and Disorders, University of Memphis, Memphis, Tennessee, USA.
Institute for Intelligent Systems, University of Memphis, Memphis, Tennessee, USA.
Ear Hear. 2020 Mar/Apr;41(2):268-277. doi: 10.1097/AUD.0000000000000755.
In noisy environments, listeners benefit from both hearing and seeing a talker, demonstrating that audiovisual (AV) cues enhance speech-in-noise (SIN) recognition. Here, we examined the relative contributions of auditory and visual cues to SIN perception and the strategies listeners use to decipher speech amid noise interference.
Normal-hearing listeners (n = 22) performed an open-set speech recognition task while viewing audiovisual TIMIT sentences presented under different combinations of signal degradation: visual (AVn), auditory (AnV), or multimodal (AnVn) noise. Acoustic and visual noise were matched in physical signal-to-noise ratio (SNR). Eye tracking monitored participants' gaze toward different parts of the talker's face during SIN perception.
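As a methodological aside, matching acoustic and visual noise at the same physical SNR amounts to scaling each noise source so that the ratio of signal power to noise power hits a common dB target. Below is a minimal Python sketch of such matching, assuming additive Gaussian noise and power-based SNR; the function, variable names, and stand-in data are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch (not from the paper): scale a noise source so the
# signal-to-noise ratio, in dB, matches a target value. The same routine
# is applied to an audio waveform and a video frame so both channels are
# degraded at the same physical SNR.
import numpy as np

def scale_noise_to_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so that 10*log10(P_signal / P_noise) == snr_db."""
    p_signal = np.mean(signal ** 2)                     # mean signal power
    p_noise = np.mean(noise ** 2)                       # current noise power
    target_p_noise = p_signal / (10 ** (snr_db / 10.0)) # power implied by target SNR
    return noise * np.sqrt(target_p_noise / p_noise)

# Example: degrade stand-in audio and video at the same 0 dB SNR.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)   # stand-in 1 s waveform at 16 kHz
frame = rng.random((480, 640))       # stand-in grayscale video frame
audio_noisy = audio + scale_noise_to_snr(audio, rng.standard_normal(audio.shape), 0.0)
frame_noisy = frame + scale_noise_to_snr(frame, rng.standard_normal(frame.shape), 0.0)
```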
As expected, behavioral performance for clean sentence recognition was better for A-only and AV than for V-only speech. Similarly, with noise in the auditory channel (AnV and AnVn speech), performance was aided by the addition of visual cues from the talker regardless of whether the visual channel also contained noise, confirming a multimodal benefit to SIN recognition. Visual noise obscuring the talker's face (AVn) had little effect on speech recognition by itself. Listeners' gaze fixations shifted toward the eyes (and away from the mouth) whenever the auditory channel was compromised, and fixating on the eyes was negatively associated with SIN recognition performance. Gaze to the mouth versus the eyes also depended on the talker's gender.
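Gaze biases of this kind are conventionally quantified as the share of total fixation time landing in areas of interest (AOIs) such as the eyes and mouth. A minimal sketch under assumed AOI rectangles and (x, y, duration) fixation tuples follows; none of the coordinates, names, or data come from the paper.

```python
# Illustrative sketch (not the authors' analysis): proportion of fixation
# time falling inside rectangular AOIs for the eyes and the mouth.
from typing import List, Tuple

Fixation = Tuple[float, float, float]  # (x, y, duration_s), an assumed format

def aoi_proportion(fixations: List[Fixation],
                   box: Tuple[float, float, float, float]) -> float:
    """Fraction of total fixation duration inside an AOI box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    total = sum(d for _, _, d in fixations)
    inside = sum(d for x, y, d in fixations if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / total if total else 0.0

eyes_box = (200.0, 100.0, 440.0, 180.0)   # hypothetical pixel coordinates
mouth_box = (260.0, 300.0, 380.0, 380.0)
fixations = [(320.0, 140.0, 0.45), (310.0, 340.0, 0.30), (330.0, 150.0, 0.25)]
print(aoi_proportion(fixations, eyes_box))   # 0.70: gaze biased toward the eyes
print(aoi_proportion(fixations, mouth_box))  # 0.30
```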
Collectively, the results suggest that listeners (1) depend more heavily on the auditory than the visual channel when seeing and hearing speech and (2) shift their visual strategy from the talker's mouth to the eyes as the signal degrades, which negatively affects speech perception.