

Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space.

Author Information

Ma Wei Ji, Zhou Xiang, Ross Lars A, Foxe John J, Parra Lucas C

Affiliation

Department of Neuroscience, Baylor College of Medicine, Houston, Texas, United States of America.

Publication Information

PLoS One. 2009;4(3):e4638. doi: 10.1371/journal.pone.0004638. Epub 2009 Mar 4.

Abstract

Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these findings. In this model, words are regarded as points in a multidimensional feature space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
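The model described in the abstract translates directly into a short simulation. The sketch below is not the authors' implementation: the uniformly drawn word prototypes, the Gaussian feature-noise model, and all parameter values (n_words, sigma_v, the noise sweep) are illustrative assumptions. It implements a Bayes-optimal observer that sums the auditory and visual log-likelihoods (inverse-variance weighting, optimal for independent Gaussian noise) and reports the maximum-a-posteriori word, so one can compare audio-only against audiovisual accuracy as auditory noise varies at low and high feature dimensionality.

```python
# A minimal sketch of Bayesian cue integration for word recognition,
# assuming Gaussian feature noise and equal prior probability per word.
import numpy as np

rng = np.random.default_rng(0)

def recognition_accuracy(dim, sigma_a, sigma_v=None, n_words=50, n_trials=2000):
    """Fraction of trials where the MAP word equals the true word.

    sigma_v=None simulates the audio-only condition; otherwise auditory
    and visual log-likelihoods are summed, which weights each cue by its
    inverse variance (Bayes-optimal for independent Gaussian noise).
    """
    words = rng.uniform(0.0, 1.0, size=(n_words, dim))    # word prototypes
    correct = 0
    for _ in range(n_trials):
        true_idx = rng.integers(n_words)
        w = words[true_idx]
        x_a = w + rng.normal(0.0, sigma_a, size=dim)      # noisy auditory cue
        log_post = -np.sum((words - x_a) ** 2, axis=1) / (2 * sigma_a**2)
        if sigma_v is not None:
            x_v = w + rng.normal(0.0, sigma_v, size=dim)  # noisy visual cue
            log_post -= np.sum((words - x_v) ** 2, axis=1) / (2 * sigma_v**2)
        correct += (np.argmax(log_post) == true_idx)
    return correct / n_trials

# Sweep auditory noise; visual reliability is held fixed.
sigma_v = 0.5
for dim in (1, 20):                                       # low- vs high-dimensional
    print(f"D = {dim}")
    for sigma_a in (0.05, 0.1, 0.2, 0.4, 0.8, 1.6):
        a_only = recognition_accuracy(dim, sigma_a)
        av = recognition_accuracy(dim, sigma_a, sigma_v)
        print(f"  sigma_a={sigma_a:4.2f}  A={a_only:.2f}  AV={av:.2f}  gain={av - a_only:+.2f}")
```

The quantity of interest is the audiovisual gain (AV minus A accuracy). Per the paper, the model predicts inverse effectiveness at low dimensionality and a gain that peaks at intermediate sigma_a at high dimensionality; whether this sketch reproduces the exact crossover depends on the assumed parameter values.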


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a11e/2645675/401b6b551b51/pone.0004638.g001.jpg
