Effects of Visual Speech Envelope on Audiovisual Speech Perception in Multitalker Listening Environments.

Affiliations

Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville.

Publication Information

J Speech Lang Hear Res. 2021 Jul 16;64(7):2845-2853. doi: 10.1044/2021_JSLHR-20-00688. Epub 2021 Jun 8.

Abstract

Purpose: This study investigated the effects of visually presented speech envelope information, with various modulation rates and depths, on audiovisual speech perception in noise.

Method: Forty adults (21.25 ± 1.45 years) participated in audiovisual sentence recognition measurements in noise. Target sentences were presented auditorily in multitalker babble noise at a -3 dB SNR. Acoustic amplitude envelopes of the target signals were extracted through low-pass filters with different cutoff frequencies (4, 10, and 30 Hz) at a fixed 100% modulation depth (Experiment 1), or with various modulation depths (0%, 25%, 50%, 75%, and 100%) at a fixed 10-Hz modulation rate (Experiment 2). The extracted envelopes were synchronized with the size of a sphere and presented as the visual stimulus. Subjects were instructed to attend to both the auditory and visual stimuli of the target sentences and type their answers. Sentence recognition accuracy was compared between audio-only and audiovisual conditions.

Results: In Experiment 1, speech intelligibility improved significantly over the audio-only condition when the visual analog (a sphere) was synchronized with the acoustic amplitude envelope at a 10-Hz modulation rate. In Experiment 2, the visual analog with 75% modulation depth yielded better audiovisual speech perception in noise than the other modulation depth conditions.

Conclusion: An abstract visual analog of the acoustic amplitude envelope can be efficiently delivered by the visual system and integrated online with auditory signals to enhance speech perception in noise, independent of particular articulatory movements.
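The envelope pipeline described in the Method (low-pass filtering of the acoustic amplitude envelope at a chosen cutoff, then scaling to a chosen modulation depth) can be illustrated with a minimal sketch. This is not the authors' code: the filter order, the full-wave rectification step, and the mean-centered depth scaling are assumptions, since the abstract does not specify these details.

```python
# Minimal sketch of envelope extraction with modulation-depth scaling.
# Assumptions (not specified in the abstract): full-wave rectification,
# 4th-order Butterworth low-pass filter, depth scaling about the mean.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def speech_envelope(signal: np.ndarray, fs: float, cutoff_hz: float,
                    depth: float = 1.0) -> np.ndarray:
    """Extract an amplitude envelope and apply a modulation depth in [0, 1]."""
    rectified = np.abs(signal)  # full-wave rectification (assumed)
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, rectified)  # zero-phase low-pass filtering
    env = np.clip(env, 0.0, None)      # remove small negative filter ringing
    # depth = 1.0 keeps the full envelope; depth = 0.0 yields a constant
    # (unmodulated) signal, matching the 0% modulation-depth condition.
    return env.mean() + depth * (env - env.mean())

# Example: 10-Hz cutoff, 75% depth (the best condition in Experiment 2),
# applied to a toy amplitude-modulated noise signal.
fs = 16000
t = np.arange(fs) / fs
speech = np.random.randn(fs) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
env = speech_envelope(speech, fs, cutoff_hz=10, depth=0.75)
```

In the study's setup, a signal like `env` would drive the size of the sphere rendered as the visual stimulus alongside the auditory target.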
