Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions.

Affiliations

Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA.

Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA.

Publication information

J Acoust Soc Am. 2021 Sep;150(3):2230. doi: 10.1121/10.0006385.

Abstract

A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
