Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA.
Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA.
J Acoust Soc Am. 2021 Sep;150(3):2230. doi: 10.1121/10.0006385.
A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.