Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions.

Affiliations

Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA.

Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA.

Publication information

J Acoust Soc Am. 2021 Sep;150(3):2230. doi: 10.1121/10.0006385.

DOI:10.1121/10.0006385

PMID:34598642

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8483789/

Abstract

A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
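The abstract distinguishes the temporal envelope of a sound from its temporal fine structure (TFS). As a minimal sketch of that distinction, assuming the conventional Hilbert-transform decomposition of a single narrowband signal, the following splits a synthetic amplitude-modulated tone into the two parts. It is an illustration only and is not the authors' vocoding or EEG analysis; the function name `envelope_and_tfs` and all signal parameters are hypothetical.

```python
import numpy as np
from scipy.signal import hilbert


def envelope_and_tfs(x):
    """Split a narrowband signal into Hilbert envelope and fine structure.

    Hypothetical helper for illustration; not taken from the paper.
    """
    analytic = hilbert(x)              # x + j * HilbertTransform(x)
    envelope = np.abs(analytic)        # slowly varying amplitude
    tfs = np.cos(np.angle(analytic))   # unit-amplitude rapid oscillation
    return envelope, tfs


# Synthetic test signal (assumed values): 1 kHz carrier, 4 Hz modulation.
fs = 16000                             # sampling rate in Hz
t = np.arange(fs) / fs                 # one second of samples
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = true_env * np.cos(2 * np.pi * 1000 * t)

env, tfs = envelope_and_tfs(x)

# Multiplying the two parts back together recovers the original signal.
recon = env * tfs
print("max envelope error:", np.max(np.abs(env - true_env)))
print("max reconstruction error:", np.max(np.abs(recon - x)))
```

In a vocoder, a decomposition of this kind would typically be applied per frequency band after a filterbank, so that the envelope can be held fixed while the carrier (the TFS) is altered.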


Similar articles

Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions.

J Acoust Soc Am. 2021 Sep;150(3):2230. doi: 10.1121/10.0006385.

Speech Categorization Reveals the Role of Early-Stage Temporal-Coherence Processing in Auditory Scene Analysis.

J Neurosci. 2022 Jan 12;42(2):240-254. doi: 10.1523/JNEUROSCI.1610-21.2021. Epub 2021 Nov 11.

Channel selection in the modulation domain for improved speech intelligibility in noise.

J Acoust Soc Am. 2012 Apr;131(4):2904-13. doi: 10.1121/1.3688488.

The effects of the addition of low-level, low-noise noise on the intelligibility of sentences processed to remove temporal envelope information.

J Acoust Soc Am. 2010 Oct;128(4):2150-61. doi: 10.1121/1.3478773.

The effects of speech masking on neural tracking of acoustic and semantic features of natural speech.

Neuropsychologia. 2023 Jul 29;186:108584. doi: 10.1016/j.neuropsychologia.2023.108584. Epub 2023 May 9.

The intelligibility of speech in a harmonic masker varying in fundamental frequency contour, broadband temporal envelope, and spatial location.

Hear Res. 2017 Jul;350:1-10. doi: 10.1016/j.heares.2017.03.012. Epub 2017 Mar 29.

Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

J Acoust Soc Am. 2011 Sep;130(3):1475-87. doi: 10.1121/1.3621502.

Role of Binaural Temporal Fine Structure and Envelope Cues in Cocktail-Party Listening.

J Neurosci. 2016 Aug 3;36(31):8250-7. doi: 10.1523/JNEUROSCI.4421-15.2016.

On the balance of envelope and temporal fine structure in the encoding of speech in the early auditory system.

J Acoust Soc Am. 2013 May;133(5):2818-33. doi: 10.1121/1.4795783.

Distorting temporal fine structure by phase shifting and its effects on speech intelligibility and neural phase locking.

Sci Rep. 2017 Oct 17;7(1):13387. doi: 10.1038/s41598-017-12975-3.

Cited by

Evaluating the role of age on speech-in-noise perception based primarily on temporal envelope information.

Hear Res. 2025 May;460:109236. doi: 10.1016/j.heares.2025.109236. Epub 2025 Mar 7.

Individual differences elucidate the perceptual benefits associated with robust temporal fine-structure processing.

Proc Natl Acad Sci U S A. 2025 Jan 7;122(1):e2317152121. doi: 10.1073/pnas.2317152121. Epub 2025 Jan 3.

Impact of reduced spectral resolution on temporal-coherence-based source segregation.

J Acoust Soc Am. 2024 Dec 1;156(6):3862-3876. doi: 10.1121/10.0034545.

Impact of Reduced Spectral Resolution on Temporal-Coherence-Based Source Segregation.

bioRxiv. 2024 Mar 13:2024.03.11.584489. doi: 10.1101/2024.03.11.584489.

Individual Differences Elucidate the Perceptual Benefits Associated with Robust Temporal Fine-Structure Processing.

bioRxiv. 2024 Jun 21:2023.09.20.558670. doi: 10.1101/2023.09.20.558670.

Induced alpha and beta electroencephalographic rhythms covary with single-trial speech intelligibility in competition.

Sci Rep. 2023 Jun 23;13(1):10216. doi: 10.1038/s41598-023-37173-2.

Web-based psychoacoustics: Hearing screening, infrastructure, and validation.

Behav Res Methods. 2024 Mar;56(3):1433-1448. doi: 10.3758/s13428-023-02101-9. Epub 2023 Jun 8.

Individualized Assays of Temporal Coding in the Ascending Human Auditory System.

eNeuro. 2022 Mar 11;9(2). doi: 10.1523/ENEURO.0378-21.2022. Print 2022 Mar-Apr.

Speech Categorization Reveals the Role of Early-Stage Temporal-Coherence Processing in Auditory Scene Analysis.

J Neurosci. 2022 Jan 12;42(2):240-254. doi: 10.1523/JNEUROSCI.1610-21.2021. Epub 2021 Nov 11.

Temporal fine structure influences voicing confusions for consonant identification in multi-talker babble.

J Acoust Soc Am. 2021 Oct;150(4):2664. doi: 10.1121/10.0006527.

References

Pre- and post-target cortical processes predict speech-in-noise performance.

Neuroimage. 2021 Mar;228:117699. doi: 10.1016/j.neuroimage.2020.117699. Epub 2020 Dec 30.

Predicting the effects of periodicity on the intelligibility of masked speech: An evaluation of different modelling approaches and their limitations.

J Acoust Soc Am. 2019 Oct;146(4):2562. doi: 10.1121/1.5129050.

Electroencephalographic Signatures of the Neural Representation of Speech during Selective Attention.

eNeuro. 2019 Oct 31;6(5). doi: 10.1523/ENEURO.0057-19.2019. Print 2019 Sep/Oct.

Supervised Speech Separation Based on Deep Learning: An Overview.

IEEE/ACM Trans Audio Speech Lang Process. 2018 Oct;26(10):1702-1726. doi: 10.1109/TASLP.2018.2842159. Epub 2018 May 30.

Non-Invasive Assays of Cochlear Synaptopathy - Candidates and Considerations.

Neuroscience. 2019 May 21;407:53-66. doi: 10.1016/j.neuroscience.2019.02.031. Epub 2019 Mar 8.

A Comparison of Regularization Methods in Forward and Backward Models for Auditory Attention Decoding.

Front Neurosci. 2018 Aug 7;12:531. doi: 10.3389/fnins.2018.00531. eCollection 2018.

Cortical Measures of Phoneme-Level Speech Encoding Correlate with the Perceived Clarity of Natural Speech.

eNeuro. 2018 Apr 16;5(2). doi: 10.1523/ENEURO.0084-18.2018. eCollection 2018 Mar-Apr.

Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope.

J Assoc Res Otolaryngol. 2018 Apr;19(2):181-191. doi: 10.1007/s10162-018-0654-z. Epub 2018 Feb 20.

Neural source dynamics of brain responses to continuous stimuli: Speech processing from acoustics to comprehension.

Neuroimage. 2018 May 15;172:162-174. doi: 10.1016/j.neuroimage.2018.01.042. Epub 2018 Feb 3.

Causal cortical dynamics of a predictive enhancement of speech intelligibility.

Neuroimage. 2018 Feb 1;166:247-258. doi: 10.1016/j.neuroimage.2017.10.066. Epub 2017 Nov 2.
