Interference of mid-level speech and noise statistics underlies human speech recognition sensitivity in natural environmental noise.

Authors

Clonan Alex C, Zhai Xiu, Stevenson Ian H, Escabí Monty A

Affiliations

Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269.

Biomedical Engineering, University of Connecticut, Storrs, CT 06269.

Publication information

J Neurosci. 2025 Jul 8. doi: 10.1523/JNEUROSCI.1751-24.2025.

Abstract

Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill where the task difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving and coding natural sounds, it's less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. Here we demonstrate that male and female human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectrotemporal statistics, we show that speech recognition accuracy at a fixed noise level varies extensively across natural backgrounds (0% to 100%). Furthermore, for each background the unique interference created by summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed a framework that links summary statistics from a neural model to word recognition accuracy. Whereas summary statistics from a peripheral cochlear model account for only 60% of perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for more than 90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they impact recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation involving interference of multiple summary statistics that impact recognition beneficially or detrimentally across natural background sounds.

Recognizing speech in natural auditory scenes with competing talkers and environmental noise is a critical cognitive skill. Although normal listeners effortlessly perform this task, for instance in a crowded restaurant, it challenges individuals with hearing loss and our most sophisticated machine systems. We tested human participants listening to speech in natural noises with varied statistical characteristics and demonstrate that they rely on a statistical representation of sounds to segregate speech from environmental noise. Using a model of the auditory system, we then demonstrate that a brain-inspired statistical representation of natural sounds accurately predicts human perceptual trends across a wide range of natural backgrounds and noise levels and reveals key statistical features and neural computations underlying human abilities for this task.

