Speech recognition in noise: estimating effects of compressive nonlinearities in the basilar-membrane response.

Author information

Horwitz Amy R, Ahlstrom Jayne B, Dubno Judy R

Affiliation

Medical University of South Carolina, Charleston, SC 29425, USA.

Publication information

Ear Hear. 2007 Sep;28(5):682-93. doi: 10.1097/AUD.0b013e31812f7156.

Abstract

OBJECTIVES

This experiment was designed to estimate effects of cochlear nonlinearities on tonal and speech masking for individuals with normal hearing who have a range of quiet thresholds. Physiological and psychophysical evidence indicates that for signals close to the characteristic frequency (CF) of a place on the basilar membrane, the normal growth of response of the basilar membrane is linear at lower stimulus levels and compressed at medium to higher stimulus levels. In contrast, at moderate to high CFs, the basilar membrane responds more linearly to stimuli at frequencies well below the CF regardless of input level. Thus, the hypothesis tested was that masker effectiveness would change as a function of stimulus level consistent with the underlying basilar membrane response. Specifically, with a fixed-level speech signal and a speech-shaped masker that ranges from low to higher levels, the resulting response of the basilar membrane to the masker would be linear at lower levels and compressed at medium to higher levels. This would result in relatively less effective masking at higher masker levels. It was further hypothesized that the transition from linear to compressed responses to both tones and maskers would occur at higher levels for listeners with higher quiet thresholds than for listeners with lower quiet thresholds.
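The hypothesized input/output behavior can be illustrated with a "broken-stick" function. Below a breakpoint, output grows by 1 dB per dB of input; above it, output grows at a shallower, compressive slope. The sketch below is a hypothetical illustration, not the authors' model. The breakpoints (40 and 55 dB) and the 0.2 dB/dB compression slope are assumptions chosen only to show the shape. The sketch compares how much the modeled response to a masker grows across the 47 to 77 dB SPL masker range, for a listener with a lower breakpoint and a listener with a higher breakpoint.

```python
def bm_response_db(input_db: float, breakpoint_db: float,
                   compression_slope: float = 0.2) -> float:
    """Hypothetical broken-stick basilar-membrane input/output function.

    Below the breakpoint the response grows linearly (1 dB per dB of input);
    above it, the response grows at `compression_slope` dB per dB.
    Output is in arbitrary dB units equal to the input at the breakpoint.
    """
    if input_db <= breakpoint_db:
        return input_db
    return breakpoint_db + compression_slope * (input_db - breakpoint_db)


def response_growth_db(low_db: float, high_db: float, breakpoint_db: float,
                       compression_slope: float = 0.2) -> float:
    """Growth in the modeled response when the input rises from low_db to high_db."""
    return (bm_response_db(high_db, breakpoint_db, compression_slope)
            - bm_response_db(low_db, breakpoint_db, compression_slope))


if __name__ == "__main__":
    # Masker levels spanning 47-77 dB SPL; both breakpoints are hypothetical.
    for label, bp in (("lower breakpoint", 40.0), ("higher breakpoint", 55.0)):
        growth = response_growth_db(47.0, 77.0, bp)
        print(f"{label} ({bp:.0f} dB): response grows {growth:.1f} dB "
              f"for a 30-dB masker increase")
```

With the lower breakpoint, the 30-dB masker increase produces only about 6 dB of modeled response growth. With the higher breakpoint, the response stays linear over more of the range and grows about 12.4 dB. This mirrors the hypothesis that the linear-to-compressed transition occurs at higher levels for listeners with higher quiet thresholds.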

DESIGN

Tonal thresholds and speech recognition in noise were measured as a function of masker level. A 10-msec, 2.0-kHz tone was presented in a lower-frequency masker ranging from 40 to 85 dB SPL. Moderate-level speech was presented in interrupted noise at six levels ranging from 47 to 77 dB SPL. To minimize differences in speech audibility that could arise during the "off" periods of the interrupted noise, a low-level steady-state "threshold-matching noise" was also present during measurement of speech recognition. Subjects were 30 adults with normal hearing with a 20-dB range of average quiet thresholds.
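The speech scores are compared with articulation index (AI) predictions, but the abstract does not give the study's AI procedure. The sketch below is a simplified, hypothetical illustration of how an audibility-based prediction changes with masker level. It makes the following assumptions:

- It uses an SII-style band audibility function, (SNR + 15)/30 clipped to [0, 1].
- Because the masker is speech-shaped, it treats the signal-to-noise ratio as the same in every band.
- The band-importance weights are equal placeholders.
- The speech level of 62 dB SPL is hypothetical, since the abstract says only "moderate-level."
- The six masker levels assume 6-dB spacing.

It also ignores the interrupted-noise gaps and the threshold-matching noise.

```python
def band_audibility(snr_db: float) -> float:
    """SII-style band audibility: fraction of a 30-dB speech range above the noise."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def predicted_ai(speech_db: float, masker_db: float,
                 importance=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Importance-weighted audibility, assuming a speech-shaped masker.

    Because the masker spectrum is assumed to match the speech spectrum,
    every band has the same SNR (overall speech level minus masker level).
    The equal band-importance weights are placeholders, not standardized values.
    """
    snr = speech_db - masker_db
    return sum(w * band_audibility(snr) for w in importance)


if __name__ == "__main__":
    speech_level = 62.0  # hypothetical "moderate" speech level, dB SPL
    # Six masker levels from 47 to 77 dB SPL; the 6-dB spacing is an assumption.
    for masker in (47.0, 53.0, 59.0, 65.0, 71.0, 77.0):
        ai = predicted_ai(speech_level, masker)
        print(f"masker {masker:.0f} dB SPL -> AI {ai:.2f}")
```

Under these assumptions, the predicted AI falls by a constant 0.2 per 6-dB masker step, from 1.0 down to 0.0. A purely audibility-based model predicts this steady decline regardless of any basilar-membrane compression acting on the masker.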

RESULTS

Tonal breakpoints (i.e., the levels corresponding to the transitions from linear to nonlinear responses) were significantly correlated with quiet thresholds, whereas slopes measured above the breakpoints were not. Speech recognition in noise was consistent with the hypothesis that the response of the basilar membrane to the masker was linear at lower levels and compressed at medium to higher levels, resulting in less effective masking at higher masker levels. That is, at lower masker levels, as masker level increased, mean observed speech scores declined as predicted using the articulation index, an audibility-based model. With further increases in masker level, mean scores declined less than predicted. Moreover, for subjects with higher quiet thresholds, masker effectiveness remained constant for a wider range of masker levels than for subjects with lower quiet thresholds, consistent with the hypothesis that the transition from linear to compressed responses occurred at higher levels. Finally, significant negative correlations were obtained between individual subjects' tonal and speech measures.
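The abstract does not describe how tonal breakpoints and the slopes above them were estimated. One generic way to obtain both from thresholds measured as a function of masker level is to fit a continuous two-segment ("broken-stick") line by least squares and search over candidate breakpoints. The pure-Python sketch below illustrates that idea. It is not presented as the authors' procedure. The function names are hypothetical, and the demo data are synthetic rather than taken from the study.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve a 3x3 linear system with Cramer's rule; return None if singular."""
    d = _det3(a)
    if abs(d) < 1e-12:
        return None
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(_det3(m) / d)
    return solution


def fit_broken_stick(x, y, candidates):
    """Least-squares fit of a continuous two-segment line.

    Model: y = a + s_low * min(x - bp, 0) + s_high * max(x - bp, 0).
    Each candidate breakpoint bp is tried in turn; the one with the smallest
    sum of squared errors wins. Returns (bp, s_low, s_high).
    """
    best = None
    for bp in candidates:
        rows = [(1.0, min(xi - bp, 0.0), max(xi - bp, 0.0)) for xi in x]
        # Normal equations (A^T A) p = A^T y for p = (a, s_low, s_high).
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
        p = _solve3(ata, aty)
        if p is None:  # all points on one side of bp: a slope is not identifiable
            continue
        sse = sum((p[0] + p[1] * r[1] + p[2] * r[2] - yi) ** 2
                  for r, yi in zip(rows, y))
        if best is None or sse < best[0]:
            best = (sse, bp, p[1], p[2])
    if best is None:
        raise ValueError("no candidate breakpoint splits the data")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic, noise-free data with a kink at 60 dB (NOT data from the study).
    levels = list(range(40, 90, 5))
    thresholds = [50 + (m - 60) if m <= 60 else 50 + 0.3 * (m - 60) for m in levels]
    bp, s_low, s_high = fit_broken_stick(levels, thresholds, range(45, 81))
    print(f"breakpoint {bp} dB, slope below {s_low:.2f}, slope above {s_high:.2f}")
```

On the synthetic data, the search recovers the 60-dB kink and the two slopes used to generate the data. With real, noisy thresholds, a finer candidate grid and a look at the residuals would likely be needed before interpreting the estimated breakpoint.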

CONCLUSIONS

Results from tonal and speech tasks were consistent with basilar membrane nonlinearities and with changes in those nonlinearities accompanying minor threshold elevations. These findings support a role for such nonlinearities in understanding speech in noise as noise level increases.
