Department of Psychology, Utah State University, 2810 Old Main Hill, Logan, Utah 84322-2810, USA.
J Acoust Soc Am. 2013 Apr;133(4):2390-6. doi: 10.1121/1.4792143.
Binary time-frequency (TF) masks can be applied to separate speech from noise. Previous studies have shown that, with appropriate parameters, ideal TF masks can extract highly intelligible speech even at very low speech-to-noise ratios (SNRs). Two psychophysical experiments provided additional information about how intelligibility depends on the frequency resolution and threshold criteria that define the ideal TF mask. Listeners identified AzBio Sentences in noise, before and after application of TF masks. Masks generated with 8 or 16 frequency bands per octave supported nearly perfect identification. Word recognition accuracy was slightly lower and more variable with 4 bands per octave. When TF masks were generated with a local threshold criterion of 0 dB SNR, the mean speech reception threshold was -9.5 dB SNR, compared to -5.7 dB SNR for unprocessed sentences in noise. Speech reception thresholds decreased by about 1 dB for each additional 1 dB decrease in the local threshold criterion. The information reported here about the dependence of speech intelligibility on frequency and level parameters is relevant to the development of non-ideal TF masks for clinical applications such as speech processing for hearing aids.
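The two mask parameters discussed in the abstract, the local threshold criterion and the number of frequency bands per octave, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of per-unit power values, the strict "greater than" comparison, and the log-spaced band layout are all assumptions made for illustration.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask with 1 where the local SNR in dB exceeds lc_db.

    speech_power and noise_power are lists of rows (frequency band by time
    frame) holding the premixed speech and noise power in each TF unit.
    Assumption: a unit is kept only when its local SNR is strictly greater
    than the local threshold criterion lc_db.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s == 0.0:
                keep = False          # no speech energy: discard the unit
            elif n == 0.0:
                keep = True           # speech with no noise: keep the unit
            else:
                keep = 10.0 * math.log10(s / n) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the TF units of the mixture that the mask rejects."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mixture, mask)]


def band_centers(f_low, f_high, bands_per_octave):
    """Hypothetical log-spaced band centers from f_low up to f_high (Hz)."""
    centers = []
    k = 0
    while True:
        f = f_low * 2.0 ** (k / bands_per_octave)
        if f > f_high * (1.0 + 1e-9):
            break
        centers.append(f)
        k += 1
    return centers
```

Lowering `lc_db` below 0 dB admits more TF units into the mask, and raising `bands_per_octave` from 4 to 8 or 16 gives a finer frequency grid over the same range.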