Li Ning, Loizou Philipos C
Department of Electrical Engineering, University of Texas at Dallas, Richardson, Texas 75083-0688, USA.
J Acoust Soc Am. 2008 Mar;123(3):1673-82. doi: 10.1121/1.2832617.
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal: it eliminates the portions of the signal whose local signal-to-noise ratio (SNR) falls below a threshold, while allowing the others to pass through intact. The factors influencing the intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, the input SNR level, the masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when masker-dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of this plateau region suggests that it is the pattern of the ideal binary mask that matters most, rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables listeners to segregate speech effectively in multitalker environments.
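As an illustration of the masking operation described in the abstract, here is a minimal pure-Python sketch. It assumes the target and masker powers in each T-F unit are already available; how the T-F decomposition is computed (e.g. an STFT or a filterbank) is not specified here. The function names, the `eps` floor, the additive-power mixture, and the random-flip error model in `flip_masker_units` are illustrative assumptions, not the authors' procedure. The flip function mimics the error type the abstract reports as most damaging: masker-dominated units wrongly labeled as target-dominated.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # assumed layout: rows = frequency channels, columns = time frames
Mask = List[List[int]]


def local_snr_db(target_power: float, masker_power: float, eps: float = 1e-12) -> float:
    """Local SNR of one T-F unit in dB; eps (an assumed floor) avoids log(0)."""
    return 10.0 * math.log10((target_power + eps) / (masker_power + eps))


def ideal_binary_mask(target: Matrix, masker: Matrix, lc_db: float = 0.0) -> Mask:
    """Label a unit 1 (target-dominated) if its local SNR exceeds the
    threshold lc_db, else 0 (masker-dominated)."""
    return [
        [1 if local_snr_db(t, m) > lc_db else 0 for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(target, masker)
    ]


def apply_mask(mixture: Matrix, mask: Mask) -> Matrix:
    """Pass units labelled 1 through intact and zero out units labelled 0."""
    return [[x * g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(mixture, mask)]


def flip_masker_units(mask: Mask, fraction: float, seed: int = 0) -> Mask:
    """Hypothetical error model: relabel a random fraction of masker-dominated
    units (0) as target-dominated (1). The input mask is not modified."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    zeros = [(i, j) for i, row in enumerate(mask) for j, g in enumerate(row) if g == 0]
    corrupted = [row[:] for row in mask]
    for i, j in rng.sample(zeros, round(fraction * len(zeros))):
        corrupted[i][j] = 1
    return corrupted


if __name__ == "__main__":
    # Toy 2x2 example with made-up powers (not data from the study).
    target = [[1.0, 0.01], [0.5, 0.0]]
    masker = [[0.1, 1.0], [0.6, 1.0]]
    # Assumes powers simply add in each unit (cross terms ignored).
    mixture = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]

    # Lowering the threshold lets more units through.
    for lc in (0.0, -25.0):
        mask = ideal_binary_mask(target, masker, lc_db=lc)
        print(lc, mask, apply_mask(mixture, mask))

    # Corrupt every masker-dominated unit of the 0 dB mask.
    print(flip_masker_units(ideal_binary_mask(target, masker), fraction=1.0))
```

Lowering `lc_db` from 0 dB to -25 dB in the toy example turns more units on. The abstract's plateau finding suggests that, within roughly -20 to 5 dB, such threshold changes matter less than the overall on/off pattern of the mask.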