Department of Mathematics, Statistics and Computer Science, The Ohio State University at Lima, Lima, Ohio 45804, USA.
J Acoust Soc Am. 2011 Oct;130(4):2153-61. doi: 10.1121/1.3631668.
For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: it retains the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: one that uses the direct path of the target as the desired signal, one that uses the direct path and early reflections of the target as the desired signal, and one that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech-shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and substantially reduces the speech reception threshold for normal-hearing listeners.
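The mask definition above can be illustrated with a minimal sketch. It assumes the per-unit energies of the desired signal and of the interference have already been computed on a common time-frequency grid. The function name `ideal_binary_mask`, the parameter `lc_db` (the local threshold, in dB), and the toy energy values are all hypothetical and are not taken from the study. Which component serves as the desired signal (direct path, direct path plus early reflections, or the full reverberant target) is decided by the caller.

```python
import numpy as np


def ideal_binary_mask(desired_energy, interference_energy, lc_db=0.0):
    """Return 1 where the desired signal exceeds the interference by more
    than lc_db decibels in a time-frequency unit, and 0 elsewhere.

    lc_db is an assumed name for the local threshold; the default of 0 dB
    is an illustrative choice, not a value taken from the study.
    """
    desired = np.asarray(desired_energy, dtype=float)
    interference = np.asarray(interference_energy, dtype=float)
    # A tiny offset avoids division by zero and log(0) in silent units.
    eps = np.finfo(float).tiny
    local_snr_db = 10.0 * np.log10((desired + eps) / (interference + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


# Toy energies (rows: frequency channels, columns: time frames).
desired = np.array([[4.0, 1.0],
                    [0.5, 9.0]])
interference = np.array([[1.0, 2.0],
                         [0.5, 1.0]])

mask_0db = ideal_binary_mask(desired, interference)             # 0 dB threshold
mask_8db = ideal_binary_mask(desired, interference, lc_db=8.0)  # stricter threshold
```

To compare the three definitions, the same function would be called three times with a different `desired_energy` array each time. How the remaining target reflections are treated, for example whether they are added to the interference, is a design choice the abstract does not specify.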