Wang DeLiang, Kjems Ulrik, Pedersen Michael S, Boldt Jesper B, Lunner Thomas
Department of Computer Science & Engineering, and Center for Cognitive Science, The Ohio State University, Columbus, Ohio 43210, USA.
J Acoust Soc Am. 2008 Oct;124(4):2303-7. doi: 10.1121/1.2967865.
For a given mixture of speech and noise, an ideal binary time-frequency mask is constructed by comparing speech energy and noise energy within local time-frequency units. Listeners achieve nearly perfect speech recognition from gated noise with binary gains prescribed by the ideal binary mask. As few as 16 filter channels and a frame rate of 100 Hz suffice for high intelligibility. The results show that, despite a dramatic reduction of speech information, a pattern of binary gains provides an adequate basis for speech perception.
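The mask construction described above can be illustrated with a minimal sketch. The abstract does not specify the filterbank or the comparison threshold, so the code below makes several assumptions: it groups FFT bins into 16 equal-width bands as a stand-in for the filter channels, uses non-overlapping 10 ms frames to obtain a 100 Hz frame rate, and applies a 0 dB local criterion (a unit is 1 when speech energy exceeds noise energy). The function names `band_energies` and `ideal_binary_mask` and the parameter `local_criterion_db` are hypothetical, chosen for illustration only.

```python
import numpy as np

def band_energies(signal, sample_rate, n_channels=16, frame_rate=100):
    """Return an (n_frames, n_channels) array of energies per time-frequency unit."""
    hop = sample_rate // frame_rate              # samples per frame (10 ms at 100 Hz)
    n_frames = len(signal) // hop
    frames = np.reshape(signal[:n_frames * hop], (n_frames, hop))
    # Power spectrum of each Hann-windowed frame.
    power = np.abs(np.fft.rfft(frames * np.hanning(hop), axis=1)) ** 2
    # Assumption: equal-width groups of FFT bins stand in for the filter channels.
    bands = np.array_split(power, n_channels, axis=1)
    return np.stack([band.sum(axis=1) for band in bands], axis=1)

def ideal_binary_mask(speech, noise, sample_rate, local_criterion_db=0.0,
                      n_channels=16, frame_rate=100):
    """Binary mask: 1 where local speech energy exceeds noise energy by the criterion."""
    speech_e = band_energies(speech, sample_rate, n_channels, frame_rate)
    noise_e = band_energies(noise, sample_rate, n_channels, frame_rate)
    # Assumption: a 0 dB default criterion; compare energies directly to avoid log(0).
    threshold = 10.0 ** (local_criterion_db / 10.0)
    return (speech_e > threshold * noise_e).astype(np.uint8)

if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    speech = np.sin(2 * np.pi * 300 * t)         # toy "speech": low-frequency tone
    noise = np.sin(2 * np.pi * 6000 * t)         # toy "noise": high-frequency tone
    mask = ideal_binary_mask(speech, noise, sample_rate)
    print(mask.shape)                            # frames x channels
```

Because the mask requires the speech and noise signals separately before mixing, it is "ideal" in the sense that it cannot be computed from the mixture alone. Multiplying each channel of a noise signal by the corresponding mask column, frame by frame, would give the kind of gated noise the abstract refers to.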