National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, People's Republic of China
J Acoust Soc Am. 2013 Nov;134(5):EL452-8. doi: 10.1121/1.4824632.
In this paper, a computational goal for a monaural speech separation system is proposed. Because this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds because of the sparse nature of speech, theoretical analysis shows that the ORM can improve the SNR by about 10 log10(2) dB (approximately 3 dB) over the ideal ratio mask. With three kinds of real-world interference, the speech separation results in terms of SNR gain and objective quality evaluation confirm the theoretical analysis and indicate that the ORM achieves better separation performance.
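The idea of an SNR-maximizing mask can be illustrated with a minimal sketch. The abstract does not state the mask formulas, so the definitions below are assumptions: the ideal ratio mask is taken in one common form, |S|^2 / (|S|^2 + |N|^2), and the "optimal" mask is taken to be the real-valued gain that minimizes the residual energy |S - M(S + N)|^2 in each time-frequency unit, which works out to Re(S conj(Y)) / |Y|^2 with Y = S + N. The function names and the synthetic Gaussian spectra are hypothetical and stand in for real speech and noise spectrograms.

```python
import numpy as np

def ideal_ratio_mask(S, N):
    """Assumed IRM form: speech power over the sum of speech and noise power."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return ps / (ps + pn)

def optimal_ratio_mask(S, N):
    """Assumed ORM form: real gain M minimizing |S - M * (S + N)|^2 per unit."""
    Y = S + N
    return np.real(S * np.conj(Y)) / np.abs(Y) ** 2

def snr_db(S, S_hat):
    """SNR of the estimate S_hat against the clean spectrum S, in dB."""
    return 10 * np.log10(np.sum(np.abs(S) ** 2) / np.sum(np.abs(S - S_hat) ** 2))

# Synthetic complex spectra (frequency bins x frames); illustrative only.
rng = np.random.default_rng(0)
shape = (257, 100)
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Y = S + N  # mixture spectrum

snr_irm = snr_db(S, ideal_ratio_mask(S, N) * Y)
snr_orm = snr_db(S, optimal_ratio_mask(S, N) * Y)

# The gain quoted in the abstract, evaluated numerically (about 3.01 dB).
bound_db = 10 * np.log10(2)

print(f"IRM SNR: {snr_irm:.2f} dB, ORM SNR: {snr_orm:.2f} dB")
print(f"10 log10(2) = {bound_db:.4f} dB")
```

Because the assumed ORM is the per-unit least-squares solution, its SNR can never fall below that of any other real-valued mask applied to the same mixture. Dense Gaussian spectra do not satisfy the W-Disjoint Orthogonality assumption, so the gap measured here should not be expected to match the 10 log10(2) dB figure.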