Williamson Donald S, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.
Department of Computer Science and Engineering, Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.
In real-world situations, speech is masked by both background noise and reverberation, which degrade perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising jointly using supervised learning with a deep neural network. Specifically, we enhance both the magnitude and the phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that applying it to reverberant and noisy speech yields the direct speech. Our approach is evaluated using simulated and real room impulse responses combined with background noises. The proposed approach significantly improves objective speech quality and intelligibility. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
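The mask definition above can be illustrated with a small sketch. The abstract gives no formula, so this assumes the mask is applied by element-wise complex multiplication in a time-frequency domain such as the STFT, which makes the ideal mask the complex ratio of direct speech to the reverberant, noisy mixture. The function names, the `eps` guard, and the random toy "spectrograms" are hypothetical and for illustration only. The code computes the ideal (oracle) mask that serves as a training target, not the neural network's estimate of it.

```python
import random


def complex_ideal_ratio_mask(direct, mixture, eps=1e-12):
    """Return M with M[t][f] * mixture[t][f] ~= direct[t][f].

    direct, mixture: lists of frames, each a list of complex T-F coefficients.
    eps is an assumed small constant guarding against division by zero.
    """
    mask = []
    for s_row, y_row in zip(direct, mixture):
        row = []
        for s, y in zip(s_row, y_row):
            denom = y.real ** 2 + y.imag ** 2 + eps
            # Real and imaginary parts of S / Y, written out explicitly.
            m_real = (y.real * s.real + y.imag * s.imag) / denom
            m_imag = (y.real * s.imag - y.imag * s.real) / denom
            row.append(complex(m_real, m_imag))
        mask.append(row)
    return mask


def apply_mask(mask, mixture):
    """Element-wise complex multiplication of mask and mixture."""
    return [[m * y for m, y in zip(m_row, y_row)]
            for m_row, y_row in zip(mask, mixture)]


def random_spectrogram(rng, frames, bins):
    """Toy stand-in for an STFT: random complex coefficients."""
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(bins)]
            for _ in range(frames)]


rng = random.Random(0)
direct = random_spectrogram(rng, frames=3, bins=4)        # direct speech
interference = random_spectrogram(rng, frames=3, bins=4)  # reverberation + noise
mixture = [[s + n for s, n in zip(s_row, n_row)]
           for s_row, n_row in zip(direct, interference)]

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)

# Largest deviation between the masked mixture and the direct speech.
max_err = max(abs(r - s)
              for r_row, s_row in zip(recovered, direct)
              for r, s in zip(r_row, s_row))
```

Because the mask is complex-valued, it changes both the magnitude and the phase of each time-frequency unit, which is what allows phase enhancement in addition to magnitude enhancement.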