Wang Heming, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, USA.
Center for Cognitive and Brain Sciences, The Ohio State University, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7862-7866. doi: 10.1109/icassp43922.2022.9747752. Epub 2022 Apr 27.
This paper proposes a novel cascade architecture for monaural speech enhancement. We leverage three domains of speech representation, namely spectral magnitude, waveform, and complex spectrogram, to progressively suppress the background noise in noisy speech. The proposed neural cascade architecture consists of three modules, each of which operates in a distinct speech representation on the original noisy input and the output of the previous module. During training, the network optimizes all modules simultaneously with a triple-domain loss. Experiments on the WSJ0 SI-84 corpus demonstrate that the proposed approach achieves superior enhancement results and substantially outperforms previous baselines in terms of both speech quality and intelligibility.
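The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the modules run in the order the abstract lists the domains (magnitude, then waveform, then complex spectrogram), that the magnitude estimate is resynthesized with the noisy phase before it is passed on, and that the triple-domain loss is an equally weighted sum of L1 terms. The abstract specifies none of these details. The frame settings and the names `cascade`, `triple_domain_loss`, `stft`, `istft`, and `trim` are hypothetical, and the three modules are placeholder callables standing in for neural networks.

```python
import numpy as np

N_FFT, HOP = 512, 256  # illustrative frame settings, not taken from the paper
# Periodic Hann window used for analysis
WINDOW = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)

def trim(x):
    """Cut x to a whole number of frames so STFT and iSTFT lengths agree."""
    return x[: N_FFT + HOP * ((len(x) - N_FFT) // HOP)]

def stft(x):
    """Naive windowed STFT; returns a complex array of shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec):
    """Overlap-add inverse, normalized by the accumulated analysis window."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1)
    length = N_FFT + HOP * (len(frames) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame
        norm[i * HOP : i * HOP + N_FFT] += WINDOW
    return out / np.maximum(norm, 1e-8)  # guard samples where the window is zero

def cascade(noisy_wave, mag_module, wave_module, cplx_module):
    """Run three placeholder modules, each in its own representation."""
    noisy_wave = trim(noisy_wave)
    noisy_spec = stft(noisy_wave)
    # Module 1 (spectral magnitude): no previous module, so it sees only the noisy input.
    est_mag = mag_module(np.abs(noisy_spec))
    # Assumption: reuse the noisy phase to hand the magnitude estimate to the waveform domain.
    stage1_wave = istft(est_mag * np.exp(1j * np.angle(noisy_spec)))
    # Module 2 (waveform): noisy input plus the previous module's output.
    est_wave = wave_module(noisy_wave, stage1_wave)
    # Module 3 (complex spectrogram): noisy input plus the previous module's output.
    est_spec = cplx_module(noisy_spec, stft(est_wave))
    return est_mag, est_wave, est_spec

def triple_domain_loss(est_mag, est_wave, est_spec, clean_wave, weights=(1.0, 1.0, 1.0)):
    """Assumed form: weighted sum of L1 errors in the three domains."""
    clean_wave = trim(clean_wave)
    clean_spec = stft(clean_wave)
    loss_mag = np.mean(np.abs(est_mag - np.abs(clean_spec)))
    loss_wave = np.mean(np.abs(est_wave - clean_wave))
    loss_cplx = np.mean(np.abs(est_spec.real - clean_spec.real)
                        + np.abs(est_spec.imag - clean_spec.imag))
    return weights[0] * loss_mag + weights[1] * loss_wave + weights[2] * loss_cplx

# Usage with synthetic signals and trivial stand-in modules.
rng = np.random.default_rng(0)
clean = rng.standard_normal(N_FFT + 10 * HOP)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
outputs = cascade(noisy,
                  lambda mag: mag,
                  lambda noisy_in, prev: 0.5 * (noisy_in + prev),
                  lambda noisy_in, prev: 0.5 * (noisy_in + prev))
print(triple_domain_loss(*outputs, clean))
```

Because the loss sums one term per module output, a gradient-based trainer would update all three modules together, which matches the simultaneous optimization the abstract describes. In a real system the placeholder callables would be trainable networks in an automatic-differentiation framework.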