Zhang Hao, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, USA.
Center for Cognitive and Brain Sciences, The Ohio State University, USA.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:671-675. doi: 10.1109/icassp43922.2022.9747445. Epub 2022 Apr 27.
In this paper, we propose a neural cascade architecture for joint acoustic echo and noise suppression. The proposed architecture consists of two modules. The first module employs a convolutional recurrent network (CRN) for complex spectral mapping. Its output is then fed as an additional input to the second module, which uses a long short-term memory (LSTM) network for magnitude mask estimation. The entire architecture is trained end-to-end, with the two modules optimized jointly using a single loss function. The final output is generated from the enhanced phase obtained from the first module and the enhanced magnitude obtained from the second module. The cascade architecture thus enables the proposed method to achieve robust magnitude estimation as well as phase enhancement. Evaluation results show that the proposed method effectively suppresses acoustic echo and noise while preserving good speech quality, and that it significantly outperforms related methods.
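The abstract states that the final estimate combines the phase from the first module with the magnitude from the second module. Below is a minimal plain-Python sketch of that data flow on a single STFT, stored as nested lists of complex numbers (frames by frequency bins). It rests on several assumptions that the abstract does not specify. First, the second module's mask is assumed to be applied to the microphone-signal magnitude. Second, the second module's input is assumed to be the microphone magnitude concatenated with the first module's output magnitude. Third, the function names `stage2_features` and `combine_cascade_outputs` are hypothetical. The CRN and LSTM themselves are not modeled here: their outputs are simply taken as given.

```python
import cmath
from typing import List

# One STFT as [frames][frequency bins] of complex values.
Spectrum = List[List[complex]]
Mask = List[List[float]]


def stage2_features(mic_frame: List[complex], s1_frame: List[complex]) -> List[float]:
    """Hypothetical input features for the second (LSTM) module of one frame.

    Assumption: the first module's output is supplied as an "additional input"
    by concatenating its magnitude with the microphone magnitude.
    """
    return [abs(y) for y in mic_frame] + [abs(s) for s in s1_frame]


def combine_cascade_outputs(mic_spec: Spectrum, stage1_spec: Spectrum, stage2_mask: Mask) -> Spectrum:
    """Build the final estimate from stage-1 phase and stage-2 masked magnitude.

    Assumption: the estimated magnitude mask multiplies the microphone magnitude.
    """
    out: Spectrum = []
    for mic_frame, s1_frame, m_frame in zip(mic_spec, stage1_spec, stage2_mask):
        frame = []
        for y, s1, m in zip(mic_frame, s1_frame, m_frame):
            magnitude = m * abs(y)       # enhanced magnitude (second module)
            phase = cmath.phase(s1)      # enhanced phase (first module)
            frame.append(cmath.rect(magnitude, phase))
        out.append(frame)
    return out


if __name__ == "__main__":
    mic = [[3 + 4j, 1 + 0j]]     # microphone STFT (one frame, two bins)
    s1 = [[0 + 1j, -2 + 0j]]     # first-module complex spectral estimate
    mask = [[0.4, 0.5]]          # second-module magnitude mask
    print(stage2_features(mic[0], s1[0]))
    print(combine_cascade_outputs(mic, s1, mask))
```

Because the recombination keeps the first module's phase and only rescales the magnitude, any error in the mask affects the magnitude alone. This matches the division of labor between the two modules that the abstract describes.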