Tan Ke, Zhang Xueliang, Wang DeLiang
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210-1277 USA.
Department of Computer Science, Inner Mongolia University, Hohhot 010021, China.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1853-1863. doi: 10.1109/taslp.2021.3082318. Epub 2021 May 21.
In mobile speech communication, speech signals can be severely corrupted by background noise when the far-end talker is in a noisy acoustic environment. To suppress background noise, speech enhancement systems are typically integrated into mobile phones, in which one or more microphones are deployed. In this study, we propose a novel deep learning based approach to real-time speech enhancement for dual-microphone mobile phones. The proposed approach employs a new densely-connected convolutional recurrent network to perform dual-channel complex spectral mapping. We utilize a structured pruning technique to compress the model without significantly degrading the enhancement performance, which yields a low-latency and memory-efficient enhancement system for real-time processing. Experimental results suggest that the proposed approach consistently outperforms an earlier approach to dual-channel speech enhancement for mobile phone communication, as well as a deep learning based beamformer.
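To make the phrase "dual-channel complex spectral mapping" concrete, the following is a minimal sketch of how the network input and training target could be laid out. It is not the authors' implementation: the function names (`stft`, `dual_channel_features`, `complex_target`), the 16 kHz sampling rate, the 20 ms frame with 10 ms hop, the Hann window, and the synthetic signals are all assumptions chosen for illustration. The idea shown is only that the real and imaginary parts of both microphones' spectrograms are stacked as input, and the real and imaginary parts of the clean speech spectrogram serve as the target.

```python
import numpy as np

def stft(x, frame_len=320, hop=160):
    """Naive STFT; returns a complex array of shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)  # assumed analysis window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def dual_channel_features(primary, secondary, frame_len=320, hop=160):
    """Stack real and imaginary parts of both microphones' spectrograms.

    Output shape: (4, frames, bins), ordered as
    [primary real, primary imag, secondary real, secondary imag].
    """
    specs = [stft(primary, frame_len, hop), stft(secondary, frame_len, hop)]
    return np.stack([part for s in specs for part in (s.real, s.imag)])

def complex_target(clean_primary, frame_len=320, hop=160):
    """Training target: real and imaginary parts of the clean spectrogram."""
    s = stft(clean_primary, frame_len, hop)
    return np.stack([s.real, s.imag])

# Synthetic stand-ins for one second of audio at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
fs = 16000
clean = rng.standard_normal(fs)
noisy_primary = clean + 0.5 * rng.standard_normal(fs)
noisy_secondary = 0.3 * clean + 0.5 * rng.standard_normal(fs)

features = dual_channel_features(noisy_primary, noisy_secondary)
target = complex_target(clean)
print(features.shape, target.shape)
```

A model trained this way would map the four-plane input to the two-plane target, so that both magnitude and phase of the enhanced speech are estimated. The network architecture and the structured pruning step described in the abstract are outside the scope of this sketch.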