Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.

Authors

Tan Ke, Zhang Xueliang, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210-1277 USA.

Department of Computer Science, Inner Mongolia University, Hohhot 010021, China.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1853-1863. doi: 10.1109/taslp.2021.3082318. Epub 2021 May 21.

Abstract

In mobile speech communication, speech signals can be severely corrupted by background noise when the far-end talker is in a noisy acoustic environment. To suppress background noise, speech enhancement systems are typically integrated into mobile phones, in which one or more microphones are deployed. In this study, we propose a novel deep learning based approach to real-time speech enhancement for dual-microphone mobile phones. The proposed approach employs a new densely-connected convolutional recurrent network to perform dual-channel complex spectral mapping. We utilize a structured pruning technique to compress the model without significantly degrading the enhancement performance, which yields a low-latency and memory-efficient enhancement system for real-time processing. Experimental results suggest that the proposed approach consistently outperforms an earlier approach to dual-channel speech enhancement for mobile phone communication, as well as a deep learning based beamformer.
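
The abstract does not give signal-processing details, so the following is only a minimal sketch of how the input and output of dual-channel complex spectral mapping are commonly arranged: the real and imaginary parts of both microphones' short-time spectra are stacked as network input feature maps, and a predicted real/imaginary pair is recombined into a complex spectrum. The frame length, hop size, Hann window, toy signals, and the function names `stft`, `stack_dual_channel`, and `to_complex` are illustrative assumptions rather than the authors' configuration, and the network itself is not shown.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed frames followed by a one-sided DFT.

    Returns a list of frames, each a list of complex bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(n_bins)
        ])
    return frames


def stack_dual_channel(primary, secondary, frame_len=8, hop=4):
    """Build four feature maps: Re/Im of the primary mic, then Re/Im of the secondary mic."""
    features = []
    for sig in (primary, secondary):
        spec = stft(sig, frame_len, hop)
        features.append([[c.real for c in frame] for frame in spec])
        features.append([[c.imag for c in frame] for frame in spec])
    return features


def to_complex(real_map, imag_map):
    """Recombine a (real, imaginary) pair of maps into a complex spectrogram."""
    return [[complex(r, i) for r, i in zip(real_frame, imag_frame)]
            for real_frame, imag_frame in zip(real_map, imag_map)]


# Toy two-microphone signals (assumed): the secondary mic is a scaled copy.
primary = [math.sin(2 * math.pi * 2 * n / 16) for n in range(16)]
secondary = [0.5 * x for x in primary]

feats = stack_dual_channel(primary, secondary)
print(len(feats), len(feats[0]), len(feats[0][0]))  # feature maps, frames, bins
```

In a real system the two output maps would come from the trained network and be passed through an inverse STFT; here `to_complex` only shows the recombination step.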

Similar articles

1
Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1853-1863. doi: 10.1109/taslp.2021.3082318. Epub 2021 May 21.

Cited by

2
Estimation and Voicing Detection With Cascade Architecture in Noisy Speech.
IEEE/ACM Trans Audio Speech Lang Process. 2023;31:3760-3770. doi: 10.1109/TASLP.2023.3313427. Epub 2023 Sep 13.
4
Attention-Based Fusion for Bone-Conducted and Air-Conducted Speech Enhancement in the Complex Domain.
Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7757-7761. doi: 10.1109/icassp43922.2022.9746374. Epub 2022 Apr 27.
6
Fusing Bone-conduction and Air-conduction Sensors for Complex-Domain Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:3134-3143. doi: 10.1109/taslp.2022.3209943. Epub 2022 Sep 26.
