Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.

Author information

Williamson Donald S, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering, Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.

Abstract

In real-world situations, speech is masked by both background noise and reverberation, which negatively affect perceptual quality and intelligibility. In this paper, we address monaural speech separation in reverberant and noisy environments. We perform dereverberation and denoising using supervised learning with a deep neural network. Specifically, we enhance the magnitude and phase by performing separation with an estimate of the complex ideal ratio mask. We define the complex ideal ratio mask so that direct speech results after the mask is applied to reverberant and noisy speech. Our approach is evaluated using simulated and real room impulse responses, and with background noises. The proposed approach improves objective speech quality and intelligibility significantly. Evaluations and comparisons show that it outperforms related methods in many reverberant and noisy environments.
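The mask definition in the abstract can be illustrated with a minimal sketch. It assumes the usual complex ratio mask formulation: for each time-frequency bin, the mask M is the complex quotient S / Y, where Y is the reverberant, noisy mixture and S is the direct speech, so that M * Y returns S. The function names, the `eps` guard, and the toy bin values below are illustrative assumptions, not taken from the paper. The sketch also computes the ideal (oracle) mask from known direct speech, whereas the paper estimates the mask with a deep neural network.

```python
def complex_ideal_ratio_mask(direct, mixture, eps=1e-12):
    """Return M = S / Y for each time-frequency bin.

    direct and mixture are equal-length lists of complex STFT values.
    S / Y is computed as S * conj(Y) / |Y|^2; eps (an assumption of this
    sketch) guards against division by zero in silent bins.
    """
    mask = []
    for s, y in zip(direct, mixture):
        denom = abs(y) ** 2 + eps
        mask.append(s * y.conjugate() / denom)
    return mask


def apply_mask(mask, mixture):
    """Complex multiplication changes both magnitude and phase of each bin."""
    return [m * y for m, y in zip(mask, mixture)]


# Toy bins (hypothetical values): direct speech plus a single term standing in
# for reverberation and background noise.
direct = [1 + 2j, -0.5 + 0.25j, 3 - 1j]
interference = [0.3 - 0.1j, 0.2 + 0.4j, -1 + 0.5j]
mixture = [s + n for s, n in zip(direct, interference)]

mask = complex_ideal_ratio_mask(direct, mixture)
recovered = apply_mask(mask, mixture)

# Largest deviation between the masked mixture and the direct speech.
max_error = max(abs(r - s) for r, s in zip(recovered, direct))
print(max_error)
```

Because the mask is complex-valued, applying it corrects phase as well as magnitude, which is the distinction from a real-valued ratio mask that the abstract highlights.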

Similar articles

1
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.
2
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.
5
Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
6
Deep Learning Based Target Cancellation for Speech Dereverberation.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
8
Deep Learning Based Binaural Speech Separation in Reverberant Environments.
IEEE/ACM Trans Audio Speech Lang Process. 2017 May;25(5):1075-1084. doi: 10.1109/TASLP.2017.2687104. Epub 2017 Mar 24.

Cited by

7
Deep Learning Based Target Cancellation for Speech Dereverberation.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.
9
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.

References

1
Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
2
On Training Targets for Supervised Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2014 Dec;22(12):1849-1858. doi: 10.1109/TASLP.2014.2352935.
