Deep Learning Based Target Cancellation for Speech Dereverberation.

Authors

Wang Zhong-Qiu, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210-1277 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2020;28:941-950. doi: 10.1109/taslp.2020.2975902. Epub 2020 Feb 28.

Abstract

This study investigates deep learning based single- and multi-channel speech dereverberation. For single-channel processing, we extend magnitude-domain masking and mapping based dereverberation to complex-domain mapping, where deep neural networks (DNNs) are trained to predict the real and imaginary (RI) components of the direct-path signal from reverberant (and noisy) ones. For multi-channel processing, we first compute a minimum variance distortionless response (MVDR) beamformer to cancel the direct-path signal, and then feed the RI components of the cancelled signal, which is expected to be a filtered version of non-target signals, as additional features to perform dereverberation. Trained on a large dataset of simulated room impulse responses, our models show excellent speech dereverberation and recognition performance on the test set of the REVERB challenge, consistently better than single- and multi-channel weighted prediction error (WPE) algorithms.
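
The two ideas in the abstract, RI features and target cancellation with an MVDR beamformer, can be illustrated with a small NumPy sketch for a single frequency bin. It is not the authors' implementation: the abstract does not say how the cancelled signal is formed, so the sketch assumes it is the reference-microphone signal minus the MVDR output, with the steering vector normalized to the reference microphone. The function names (`mvdr_weights`, `cancel_target`, `ri_features`), the oracle covariance, and the random toy data are all hypothetical, and the DNN itself is omitted.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

def cancel_target(mix, weights, ref=0):
    """ASSUMPTION: cancelled signal = reference channel minus beamformer output.

    mix has shape (channels, frames) and holds complex STFT values.
    """
    return mix[ref] - weights.conj() @ mix

def ri_features(spec):
    """Stack real and imaginary parts as two real-valued feature rows."""
    return np.stack([spec.real, spec.imag], axis=0)

# Toy data (hypothetical): 4 microphones, 50 frames, one frequency bin.
rng = np.random.default_rng(0)
channels, frames = 4, 50

def crandn(*shape):
    """Complex Gaussian samples of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

steering = crandn(channels)
steering = steering / steering[0]       # relative to reference microphone 0
target = crandn(frames)                 # stands in for the direct-path signal
non_target = crandn(channels, frames)   # stands in for reverberation and noise
mix = np.outer(steering, target) + non_target

# Oracle non-target covariance, used here only to keep the sketch short.
noise_cov = non_target @ non_target.conj().T / frames
weights = mvdr_weights(noise_cov, steering)
cancelled = cancel_target(mix, weights)

# Candidate DNN input: RI of the reference mixture plus RI of the cancelled signal.
features = np.concatenate([ri_features(mix[0]), ri_features(cancelled)], axis=0)
```

Because the MVDR constraint forces the beamformer response to the steering vector to equal one, the direct-path term drops out of `cancelled` under these assumptions, and what remains is a filtered combination of the non-target signals only.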
