On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.

Authors

Pandey Ashutosh, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.

Publication

IEEE/ACM Trans Audio Speech Lang Process. 2020;28:2489-2499. doi: 10.1109/taslp.2020.3016487. Epub 2020 Aug 14.

Abstract

In recent years, supervised approaches using deep neural networks (DNNs) have become the mainstream for speech enhancement. It has been established that DNNs generalize well to untrained noises and speakers if trained using a large number of noises and speakers. However, we find that DNNs fail to generalize to new speech corpora in low signal-to-noise ratio (SNR) conditions. In this work, we establish that the lack of generalization is mainly due to the channel mismatch, i.e. different recording conditions between the trained and untrained corpus. Additionally, we observe that traditional channel normalization techniques are not effective in improving cross-corpus generalization. Further, we evaluate publicly available datasets that are promising for generalization. We find one particular corpus to be significantly better than others. Finally, we find that using a smaller frame shift in short-time processing of speech can significantly improve cross-corpus generalization. The proposed techniques to address cross-corpus generalization include channel normalization, better training corpus, and smaller frame shift in short-time Fourier transform (STFT). These techniques together improve the objective intelligibility and quality scores on untrained corpora significantly.
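The abstract points to two signal-level ingredients: conventional channel normalization and a smaller frame shift in the STFT. The sketch below is not the authors' implementation; it only illustrates, under assumed parameters (16 kHz audio, 32 ms frames, a 4 ms shift, per-utterance mean-variance normalization), how log-magnitude STFT features with a small frame shift and a traditional normalization step might be computed in Python. The function names and parameter values are illustrative assumptions, not taken from the paper.

# Minimal sketch (not the paper's code): log-magnitude STFT features with a
# small frame shift, plus a conventional per-utterance mean-variance
# ("channel") normalization. Frame and shift lengths are assumed values.
import numpy as np
from scipy.signal import stft

def log_magnitude_features(x, fs=16000, frame_ms=32, shift_ms=4):
    """Compute log-magnitude STFT features with a frame shift given in ms."""
    nperseg = int(fs * frame_ms / 1000)      # 512-sample frames at 16 kHz
    hop = int(fs * shift_ms / 1000)          # smaller shift -> more frame overlap
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
    return np.log(np.abs(Z) + 1e-8)          # shape: (frequency bins, time frames)

def mean_var_normalize(feat):
    """Per-utterance mean-variance normalization along the time axis."""
    mu = feat.mean(axis=1, keepdims=True)
    sigma = feat.std(axis=1, keepdims=True) + 1e-8
    return (feat - mu) / sigma

if __name__ == "__main__":
    x = np.random.randn(16000)               # 1 s of noise as a stand-in signal
    feats = mean_var_normalize(log_magnitude_features(x))
    print(feats.shape)                        # frequency bins x time frames

Reducing shift_ms (e.g. from 16 ms to 4 ms) increases frame overlap and the number of frames per utterance, which is the kind of smaller frame shift the abstract credits with better cross-corpus generalization.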

Similar Articles

1
Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:1374-1385. doi: 10.1109/taslp.2022.3161143. Epub 2022 Mar 22.
2
Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):189-198. doi: 10.1109/TASLP.2018.2876171. Epub 2018 Oct 15.
3
An Optimal Transport Analysis on Generalization in Deep Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2842-2853. doi: 10.1109/TNNLS.2021.3109942. Epub 2023 Jun 1.

Cited By

1
Self-attending RNN for Speech Enhancement to Improve Cross-corpus Generalization.
IEEE/ACM Trans Audio Speech Lang Process. 2022;30:1374-1385. doi: 10.1109/taslp.2022.3161143. Epub 2022 Mar 22.
2
Towards Robust Speech Super-resolution.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:2058-2066. doi: 10.1109/taslp.2021.3054302. Epub 2021 Jan 25.
3
Dense CNN with Self-Attention for Time-Domain Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.

References

1
A New Framework for CNN-Based Speech Enhancement in the Time Domain.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jul;27(7):1179-1188. doi: 10.1109/taslp.2019.2913512. Epub 2019 Apr 29.
2
Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):189-198. doi: 10.1109/TASLP.2018.2876171. Epub 2018 Oct 15.
3
Supervised Speech Separation Based on Deep Learning: An Overview.
IEEE/ACM Trans Audio Speech Lang Process. 2018 Oct;26(10):1702-1726. doi: 10.1109/TASLP.2018.2842159. Epub 2018 May 30.
4
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.
5
Deep Clustering and Conventional Networks for Music Separation: Stronger Together.
Proc IEEE Int Conf Acoust Speech Signal Process. 2017 Mar;2017:61-65. doi: 10.1109/ICASSP.2017.7952118. Epub 2017 Jun 19.
6
Complex Ratio Masking for Monaural Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
7
On Training Targets for Supervised Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2014 Dec;22(12):1849-1858. doi: 10.1109/TASLP.2014.2352935.
