Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement.

Authors

Lin Xin, Zhang Yang, Wang Shiyuan

Affiliation

College of Electronic and Information Engineering, Southwest University, Chongqing, 400715, China.

Publication

Sci Rep. 2024 Jul 31;14(1):17698. doi: 10.1038/s41598-024-68708-w.

Abstract

Deep learning has made significant advancements in speech enhancement, which plays a crucial role in improving the quality of speech signals in noisy conditions. In this paper, we propose a new approach called M-DGAN, which introduces a time (T)-domain encoder-decoder structure with rich channel representations into the time-frequency (TF)-domain generator framework, resulting in a new generator structure with mixed magnitude and phase representations in the T and TF-domains. The proposed mixed T-domain and TF-domain generator, incorporating the cascaded reworked conformer (CRC) structure, exhibits improved modeling capability and adaptability. Test results on the Voice Bank + DEMAND public dataset show that our method achieves the highest score with and performs well on all the remaining metrics when compared to the current state-of-the-art methods. In addition, tests on the NISQA_TEST_LIVETALK real dataset of the NISQA Corpus show the breadth and robustness of our model on speech enhancement tasks.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/bcd250010ab8/41598_2024_68708_Fig1_HTML.jpg
