
Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement

Authors

Lin Xin, Zhang Yang, Wang Shiyuan

Affiliations

College of Electronic and Information Engineering, Southwest University, Chongqing, 400715, China.

Publication

Sci Rep. 2024 Jul 31;14(1):17698. doi: 10.1038/s41598-024-68708-w.

DOI: 10.1038/s41598-024-68708-w
PMID: 39085424
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11291899/
Abstract

Deep learning has made significant advancements in speech enhancement, which plays a crucial role in improving the quality of speech signals in noisy conditions. In this paper, we propose a new approach called M-DGAN, which introduces a time (T)-domain encoder-decoder structure with rich channel representations into the time-frequency (TF)-domain generator framework, resulting in a new generator structure with mixed magnitude and phase representations in the T and TF-domains. The proposed mixed T-domain and TF-domain generator, incorporating the cascaded reworked conformer (CRC) structure, exhibits improved modeling capability and adaptability. Test results on the Voice Bank + DEMAND public dataset show that our method achieves the highest score and performs well on all the remaining metrics when compared to the current state-of-the-art methods. In addition, tests on the NISQA_TEST_LIVETALK real dataset of the NISQA Corpus show the breadth and robustness of our model on speech enhancement tasks.
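The two representations the abstract refers to, the T-domain waveform and its TF-domain magnitude and phase, can be illustrated with a small self-contained numpy sketch. This is not the paper's M-DGAN implementation; the window length and hop size are arbitrary illustrative choices. It shows that the waveform is exactly recoverable from the (magnitude, phase) pair, which is why a generator can mix both domains and still map back to a time-domain signal.

```python
import numpy as np

def stft(x, win_len=256, hop=128):
    # Frame the signal, apply a Hann window, and take the real FFT of each frame.
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, win_len // 2 + 1)

def istft(spec, win_len=256, hop=128):
    # Overlap-add inverse of the STFT above, normalized by the summed squared window,
    # so interior samples are reconstructed exactly wherever the window sum is nonzero.
    win = np.hanning(win_len)
    frames = np.fft.irfft(spec, n=win_len, axis=-1)
    out = np.zeros((len(frames) - 1) * hop + win_len)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win_len] += f * win
        norm[i * hop : i * hop + win_len] += win ** 2
    return out / np.maximum(norm, 1e-8)

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)            # T-domain representation: the raw waveform

spec = stft(x)                           # TF-domain complex spectrogram
magnitude, phase = np.abs(spec), np.angle(spec)  # TF-domain magnitude and phase

x_hat = istft(magnitude * np.exp(1j * phase))    # waveform recovered from (magnitude, phase)
err = np.max(np.abs(x[256:-256] - x_hat[256:-256]))  # interior error is at machine precision
```

A network operating in the TF domain would predict enhanced `magnitude` and `phase` (or a complex mask), while a T-domain branch works on `x` directly; the invertible STFT/iSTFT pair is what lets the two views be combined in one generator.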


Figures 1–8:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/bcd250010ab8/41598_2024_68708_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/ab5ad0621e0b/41598_2024_68708_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/f373fbd08ac3/41598_2024_68708_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/e898091d4475/41598_2024_68708_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/62df6287ef36/41598_2024_68708_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/2d70bb176744/41598_2024_68708_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/46f745a93962/41598_2024_68708_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5c5f/11291899/b2699ed100d1/41598_2024_68708_Fig8_HTML.jpg

Similar articles

1. Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement. Sci Rep. 2024 Jul 31;14(1):17698. doi: 10.1038/s41598-024-68708-w.
2. Noise-robust voice conversion with domain adversarial training. Neural Netw. 2022 Apr;148:74-84. doi: 10.1016/j.neunet.2022.01.003. Epub 2022 Jan 13.
3. μ-law SGAN for generating spectra with more details in speech enhancement. Neural Netw. 2021 Apr;136:17-27. doi: 10.1016/j.neunet.2020.12.017. Epub 2020 Dec 25.
4. E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion. IEEE J Biomed Health Inform. 2023 May;27(5):2489-2500. doi: 10.1109/JBHI.2023.3239551. Epub 2023 May 4.
5. Dense CNN with Self-Attention for Time-Domain Speech Enhancement. IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1270-1279. doi: 10.1109/taslp.2021.3064421. Epub 2021 Mar 8.
6. Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation. IEEE/ACM Trans Audio Speech Lang Process. 2019 Aug;27(8):1256-1266. doi: 10.1109/TASLP.2019.2915167. Epub 2019 May 6.
7. A method for enhancing speech and warning signals based on parallel convolutional neural networks in a noisy environment. Technol Health Care. 2021;29(S1):141-152. doi: 10.3233/THC-218015.
8. A dual-region speech enhancement method based on voiceprint segmentation. Neural Netw. 2024 Dec;180:106683. doi: 10.1016/j.neunet.2024.106683. Epub 2024 Aug 31.
9. Causal speech enhancement using dynamical-weighted loss and attention encoder-decoder recurrent neural network. PLoS One. 2023 May 11;18(5):e0285629. doi: 10.1371/journal.pone.0285629. eCollection 2023.
10. A New Framework for CNN-Based Speech Enhancement in the Time Domain. IEEE/ACM Trans Audio Speech Lang Process. 2019 Jul;27(7):1179-1188. doi: 10.1109/taslp.2019.2913512. Epub 2019 Apr 29.
