μ-law SGAN for generating spectra with more details in speech enhancement.

Author information

School of Information Science and Technology, Beijing Forestry University, 35 Qing-Hua East Road, Beijing 100083, China; Engineering Research Center for Forestry-oriented Intelligent Information Processing of National Forestry and Grassland Administration, Beijing 100083, China.

School of Information Science, Beijing Language and Culture University, Beijing 100083, China.

Publication information

Neural Netw. 2021 Apr;136:17-27. doi: 10.1016/j.neunet.2020.12.017. Epub 2020 Dec 25.

Abstract

The goal of monaural speech enhancement is to separate clean speech from noisy speech. Recently, many studies have employed generative adversarial networks (GANs) for monaural speech enhancement. In these approaches, the output of the generator is a speech waveform or a spectrum, such as a magnitude spectrum, a mel-spectrum or a complex-valued spectrum. The spectra generated by current speech enhancement methods in the time-frequency domain usually lack details, such as consonants and harmonics with low energy. In this paper, we propose a new type of adversarial training framework for spectrum generation, named μ-law spectrum generative adversarial networks (μ-law SGAN). We introduce a trainable μ-law spectrum compression layer (USCL) into the proposed discriminator to compress the dynamic range of the spectrum. As a result, the compressed spectrum can display more detailed information. In addition, we use the spectrum transformed by USCL to regularize the generator's training, so that the generator can pay more attention to the details of the spectrum. Experimental results on the open dataset Voice Bank + DEMAND show that μ-law SGAN is an effective generative adversarial architecture for speech enhancement. Moreover, visual spectrogram analysis suggests that μ-law SGAN pays more attention to the enhancement of low-energy harmonics and consonants.
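To illustrate why compressing the dynamic range makes low-energy components more visible, here is a minimal sketch of μ-law compression applied to a magnitude spectrum. It uses the standard μ-law companding formula, log(1 + μx) / log(1 + μ), which is an assumption: the abstract does not give the exact form of the USCL. The function name `mu_law_compress`, the peak normalization, and the fixed value μ = 255 are all hypothetical choices for illustration; in the paper μ is described as trainable, which this sketch does not model.

```python
import numpy as np

def mu_law_compress(magnitude, mu=255.0):
    """Apply standard mu-law companding to a non-negative magnitude spectrum.

    Assumption: the spectrum is normalized by its peak before compression,
    so the output lies in [0, 1]. This is not taken from the paper.
    """
    magnitude = np.asarray(magnitude, dtype=np.float64)
    peak = magnitude.max()
    if peak == 0:
        # An all-zero spectrum has nothing to compress.
        return np.zeros_like(magnitude)
    normalized = magnitude / peak
    # log1p(x) computes log(1 + x) accurately for small x.
    return np.log1p(mu * normalized) / np.log1p(mu)

# Toy "spectrum" spanning three orders of magnitude (60 dB).
spectrum = np.array([1.0, 0.1, 0.01, 0.001])
compressed = mu_law_compress(spectrum)

# Ratio between the strongest and weakest bins, before and after.
range_before = spectrum[0] / spectrum[-1]
range_after = compressed[0] / compressed[-1]
```

In this toy example the weakest bin is lifted relative to the strongest one, so `range_after` is much smaller than `range_before`. This is the sense in which a compressed spectrum can expose low-energy harmonics and consonants that would otherwise contribute almost nothing to a loss computed on the raw magnitude.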

