Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

Authors

Tan Ke, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, 43210-1277 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210-1277, USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2020;28:380-390. doi: 10.1109/taslp.2019.2955276. Epub 2019 Nov 22.

Abstract

Phase is important for the perceptual quality of speech. However, directly estimating phase spectra through supervised learning seems intractable because phase spectra lack spectrotemporal structure. Complex spectral mapping aims to estimate the real and imaginary spectrograms of clean speech from those of noisy speech, which simultaneously enhances the magnitude and phase responses of speech. Inspired by multi-task learning, we propose a gated convolutional recurrent network (GCRN) for complex spectral mapping, which amounts to a causal system for monaural speech enhancement. Our experimental results suggest that the proposed GCRN substantially outperforms an existing convolutional neural network (CNN) for complex spectral mapping in terms of both objective speech intelligibility and quality. Moreover, the proposed approach yields significantly higher STOI and PESQ than magnitude spectral mapping and complex ratio masking. We also find that complex spectral mapping with the proposed GCRN provides an effective phase estimate.
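The abstract's central idea, that estimating the real and imaginary parts of each time-frequency unit recovers magnitude and phase together, can be illustrated with a toy sketch. This is not the GCRN, and it does not reproduce any result from the paper: the spectral values and all function names below are hypothetical, and "ideal" estimates stand in for a trained network. It only compares the best case of complex spectral mapping with the best case of magnitude spectral mapping, which has to reuse the noisy phase.

```python
import cmath

def to_real_imag(spectrum):
    """Split complex T-F units into (real, imaginary) pairs, the targets of complex spectral mapping."""
    return [(s.real, s.imag) for s in spectrum]

def from_real_imag(pairs):
    """Rebuild complex T-F units from estimated real and imaginary parts."""
    return [complex(re, im) for re, im in pairs]

def magnitude_only_estimate(clean, noisy):
    """Best case of magnitude mapping: exact clean magnitude combined with the noisy phase."""
    return [cmath.rect(abs(c), cmath.phase(n)) for c, n in zip(clean, noisy)]

def squared_error(estimate, target):
    """Sum of squared complex-domain errors over all T-F units."""
    return sum(abs(e - t) ** 2 for e, t in zip(estimate, target))

# Hypothetical toy values for three T-F units (not taken from the paper).
clean = [complex(1.0, 0.5), complex(-0.3, 0.8), complex(0.2, -0.6)]
noise = [complex(0.4, -0.2), complex(0.1, 0.3), complex(-0.5, 0.1)]
noisy = [c + n for c, n in zip(clean, noise)]

# An ideal complex estimate reproduces the clean units, phase included.
ideal_complex = from_real_imag(to_real_imag(clean))
# An ideal magnitude estimate still carries the noisy phase.
ideal_magnitude = magnitude_only_estimate(clean, noisy)

err_complex = squared_error(ideal_complex, clean)
err_magnitude = squared_error(ideal_magnitude, clean)
print(err_complex, err_magnitude)
```

In this toy setting the complex-domain error of the ideal complex estimate is zero, while the ideal magnitude estimate keeps a residual error caused entirely by the phase mismatch.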

Similar articles

2. Complex Ratio Masking for Monaural Speech Separation. IEEE/ACM Trans Audio Speech Lang Process. 2016 Mar;24(3):483-492. doi: 10.1109/TASLP.2015.2512042. Epub 2015 Dec 23.
3. Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement. IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):189-198. doi: 10.1109/TASLP.2018.2876171. Epub 2018 Oct 15.
7. Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising. IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.
8. Deep Learning Based Real-time Speech Enhancement for Dual-microphone Mobile Phones. IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1853-1863. doi: 10.1109/taslp.2021.3082318. Epub 2021 May 21.
9. A New Framework for CNN-Based Speech Enhancement in the Time Domain. IEEE/ACM Trans Audio Speech Lang Process. 2019 Jul;27(7):1179-1188. doi: 10.1109/taslp.2019.2913512. Epub 2019 Apr 29.

Cited by

3. CROSS-DOMAIN DIFFUSION BASED SPEECH ENHANCEMENT FOR VERY NOISY SPEECH. Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10096985. Epub 2023 May 5.
4. CROSS-DOMAIN SPEECH ENHANCEMENT WITH A NEURAL CASCADE ARCHITECTURE. Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7862-7866. doi: 10.1109/icassp43922.2022.9747752. Epub 2022 Apr 27.
5. Estimation and Voicing Detection With Cascade Architecture in Noisy Speech. IEEE/ACM Trans Audio Speech Lang Process. 2023;31:3760-3770. doi: 10.1109/TASLP.2023.3313427. Epub 2023 Sep 13.
7. NEURAL CASCADE ARCHITECTURE FOR JOINT ACOUSTIC ECHO AND NOISE SUPPRESSION. Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:671-675. doi: 10.1109/icassp43922.2022.9747445. Epub 2022 Apr 27.
8. ATTENTION-BASED FUSION FOR BONE-CONDUCTED AND AIR-CONDUCTED SPEECH ENHANCEMENT IN THE COMPLEX DOMAIN. Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7757-7761. doi: 10.1109/icassp43922.2022.9746374. Epub 2022 Apr 27.
