Cross-Domain Speech Enhancement with a Neural Cascade Architecture

Authors

Wang Heming, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, USA.

Center for Cognitive and Brain Sciences, The Ohio State University, USA.

Publication information

Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:7862-7866. doi: 10.1109/icassp43922.2022.9747752. Epub 2022 Apr 27.

Abstract

This paper proposes a novel cascade architecture to address the monaural speech enhancement problem. We leverage three different domains of speech representation, namely spectral magnitude, waveform, and complex spectrogram, to progressively suppress the background noise within noisy speech. Our proposed neural cascade architecture consists of three modules, and each operates on the original noisy input and the output of the previous module in a distinct speech representation. During training, the network simultaneously optimizes all modules with a triple-domain loss. Experiments on the WSJ0 SI-84 corpus demonstrate that our proposed approach achieves superior enhancement results, and substantially outperforms previous baselines in terms of both speech quality and intelligibility.
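
The data flow and the triple-domain loss described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The names `stft`, `triple_domain_loss`, `run_cascade`, and `toy_module` are hypothetical, and the choices of L1 error, equal loss weights, the frame and hop sizes, feeding the noisy signal to the first module as its "previous" output, and summing the loss over every module's output are all assumptions made for illustration. The toy module is a fixed smoother standing in for a trained network.

```python
import cmath
import math


def stft(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed frames, one-sided DFT bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def triple_domain_loss(estimate, clean, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of L1 errors on waveform, magnitude, and complex STFT."""
    est = [b for frame in stft(estimate) for b in frame]
    ref = [b for frame in stft(clean) for b in frame]
    wave = mean(abs(e - c) for e, c in zip(estimate, clean))
    mag = mean(abs(abs(e) - abs(c)) for e, c in zip(est, ref))
    cplx = mean(abs(e.real - c.real) + abs(e.imag - c.imag)
                for e, c in zip(est, ref))
    return weights[0] * wave + weights[1] * mag + weights[2] * cplx


def run_cascade(noisy, modules):
    """Each module receives the original noisy input and the previous output."""
    # Assumption: the first module gets the noisy signal as its "previous" output.
    outputs, previous = [], noisy
    for module in modules:
        previous = module(noisy, previous)
        outputs.append(previous)
    return outputs


def toy_module(noisy, previous):
    """Hypothetical stand-in for a trained network: 3-point smoothing."""
    mixed = [0.5 * (n + p) for n, p in zip(noisy, previous)]
    padded = [mixed[0]] + mixed + [mixed[-1]]
    return [sum(padded[i:i + 3]) / 3 for i in range(len(mixed))]


clean = [math.sin(2 * math.pi * n / 16) for n in range(32)]
noisy = [c + 0.3 * (-1) ** n for n, c in enumerate(clean)]
outputs = run_cascade(noisy, [toy_module] * 3)
# Assumption: the loss is summed over every module's output.
total = sum(triple_domain_loss(out, clean) for out in outputs)
print(f"total loss over {len(outputs)} module outputs: {total:.4f}")
```

In a real system each module would be a neural network working in its own representation, and a framework STFT would replace the naive DFT loop used here to keep the sketch self-contained.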
