NEURAL CASCADE ARCHITECTURE FOR JOINT ACOUSTIC ECHO AND NOISE SUPPRESSION.

Authors

Zhang Hao, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, USA.

Center for Cognitive and Brain Sciences, The Ohio State University, USA.

Publication information

Proc IEEE Int Conf Acoust Speech Signal Process. 2022 May;2022:671-675. doi: 10.1109/icassp43922.2022.9747445. Epub 2022 Apr 27.

DOI: 10.1109/icassp43922.2022.9747445

PMID: 40313329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12045126/

Abstract

In this paper, we propose a neural cascade architecture for joint acoustic echo and noise suppression. The proposed cascade architecture consists of two modules. A convolutional recurrent network (CRN) is employed in the first module for complex spectral mapping. The output is then fed as an additional input to the second module, where a long short-term memory network (LSTM) is utilized for magnitude mask estimation. The entire architecture is trained in an end-to-end manner with the two modules optimized jointly using a single loss function. The final output is generated using the enhanced phase and magnitude obtained from the first and the second module, respectively. The cascade architecture enables the proposed method to obtain robust magnitude estimation as well as phase enhancement. Evaluation results show that the proposed method effectively suppresses acoustic echo and noise while preserving good speech quality, and significantly outperforms related methods.
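The data flow described in the abstract, and in particular the final step that pairs the phase from the first module with the magnitude from the second, can be sketched per STFT frame in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `module1` and `module2` are hypothetical stand-ins for the trained CRN and LSTM; the far-end reference signal as a network input and the exact inputs to the second module are assumptions; the magnitude mask is assumed to scale the microphone magnitude, which the abstract does not state; and `mse_loss` is an illustrative loss, not the paper's loss function.

```python
import cmath
from typing import Callable, List

# One STFT frame: one complex value per frequency bin.
Spectrum = List[complex]


def combine_phase_and_magnitude(stage1: Spectrum, mask: List[float],
                                mic: Spectrum) -> Spectrum:
    """Final output: magnitude from module 2, phase from module 1.

    Assumption (not stated in the abstract): the module-2 mask scales the
    microphone magnitude |Y|.
    """
    if not (len(stage1) == len(mask) == len(mic)):
        raise ValueError("inputs must have the same number of frequency bins")
    out = []
    for s1, m, y in zip(stage1, mask, mic):
        magnitude = m * abs(y)      # enhanced magnitude (module 2)
        phase = cmath.phase(s1)     # enhanced phase (module 1)
        out.append(cmath.rect(magnitude, phase))
    return out


def cascade_forward(
    mic: Spectrum,
    far_end: Spectrum,
    module1: Callable[[Spectrum, Spectrum], Spectrum],
    module2: Callable[[List[float], List[float], List[float]], List[float]],
) -> Spectrum:
    """Hypothetical data flow of the two-module cascade for one frame."""
    # Module 1 (a CRN in the paper): complex spectral mapping.
    stage1 = module1(mic, far_end)
    # Module 2 (an LSTM in the paper): magnitude mask estimation, with the
    # module-1 output as an additional input (this input set is assumed).
    mask = module2([abs(v) for v in mic],
                   [abs(v) for v in far_end],
                   [abs(v) for v in stage1])
    return combine_phase_and_magnitude(stage1, mask, mic)


def mse_loss(estimate: Spectrum, target: Spectrum) -> float:
    """Illustrative single loss on the final output (not the paper's loss)."""
    if len(estimate) != len(target) or not estimate:
        raise ValueError("spectra must be non-empty and equally long")
    return sum(abs(e - t) ** 2 for e, t in zip(estimate, target)) / len(estimate)


if __name__ == "__main__":
    # Toy stand-ins for the trained networks: module 1 subtracts the far-end
    # spectrum; module 2 returns a mask of 1.0 in every bin.
    out = cascade_forward([3 + 4j, 1 + 0j], [0j, 2 + 0j],
                          lambda y, x: [a - b for a, b in zip(y, x)],
                          lambda ym, xm, sm: [1.0] * len(ym))
    print(out)
```

Because the loss is computed only on the combined output, writing the same steps with tensors in an autodiff framework would let its gradient reach both the phase source (module 1) and the mask (module 2). That is one way to read the abstract's statement that both modules are optimized jointly with a single loss.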

Similar articles

Neural Cascade Architecture with Triple-domain Loss for Speech Enhancement.

IEEE/ACM Trans Audio Speech Lang Process. 2022;30:734-743. doi: 10.1109/taslp.2021.3138716. Epub 2021 Dec 28.

Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech.

Interspeech. 2022 Sep;2022:401-405. doi: 10.21437/interspeech.2022-11156.

Deep learning-based stereophonic acoustic echo suppression without decorrelation.

J Acoust Soc Am. 2021 Aug;150(2):816. doi: 10.1121/10.0005757.

Multi-TALK: Multi-Microphone Cross-Tower Network for Jointly Suppressing Acoustic Echo and Background Noise.

Sensors (Basel). 2020 Nov 13;20(22):6493. doi: 10.3390/s20226493.

Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.

IEEE/ACM Trans Audio Speech Lang Process. 2015 Jan;23(1):92-101. doi: 10.1109/TASLP.2014.2372314. Epub 2015 Jan 14.

Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.

PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.

Incorporation of residual attention modules into two neural networks for low-dose CT denoising.

Med Phys. 2021 Jun;48(6):2973-2990. doi: 10.1002/mp.14856. Epub 2021 Apr 23.

Speech Enhancement for Cochlear Implant Recipients using Deep Complex Convolution Transformer with Frequency Transformation.

IEEE/ACM Trans Audio Speech Lang Process. 2024;32:2616-2629. doi: 10.1109/taslp.2024.3366760. Epub 2024 Feb 22.

Learning Complex Spectral Mapping with Gated Convolutional Recurrent Networks for Monaural Speech Enhancement.

IEEE/ACM Trans Audio Speech Lang Process. 2020;28:380-390. doi: 10.1109/taslp.2019.2955276. Epub 2019 Nov 22.

Cited by

Multichannel speech enhancement for automatic speech recognition: a literature review.

PeerJ Comput Sci. 2025 Mar 27;11:e2772. doi: 10.7717/peerj-cs.2772. eCollection 2025.
