Residual Recurrent Neural Network for Speech Enhancement

Authors

Abdulbaqi Jalal, Gu Yue, Chen Shuhong, Marsic Ivan

Affiliation

Rutgers, the State University of New Jersey, USA.

Publication

Proc IEEE Int Conf Acoust Speech Signal Process. 2020 May;2020:6659-6663. doi: 10.1109/icassp40776.2020.9053544. Epub 2020 May 14.

DOI:10.1109/icassp40776.2020.9053544

PMID:33716575

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7954533/

Abstract

Most current speech enhancement models use spectrogram features, which require an expensive transformation and lose phase information. Previous work has overcome these issues by using convolutional networks to learn temporal correlations across high-resolution waveforms. These models, however, are limited by memory-intensive dilated convolutions and by aliasing artifacts from upsampling. We introduce an end-to-end, fully recurrent neural network for single-channel speech enhancement. The network is structured in an hourglass shape that efficiently captures long-range temporal dependencies by reducing the feature resolution without information loss. We also use residual connections to prevent gradient decay across layers and to improve model generalization. Experimental results show that our model outperforms state-of-the-art approaches on six quantitative evaluation metrics.
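
The abstract does not say how the feature resolution is reduced without information loss, so the following is only a minimal sketch under one assumption: adjacent time frames are concatenated ("folded") into wider frames, which halves the number of time steps while keeping every sample. The names `fold`, `unfold`, `toy_recurrent`, `residual`, and `hourglass` are hypothetical, and `toy_recurrent` is a fixed leaky running state standing in for a trained recurrent layer; none of this is taken from the paper.

```python
from typing import Callable, List

Frame = List[float]
Sequence = List[Frame]


def fold(seq: Sequence, factor: int = 2) -> Sequence:
    """Cut time resolution by `factor` by concatenating adjacent frames.

    Assumed mechanism: T frames of width F become T/factor frames of
    width F*factor, so no sample is discarded.
    """
    if len(seq) % factor != 0:
        raise ValueError("sequence length must be divisible by factor")
    return [
        [x for frame in seq[i:i + factor] for x in frame]
        for i in range(0, len(seq), factor)
    ]


def unfold(seq: Sequence, factor: int = 2) -> Sequence:
    """Inverse of fold: split each wide frame back into `factor` frames."""
    out: Sequence = []
    for frame in seq:
        if len(frame) % factor != 0:
            raise ValueError("frame width must be divisible by factor")
        width = len(frame) // factor
        out.extend(frame[k * width:(k + 1) * width] for k in range(factor))
    return out


def toy_recurrent(seq: Sequence, decay: float = 0.5) -> Sequence:
    """Hypothetical stand-in for a trained recurrent layer (leaky state)."""
    state = [0.0] * len(seq[0])
    out: Sequence = []
    for frame in seq:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, frame)]
        out.append(list(state))
    return out


def residual(layer: Callable[[Sequence], Sequence], seq: Sequence) -> Sequence:
    """Skip connection: add the layer's input back to its output."""
    return [
        [x + y for x, y in zip(f_in, f_out)]
        for f_in, f_out in zip(seq, layer(seq))
    ]


def hourglass(seq: Sequence) -> Sequence:
    """One narrowing step, one residual recurrent layer, one widening step."""
    narrow = fold(seq, 2)                        # half the time steps, double the width
    processed = residual(toy_recurrent, narrow)  # recurrent layer with skip connection
    return unfold(processed, 2)                  # restore the original resolution


if __name__ == "__main__":
    wave = [[1.0], [2.0], [3.0], [4.0]]
    print(fold(wave))
    print(unfold(fold(wave)) == wave)
    print(hourglass(wave))
```

Because `unfold(fold(x))` returns `x` unchanged, the recurrent layer in the middle runs over half as many time steps while the full waveform remains recoverable. A real model would stack several such steps with learned layers.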

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07a1/7954533/7084e5428acc/nihms-1677344-f0001.jpg

Similar articles

A convolutional recurrent neural network with attention framework for speech separation in monaural recordings.

Sci Rep. 2021 Jan 14;11(1):1434. doi: 10.1038/s41598-020-80713-3.

Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.

Neural Netw. 2021 Sep;141:52-60. doi: 10.1016/j.neunet.2021.03.013. Epub 2021 Mar 23.

End-to-end speech emotion recognition using a novel context-stacking dilated convolution neural network.

EURASIP J Audio Speech Music Process. 2021;2021(1):18. doi: 10.1186/s13636-021-00208-5. Epub 2021 May 12.

Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement.

IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):189-198. doi: 10.1109/TASLP.2018.2876171. Epub 2018 Oct 15.

End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement.

Sensors (Basel). 2022 Oct 13;22(20):7782. doi: 10.3390/s22207782.

Convolutional fusion network for monaural speech enhancement.

Neural Netw. 2021 Nov;143:97-107. doi: 10.1016/j.neunet.2021.05.017. Epub 2021 May 25.

MSTCN: A multiscale temporal convolutional network for user independent human activity recognition.

F1000Res. 2021 Dec 8;10:1261. doi: 10.12688/f1000research.73175.2. eCollection 2021.

Speech emotion recognition based on improved masking EMD and convolutional recurrent neural network.

Front Psychol. 2023 Jan 9;13:1075624. doi: 10.3389/fpsyg.2022.1075624. eCollection 2022.

Mutual Correlation Attentive Factors in Dyadic Fusion Networks for Speech Emotion Recognition.

Proc ACM Int Conf Multimed. 2019 Oct;2019:157-166. doi: 10.1145/3343031.3351039.

Cited by

Deeply supervised two stage generative adversarial network for stain normalization.

Sci Rep. 2025 Feb 27;15(1):7068. doi: 10.1038/s41598-025-91587-8.

Scalable DNA recognition circuits based on DNA strand displacement.

Nanoscale Adv. 2024 Jul 19;6(19):4852-4857. doi: 10.1039/d4na00379a. eCollection 2024 Sep 24.

Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.

PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.

Magnetic DNA random access memory with nanopore readouts and exponentially-scaled combinatorial addressing.

Sci Rep. 2023 May 25;13(1):8514. doi: 10.1038/s41598-023-29575-z.

End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement.

Sensors (Basel). 2022 Oct 13;22(20):7782. doi: 10.3390/s22207782.

A Waveform Mapping-Based Approach for Enhancement of Trunk Borers' Vibration Signals Using Deep Learning Model.

Insects. 2022 Jun 29;13(7):596. doi: 10.3390/insects13070596.
