Speech Enhancement by Multiple Propagation through the Same Neural Network.

Affiliation

Institute of Automatic Control and Robotics, Poznan University of Technology, 60-965 Poznan, Poland.

Publication information

Sensors (Basel). 2022 Mar 22;22(7):2440. doi: 10.3390/s22072440.

DOI: 10.3390/s22072440
PMID: 35408056
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9003084/

Abstract

Monaural speech enhancement aims to remove background noise from an audio recording containing speech in order to improve its clarity and intelligibility. Currently, the most successful solutions for speech enhancement use deep neural networks. In a typical setting, such neural networks process the noisy input signal once and produce a single enhanced signal. However, it was recently shown that a U-Net-based network can be trained in a way that allows it to process the same input signal multiple times in order to enhance the speech even further. Unfortunately, this was tested only for two-iteration enhancement. In the current research, we extend previous efforts and demonstrate how multi-forward-pass speech enhancement can be successfully applied to other architectures, namely the ResBLSTM and Transformer-Net. Moreover, we test the three architectures with up to five iterations, thus identifying the method's limit in terms of performance gain. In our experiments, we used audio samples from the WSJ0, Noisex-92, and DCASE datasets and measured speech enhancement quality using SI-SDR, STOI, and PESQ. The results show that performing speech enhancement up to five times still improves speech intelligibility, but the gain becomes smaller with each iteration. Nevertheless, performing five iterations instead of two gives an additional 0.6 dB SI-SDR gain and a four-percentage-point STOI gain. However, these increments are not equal across architectures: the U-Net and Transformer-Net benefit more from multiple forward passes than the ResBLSTM does.
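
The multi-pass idea and the SI-SDR metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the signal is passed between iterations, so feeding each output back in as the next input is an assumption here. `toy_enhancer` is a hypothetical stand-in (a moving-average filter) for a trained network, and the test signal is synthetic. `si_sdr` follows the standard scale-invariant SDR definition.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )

def enhance_iteratively(enhance, noisy, n_passes):
    """Apply the same enhancer n_passes times.

    Assumption: each output is fed back in as the next input.
    """
    outputs = []
    signal = noisy
    for _ in range(n_passes):
        signal = enhance(signal)
        outputs.append(signal)
    return outputs

def toy_enhancer(x, width=5):
    """Hypothetical stand-in for a trained network: moving-average filter."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

# Synthetic example: a 220 Hz tone in white noise, sampled at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 220.0 * t)
noisy = clean + 0.5 * rng.standard_normal(t.shape)

# SI-SDR of the noisy input, then after each of five passes.
passes = enhance_iteratively(toy_enhancer, noisy, n_passes=5)
scores = [si_sdr(noisy, clean)] + [si_sdr(y, clean) for y in passes]
for i, score in enumerate(scores):
    print(f"pass {i}: SI-SDR = {score:.2f} dB")
```

With a real model, `toy_enhancer` would be replaced by the network's forward pass. Tracking the per-pass SI-SDR in this way is one simple means of observing the diminishing gains that the abstract reports.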

Similar articles

1. Speech Enhancement by Multiple Propagation through the Same Neural Network.
   Sensors (Basel). 2022 Mar 22;22(7):2440. doi: 10.3390/s22072440.
2. Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios.
   J Acoust Soc Am. 2022 Dec;152(6):3458. doi: 10.1121/10.0016494.
3. An Evaluation of Output Signal to Noise Ratio as a Predictor of Cochlear Implant Speech Intelligibility.
   Ear Hear. 2018 Sep/Oct;39(5):958-968. doi: 10.1097/AUD.0000000000000556.
4. Towards real-world objective speech quality and intelligibility assessment using speech-enhancement residuals and convolutional long short-term memory networks.
   J Acoust Soc Am. 2020 Nov;148(5):3348. doi: 10.1121/10.0002702.
5. Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.
   PLoS One. 2024 Jan 3;19(1):e0291240. doi: 10.1371/journal.pone.0291240. eCollection 2024.
6. Improving the Intelligibility of Speech for Simulated Electric and Acoustic Stimulation Using Fully Convolutional Neural Networks.
   IEEE Trans Neural Syst Rehabil Eng. 2021;29:184-195. doi: 10.1109/TNSRE.2020.3042655. Epub 2021 Feb 26.
7. End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement.
   Sensors (Basel). 2022 Oct 13;22(20):7782. doi: 10.3390/s22207782.
8. Dual-path transformer-based network with equalization-generation components prediction for flexible vibrational sensor speech enhancement in the time domain.
   J Acoust Soc Am. 2022 May;151(5):2814. doi: 10.1121/10.0010316.
9. Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis.
   J Acoust Soc Am. 2024 Jun 1;155(6):3580-3588. doi: 10.1121/10.0026219.
10. Intelligibility prediction for speech mixed with white Gaussian noise at low signal-to-noise ratios.
    J Acoust Soc Am. 2021 Feb;149(2):1346. doi: 10.1121/10.0003557.

Cited by

1. Experimental Investigation of Acoustic Features to Optimize Intelligibility in Cochlear Implants.
   Sensors (Basel). 2023 Aug 31;23(17):7553. doi: 10.3390/s23177553.
2. A Survey on Low-Latency DNN-Based Speech Enhancement.
   Sensors (Basel). 2023 Jan 26;23(3):1380. doi: 10.3390/s23031380.
