Triple-0: Zero-shot denoising and dereverberation on an end-to-end frozen anechoic speech separation network.

Affiliations

Department of Electrical Engineering, University of Engineering and Technology, Peshawar, Pakistan.

Intelligent Information Processing Lab, National Center of Artificial Intelligence, University of Engineering and Technology, Peshawar, Pakistan.

Publication information

PLoS One. 2024 Jul 16;19(7):e0301692. doi: 10.1371/journal.pone.0301692. eCollection 2024.

DOI:10.1371/journal.pone.0301692

PMID:39012881

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11251582/

Abstract

Speech enhancement is crucial for both human and machine listening applications. Over the last decade, deep learning has brought tremendous improvements in speech enhancement over classical signal processing and machine learning methods. However, training a deep neural network is not only time-consuming; it also requires extensive computational resources and a large training dataset. Transfer learning, i.e., using a pretrained network for a new task, reduces the training time, computational resources, and dataset size required, but the network still needs to be fine-tuned for the new task. This paper presents a novel method of speech denoising and dereverberation (SD&D) on an end-to-end frozen binaural anechoic speech separation network. The frozen network requires neither architectural changes nor fine-tuning for the new task, both of which transfer learning usually requires. The interaural cues of a source placed in noisy, echoic surroundings are given as input to this pretrained network to extract the target speech from noise and reverberation. Although the pretrained model used in this paper never saw noisy reverberant conditions during training, it performs satisfactorily in zero-shot testing (ZST) under these conditions. This is because the pretrained model was trained on the direct-path interaural cues of an active source, so it can recognize them even in the presence of echoes and noise. ZST on the same dataset on which the pretrained network was trained (homo-corpus), for an unseen class of interference, shows considerable improvement over the weighted prediction error (WPE) algorithm in terms of four objective speech quality and intelligibility metrics. The proposed model also offers performance similar to that of a deep learning SD&D algorithm on this dataset under varying conditions of noise and reverberation. Similarly, ZST on a different dataset improves intelligibility and yields quality almost equivalent to that of the WPE algorithm.
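
The abstract does not say which interaural cues are fed to the frozen network or how they are computed. As a rough illustration only, the sketch below computes two commonly used cues, the interaural level difference (ILD) and the interaural phase difference (IPD), per time-frequency bin of a two-channel signal. The function names `stft` and `interaural_cues`, the frame length, the hop size, and the toy test signal are all assumptions made for this example, not details taken from the paper.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Naive Hann-windowed STFT (illustrative, not optimized).

    Returns a list of frames; each frame is a list of complex bins 0..frame_len/2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = [
            sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for every time-frequency bin."""
    spec_l, spec_r = stft(left), stft(right)
    ild = [
        [20 * math.log10((abs(l) + eps) / (abs(r) + eps)) for l, r in zip(fl, fr)]
        for fl, fr in zip(spec_l, spec_r)
    ]
    ipd = [
        [cmath.phase(l * r.conjugate()) for l, r in zip(fl, fr)]
        for fl, fr in zip(spec_l, spec_r)
    ]
    return ild, ipd


# Hypothetical toy input: a tone centred on bin 8; the right channel is the
# left channel attenuated by half and delayed by two samples.
omega = 2 * math.pi * 8 / 64
left = [math.sin(omega * n) for n in range(256)]
right = [0.5 * math.sin(omega * (n - 2)) for n in range(256)]

ild, ipd = interaural_cues(left, right)
print(round(ild[3][8], 2), round(ipd[3][8], 3))
```

In this toy case the level ratio of 2 gives an ILD of about 6 dB at the tone's bin, and the two-sample delay gives an IPD of a quarter cycle. Real binaural recordings in noisy, reverberant rooms would produce much less regular cue maps, which is the setting the paper's zero-shot tests address.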

Figure image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c035/11251582/491ceb855f68/pone.0301692.g001.jpg
