Attentive Training: A New Training Framework for Speech Enhancement.

Authors

Pandey Ashutosh, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2023;31:1360-1370. doi: 10.1109/taslp.2023.3260711. Epub 2023 Mar 23.

Abstract

Dealing with speech interference in a speech enhancement system requires either speaker separation or target speaker extraction. Speaker separation produces multiple output streams with arbitrary assignments, while target speaker extraction requires additional cueing for speaker selection. Neither is suitable for a standalone speech enhancement system with one output stream. In this study, we propose a novel training framework, called attentive training, to extend speech enhancement to deal with speech interruptions. Attentive training is based on the observation that, in the real world, multiple talkers are very unlikely to start speaking at the same time, and therefore a deep neural network can be trained to create a representation of the first speaker and use it to attend to or track that speaker in a multitalker noisy mixture. We present experimental results and comparisons to demonstrate the effectiveness of attentive training for speech enhancement.
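The idea in the abstract can be illustrated with a minimal sketch of how one training pair might be assembled: the first talker starts at time zero, an interfering talker is delayed, and the single-stream training target is the first talker alone. This is not the authors' implementation. The function name `make_attentive_example`, the `onset_gap` parameter, and the simple additive mixing are all assumptions made for illustration.

```python
import numpy as np

def make_attentive_example(first, interferer, noise, onset_gap):
    """Build one hypothetical (mixture, target) training pair.

    first      : waveform of the talker who starts speaking first
    interferer : waveform of a talker who starts later
    noise      : background noise, at least as long as `first`
    onset_gap  : samples by which the interferer is delayed (assumed parameter)
    """
    n = len(first)
    if not 0 <= onset_gap <= n:
        raise ValueError("onset_gap must lie within the utterance length")
    if len(noise) < n:
        raise ValueError("noise must be at least as long as the first talker")

    # The interfering talker is silent until `onset_gap`, so the first
    # talker is the only speech present at the start of the mixture.
    delayed = np.zeros(n, dtype=float)
    k = min(n - onset_gap, len(interferer))
    delayed[onset_gap:onset_gap + k] = interferer[:k]

    mixture = first + delayed + noise[:n]
    # Single output stream: the network would be trained to recover
    # only the talker who spoke first.
    target = np.asarray(first, dtype=float)
    return mixture, target
```

A network trained on pairs like these sees only one talker at the onset of each mixture, which is the cue the abstract describes for selecting which speaker to track without any extra input.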
