Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation.

Author information

Liu Yuzhou, Wang DeLiang

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277 USA.

Department of Computer Science and Engineering and the Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, OH 43210-1277 USA.

Publication information

IEEE/ACM Trans Audio Speech Lang Process. 2019;27(12):2092-2102. doi: 10.1109/taslp.2019.2941148. Epub 2019 Sep 12.

Abstract

We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA). Specifically, we decompose the multi-speaker separation task into two stages: simultaneous grouping and sequential grouping. Simultaneous grouping is first performed in each time frame by separating the spectra of different speakers with a permutation-invariantly trained neural network. In the second stage, a clustering network sequentially groups the frame-level separated spectra into different speakers. The proposed deep CASA approach optimizes frame-level separation and speaker tracking in turn, and produces excellent results for both objectives. Experimental results on the benchmark WSJ0-2mix database show that the new approach achieves state-of-the-art results with a modest model size.
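The two-stage idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several pieces are assumptions: squared spectral error as the frame-level loss, a plain 2-means clustering of per-frame embeddings standing in for the clustering network's grouping step, and embeddings supplied by hand instead of by a trained network. The function names `frame_pit`, `regroup`, and `two_means_perms` are hypothetical. Frame-level permutation-invariant training picks the best output-to-speaker assignment independently in every frame, so the outputs may swap between frames. Sequential grouping then chooses a permutation per frame to stitch the outputs into consistent speaker streams.

```python
from itertools import permutations


def frame_pit(estimates, references):
    """Frame-level permutation-invariant squared error (assumed loss).

    estimates, references: nested lists indexed [speaker][frame][bin].
    Each frame independently gets the lowest-error assignment. Returns
    (mean loss per frame, best permutation per frame), where perm[i] is
    the output index matched to reference speaker i.
    """
    n_spk, n_frames = len(references), len(references[0])
    total, best_perms = 0.0, []
    for t in range(n_frames):
        best_err, best_perm = float("inf"), None
        for perm in permutations(range(n_spk)):
            err = sum(
                (e - r) ** 2
                for i in range(n_spk)
                for e, r in zip(estimates[perm[i]][t], references[i][t])
            )
            if err < best_err:
                best_err, best_perm = err, perm
        total += best_err
        best_perms.append(best_perm)
    return total / n_frames, best_perms


def regroup(estimates, frame_perms):
    """Sequential grouping: reorder frame-level outputs into speaker streams."""
    n_spk, n_frames = len(estimates), len(estimates[0])
    return [
        [estimates[frame_perms[t][i]][t] for t in range(n_frames)]
        for i in range(n_spk)
    ]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means_perms(embeddings, iters=10):
    """Cluster per-frame embeddings into 2 groups (2-speaker case only).

    Hypothetical stand-in for the clustering stage: cluster 0 keeps the
    output order, cluster 1 swaps it.
    """
    c0 = embeddings[0]
    c1 = max(embeddings, key=lambda e: _sqdist(e, c0))
    labels = []
    for _ in range(iters):
        labels = [0 if _sqdist(e, c0) <= _sqdist(e, c1) else 1 for e in embeddings]
        g0 = [e for e, lab in zip(embeddings, labels) if lab == 0]
        g1 = [e for e, lab in zip(embeddings, labels) if lab == 1]
        if g0:
            c0 = _mean(g0)
        if g1:
            c1 = _mean(g1)
    return [(0, 1) if lab == 0 else (1, 0) for lab in labels]


# Toy 2-speaker, 4-frame, 2-bin example: the outputs swap on frames 1 and 3.
refs = [[[1, 0], [1, 0], [1, 0], [1, 0]],
        [[0, 1], [0, 1], [0, 1], [0, 1]]]
ests = [[[1, 0], [0, 1], [1, 0], [0, 1]],
        [[0, 1], [1, 0], [0, 1], [1, 0]]]
loss, perms = frame_pit(ests, refs)
streams = regroup(ests, perms)
# Hand-made embeddings in place of a trained network's outputs.
inferred = two_means_perms([[0.0], [1.0], [0.1], [0.9]])
```

In training, the per-frame permutations from `frame_pit` are known from the references; at inference they are not, which is why a second stage is needed to infer them. Because cluster labels are arbitrary, the streams recovered from clustering may come out globally swapped, which is harmless since the order of the separated speakers is arbitrary anyway.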
