A two-stage deep learning algorithm for talker-independent speaker separation in reverberant conditions.

Affiliation

Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA.

Publication information

J Acoust Soc Am. 2020 Sep;148(3):1157. doi: 10.1121/10.0001779.

Abstract

Speaker separation is a special case of speech separation, in which the mixture signal comprises two or more speakers. Many talker-independent speaker separation methods have been introduced in recent years to address this problem in anechoic conditions. To consider more realistic environments, this paper investigates talker-independent speaker separation in reverberant conditions. To effectively deal with speaker separation and speech dereverberation, extending the deep computational auditory scene analysis (CASA) approach to a two-stage system is proposed. In this method, reverberant utterances are first separated and separated utterances are then dereverberated. The proposed two-stage deep CASA system significantly outperforms a baseline one-stage deep CASA method in real reverberant conditions. The proposed system has superior separation performance at the frame level and higher accuracy in assigning separated frames to individual speakers. The proposed system successfully generalizes to an unseen speech corpus and exhibits similar performance to a talker-dependent system.
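
The abstract distinguishes two things: separation quality at the frame level, and the accuracy of assigning separated frames to individual speakers. The following minimal Python sketch illustrates only the second idea, under stated assumptions. It is not the paper's method. It uses an oracle rule that compares each frame against known reference signals, which a real system would not have at test time. All function names, array shapes, and the two-speaker restriction are hypothetical choices made for illustration.

```python
import numpy as np

def oracle_frame_assignment(est, ref):
    """Decide, per frame, whether the two estimated outputs should be swapped.

    est, ref: arrays of shape (2, T, F) = (output/speaker, frame, feature).
    Returns a boolean array of shape (T,); True means "swap this frame".
    Assumption: squared error against the references is the matching cost.
    """
    keep_cost = ((est[0] - ref[0]) ** 2 + (est[1] - ref[1]) ** 2).sum(axis=-1)
    swap_cost = ((est[1] - ref[0]) ** 2 + (est[0] - ref[1]) ** 2).sum(axis=-1)
    return swap_cost < keep_cost

def apply_assignment(est, swap):
    """Return a copy of est with the two outputs exchanged on frames where swap is True."""
    out = est.copy()
    out[0, swap] = est[1, swap]
    out[1, swap] = est[0, swap]
    return out

def assignment_accuracy(predicted_swap, oracle_swap):
    """Fraction of frames on which a predicted assignment agrees with the oracle."""
    return float(np.mean(predicted_swap == oracle_swap))

# Toy data: two "speakers", 6 frames, 4 features per frame.
rng = np.random.default_rng(0)
ref = rng.normal(size=(2, 6, 4))

# Simulate perfect frame-level separation whose output order is scrambled on some frames.
true_swap = np.array([False, True, True, False, True, False])
est = apply_assignment(ref, true_swap)

oracle = oracle_frame_assignment(est, ref)
organized = apply_assignment(est, oracle)
accuracy = assignment_accuracy(oracle, true_swap)
```

In this toy setup the frame-level outputs are exact copies of the references, so any remaining error comes from frame ordering alone. That keeps the two sources of error named in the abstract apart.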
