

Deep Learning for Talker-dependent Reverberant Speaker Separation: An Empirical Study.

Author Information

Delfarah Masood, Wang DeLiang

Affiliations

Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.

Publication Information

IEEE/ACM Trans Audio Speech Lang Process. 2019 Nov;27(11):1839-1848. doi: 10.1109/taslp.2019.2934319. Epub 2019 Aug 12.

Abstract

Speaker separation refers to the problem of separating speech signals from a mixture of simultaneous speakers. Previous studies are limited to addressing the speaker separation problem in anechoic conditions. This paper addresses the problem of talker-dependent speaker separation in reverberant conditions, which are characteristic of real-world environments. We employ recurrent neural networks with bidirectional long short-term memory (BLSTM) to separate and dereverberate the target speech signal. We propose two-stage networks to effectively deal with both speaker separation and speech dereverberation. In the two-stage model, the first stage separates and dereverberates two-talker mixtures and the second stage further enhances the separated target signal. We have extensively evaluated the two-stage architecture, and our empirical results demonstrate large improvements over unprocessed mixtures and clear performance gain over single-stage networks in a wide range of target-to-interferer ratios and reverberation times in simulated as well as recorded rooms. Moreover, we show that time-frequency masking yields better performance than spectral mapping for reverberant speaker separation.
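The abstract describes a two-stage architecture: a first BLSTM stage that separates and dereverberates the two-talker mixture, and a second stage that further enhances the separated target. The sketch below illustrates that idea in PyTorch, using magnitude spectrograms and sigmoid time-frequency masks (rather than direct spectral mapping, which the abstract reports as less effective). The class names, layer sizes, and feature dimensions here are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal two-stage BLSTM sketch: stage one maps the reverberant two-talker
# mixture spectrogram to a time-frequency mask for the target talker; stage two
# refines the stage-one estimate. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class MaskingBLSTM(nn.Module):
    """BLSTM that maps a magnitude spectrogram (batch, time, freq)
    to a sigmoid time-frequency mask of the same shape."""
    def __init__(self, n_freq=161, hidden=384, layers=2):
        super().__init__()
        self.blstm = nn.LSTM(n_freq, hidden, num_layers=layers,
                             batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_freq)

    def forward(self, spec):
        h, _ = self.blstm(spec)
        return torch.sigmoid(self.out(h))  # mask values in [0, 1]

class TwoStageSeparator(nn.Module):
    """Stage 1 separates/dereverberates the mixture; stage 2 further
    enhances the stage-1 estimate of the target talker."""
    def __init__(self, n_freq=161):
        super().__init__()
        self.stage1 = MaskingBLSTM(n_freq)
        self.stage2 = MaskingBLSTM(n_freq)

    def forward(self, mixture_mag):
        est1 = self.stage1(mixture_mag) * mixture_mag  # first-pass target estimate
        est2 = self.stage2(est1) * est1                # second-stage refinement
        return est1, est2

# Example usage: a batch of 4 utterances, 200 STFT frames, 161 frequency bins.
model = TwoStageSeparator(n_freq=161)
mix = torch.rand(4, 200, 161)
est1, est2 = model(mix)
print(est1.shape, est2.shape)  # torch.Size([4, 200, 161]) for both stages
```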


Similar Articles

1
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:2001-2014. doi: 10.1109/taslp.2021.3083405. Epub 2021 May 26.
2
Causal Deep CASA for Monaural Talker-Independent Speaker Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:2109-2118. doi: 10.1109/taslp.2020.3007779. Epub 2020 Jul 8.
3
Two-stage Deep Learning for Noisy-reverberant Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Jan;27(1):53-62. doi: 10.1109/TASLP.2018.2870725. Epub 2018 Sep 17.
4
Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising.
IEEE/ACM Trans Audio Speech Lang Process. 2017 Jul;25(7):1492-1501. doi: 10.1109/TASLP.2017.2696307. Epub 2017 Apr 20.

