Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network

Authors

Han Cong, Mesgarani Nima

Affiliation

Department of Electrical Engineering, Columbia University, New York, NY.

Publication information

Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10095695. Epub 2023 May 5.

Abstract

Binaural speech separation in real-world scenarios often involves moving speakers. Most current speech separation methods are trained with utterance-level permutation invariant training (u-PIT). At inference time, however, the order of the outputs can be inconsistent over time, particularly in long-form speech separation. This situation, referred to as the speaker swap problem, is even more problematic when speakers constantly move in space, and it therefore poses a challenge for the consistent placement of speakers in output channels. Here, we describe a real-time binaural speech separation model based on a Wavesplit network that mitigates the speaker swap problem for moving speaker separation. Our model computes a speaker embedding for each speaker at each time frame from the mixed audio, aggregates the embeddings using online clustering, and uses the cluster centroids as speaker profiles to track each speaker over the full duration of the recording. Experimental results on reverberant, long-form, moving multitalker speech separation show that the proposed method is less prone to speaker swap and achieves performance comparable to u-PIT-based models with ground-truth tracking, both in separation accuracy and in preserving the interaural cues.
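The tracking idea in the abstract (per-frame speaker embeddings, online clustering, centroids used as speaker profiles) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, so the code below assumes a simple running-mean, k-means-style update, and it uses synthetic 2-D vectors in place of learned embeddings. The function name `track_speakers` and all parameters are hypothetical.

```python
import itertools
import numpy as np


def track_speakers(frame_embeddings):
    """Assign per-frame embeddings to stable speaker slots.

    frame_embeddings: array of shape (T, S, D), with S embeddings per frame
    given in arbitrary order. Returns (orders, centroids), where
    orders[t, k] is the index of the frame-t embedding assigned to speaker k.
    Assumption: running-mean centroids, not the paper's exact procedure.
    """
    emb = np.asarray(frame_embeddings, dtype=float)
    num_frames, num_speakers, _ = emb.shape
    centroids = emb[0].copy()            # the first frame seeds the profiles
    counts = np.ones(num_speakers)       # embeddings absorbed per centroid
    orders = np.zeros((num_frames, num_speakers), dtype=int)
    orders[0] = np.arange(num_speakers)
    perms = list(itertools.permutations(range(num_speakers)))

    for t in range(1, num_frames):
        # Pick the ordering whose embeddings lie closest to the centroids.
        costs = [np.sum((emb[t, list(p)] - centroids) ** 2) for p in perms]
        best = perms[int(np.argmin(costs))]
        orders[t] = best
        # Online update: move each centroid toward its newly assigned embedding.
        counts += 1
        centroids += (emb[t, list(best)] - centroids) / counts[:, None]
    return orders, centroids


# Synthetic demo: two speakers whose output order is randomly swapped per frame.
rng = np.random.default_rng(0)
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
swaps = rng.integers(0, 2, size=50).astype(bool)
swaps[0] = False                         # the first frame defines the slot order
frames = prototypes[None] + 0.05 * rng.standard_normal((50, 2, 2))
frames[swaps] = frames[swaps][:, ::-1]   # simulate speaker swap
orders, centroids = track_speakers(frames)
```

In this toy setting, `orders` recovers which frames were swapped, so reindexing each frame by its row of `orders` gives a consistent speaker-to-channel assignment. The exhaustive search over permutations is only practical for a small number of speakers.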


