DEEP CLUSTERING AND CONVENTIONAL NETWORKS FOR MUSIC SEPARATION: STRONGER TOGETHER.

Author information

Luo Yi, Chen Zhuo, Hershey John R, Le Roux Jonathan, Mesgarani Nima

Affiliations

Department of Electrical Engineering, Columbia University, New York, NY.

Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA.

Publication

Proc IEEE Int Conf Acoust Speech Signal Process. 2017 Mar;2017:61-65. doi: 10.1109/ICASSP.2017.7952118. Epub 2017 Jun 19.

Abstract

Deep clustering is the first method to handle general audio separation scenarios with multiple sources of the same type and an arbitrary number of sources, performing impressively in speaker-independent speech separation tasks. However, little is known about its effectiveness in other challenging situations such as music source separation. Contrary to conventional networks that directly estimate the source signals, deep clustering generates an embedding for each time-frequency bin, and separates sources by clustering the bins in the embedding space. We show that deep clustering outperforms conventional networks on a singing voice separation task, in both matched and mismatched conditions, even though conventional networks have the advantage of end-to-end training for best signal approximation, presumably because its more flexible objective engenders better regularization. Since the strengths of deep clustering and conventional network architectures appear complementary, we explore combining them in a single hybrid network trained via an approach akin to multi-task learning. Remarkably, the combination significantly outperforms either of its components.
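The abstract contrasts two training objectives. Deep clustering scores a per-bin embedding against the ideal source assignment and separates by clustering. A conventional network estimates each source directly, here through a time-frequency mask. The pure-Python sketch below shows both objectives and a weighted multi-task sum. It assumes the affinity loss ||VV^T - YY^T||_F^2 from Hershey et al.'s original deep clustering formulation, a magnitude-spectrum mask loss for the conventional branch, and a mixing weight `alpha`. The function names, the choice of mask loss, and the weighting are illustrative assumptions, not the paper's exact formulation. The networks that produce the embeddings V and the masks are omitted.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]  # row-major: one row per time-frequency bin


def gram(a: Matrix, b: Matrix) -> Matrix:
    """Return A^T B for A (N x p) and B (N x q), summing over the N bins."""
    p, q = len(a[0]), len(b[0])
    out = [[0.0] * q for _ in range(p)]
    for row_a, row_b in zip(a, b):
        for i in range(p):
            for j in range(q):
                out[i][j] += row_a[i] * row_b[j]
    return out


def frob_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(x * x for row in m for x in row)


def deep_clustering_loss(v: Matrix, y: Matrix) -> float:
    """||V V^T - Y Y^T||_F^2 for embeddings V (N x D) and one-hot source labels
    Y (N x C), expanded so the N x N affinity matrix is never built."""
    return frob_sq(gram(v, v)) - 2.0 * frob_sq(gram(v, y)) + frob_sq(gram(y, y))


def mask_inference_loss(masks: Matrix, mix_mag: Sequence[float], src_mags: Matrix) -> float:
    """Assumed magnitude loss: sum over bins n, sources c of (M[n][c]*|X[n]| - |S[n][c]|)^2."""
    return sum(
        (m * x - s) ** 2
        for m_row, x, s_row in zip(masks, mix_mag, src_mags)
        for m, s in zip(m_row, s_row)
    )


def hybrid_loss(v, y, masks, mix_mag, src_mags, alpha: float = 0.5) -> float:
    """Multi-task objective; alpha weights the deep clustering term (assumed hyperparameter)."""
    return alpha * deep_clustering_loss(v, y) + (1.0 - alpha) * mask_inference_loss(
        masks, mix_mag, src_mags
    )


def kmeans_masks(v: Matrix, c: int, iters: int = 20, seed: int = 0) -> Matrix:
    """Cluster embeddings into c groups with plain k-means; return binary masks (N x c)."""
    rng = random.Random(seed)
    centers = [list(row) for row in rng.sample(v, c)]
    labels = [0] * len(v)
    for _ in range(iters):
        # Assign each bin to its nearest center (squared Euclidean distance).
        for n, row in enumerate(v):
            labels[n] = min(
                range(c),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centers[k])),
            )
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for k in range(c):
            members = [row for row, lab in zip(v, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [[1.0 if labels[n] == k else 0.0 for k in range(c)] for n in range(len(v))]


if __name__ == "__main__":
    labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # bins 0-1 belong to source 0
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # toy (not unit-norm) embeddings
    print(deep_clustering_loss(labels, labels))  # embeddings equal to labels -> 0.0
    print(kmeans_masks(emb, 2))  # bins 0-1 and bins 2-3 land in separate clusters
```

Expanding the Frobenius norm keeps the cost linear in the number of time-frequency bins N, because only D x D, D x C, and C x C Gram matrices are formed. At test time the deep clustering branch yields masks only after clustering, which `kmeans_masks` approximates here with plain k-means. The mask branch outputs its masks directly.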

Similar articles

2
DEEP ATTRACTOR NETWORK FOR SINGLE-MICROPHONE SPEAKER SEPARATION.
Proc IEEE Int Conf Acoust Speech Signal Process. 2017 Mar;2017:246-250. doi: 10.1109/ICASSP.2017.7952155. Epub 2017 Jun 19.
7
Combination of deep speaker embeddings for diarisation.
Neural Netw. 2021 Sep;141:372-384. doi: 10.1016/j.neunet.2021.04.020. Epub 2021 Apr 21.
8
Deep Learning for Talker-dependent Reverberant Speaker Separation: An Empirical Study.
IEEE/ACM Trans Audio Speech Lang Process. 2019 Nov;27(11):1839-1848. doi: 10.1109/taslp.2019.2934319. Epub 2019 Aug 12.
9
ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.
Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10095695. Epub 2023 May 5.

Cited by

4
ONLINE BINAURAL SPEECH SEPARATION OF MOVING SPEAKERS WITH A WAVESPLIT NETWORK.
Proc IEEE Int Conf Acoust Speech Signal Process. 2023 Jun;2023. doi: 10.1109/icassp49357.2023.10095695. Epub 2023 May 5.
5
Quantitative models of auditory cortical processing.
Hear Res. 2023 Mar 1;429:108697. doi: 10.1016/j.heares.2023.108697. Epub 2023 Jan 14.
7
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:2001-2014. doi: 10.1109/taslp.2021.3083405. Epub 2021 May 26.
8
Towards Model Compression for Deep Learning Based Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1785-1794. doi: 10.1109/taslp.2021.3082282. Epub 2021 May 21.
10
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement.
IEEE/ACM Trans Audio Speech Lang Process. 2020;28:2489-2499. doi: 10.1109/taslp.2020.3016487. Epub 2020 Aug 14.
