Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection.

Affiliations

Department of Computer Science and Technology, Anhui University of Finance and Economics, Bengbu 233030, China.

School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China.

Publication information

Sensors (Basel). 2023 Mar 10;23(6):3015. doi: 10.3390/s23063015.

DOI:10.3390/s23063015

PMID:36991724

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10056690/

Abstract

Singing-voice separation is the task of separating a singing voice from its musical accompaniment. In this paper, we propose a novel unsupervised method for extracting the singing voice from the background of a musical mixture. The method modifies robust principal component analysis (RPCA) by applying weighting based on a gammatone filterbank and vocal activity detection. Although RPCA is useful for separating voices from a music mixture, it fails when one single value, such as drums, is much larger than the others (e.g., the accompanying instruments). The proposed approach therefore takes advantage of the differing values between the low-rank (background) and sparse (singing voice) matrices. Additionally, we propose an extended RPCA on the cochleagram that uses coalescent masking on the gammatone. Finally, we use vocal activity detection to improve the separation results by eliminating the lingering music signal. Evaluation results show that the proposed approach yields better separation results than RPCA on the ccMixter and DSD100 datasets.
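As a rough illustration of the RPCA baseline that the abstract builds on, the sketch below splits a matrix M (for example, a magnitude spectrogram or cochleagram) into a low-rank part L and a sparse part S using the inexact augmented Lagrange multiplier scheme commonly used for RPCA. The function names, the default λ = 1/sqrt(max(m, n)), the step-size constants, and the optional `weights` argument (one weight per singular value) are assumptions made for illustration. The abstract does not specify the paper's weighting rule, gammatone front end, or vocal activity detector, so none of these are reproduced here.

```python
import numpy as np


def shrink(X, tau):
    """Element-wise soft thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau, weights=None):
    """Singular value thresholding; `weights` (hypothetical) scales the
    threshold applied to each singular value, in descending order."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = 1.0 if weights is None else np.asarray(weights, dtype=float)
    s = np.maximum(s - tau * w, 0.0)
    return (U * s) @ Vt


def rpca(M, lam=None, weights=None, tol=1e-7, max_iter=500):
    """Decompose M into low-rank L and sparse S with M ~= L + S."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default, assumed here
    norm_M = np.linalg.norm(M, "fro")
    spec = np.linalg.norm(M, 2)  # largest singular value of M
    # Dual variable initialisation and penalty schedule (assumed constants).
    Y = M / max(spec, np.abs(M).max() / lam)
    mu, rho, mu_max = 1.25 / spec, 1.5, 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu, weights)  # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)        # sparse update
        R = M - L - S                               # constraint residual
        Y = Y + mu * R
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(R, "fro") / norm_M < tol:
            break
    return L, S
```

In a separation setting, L would be read as the repetitive accompaniment and S as the singing voice. Passing `weights=None` gives plain RPCA, while a non-uniform `weights` vector only indicates where a weighted variant would change the thresholding step.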

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/de9e/10056690/9215b94075bc/sensors-23-03015-g001.jpg

Similar articles

Associations of Education and Training with Perceived Singing Voice Function Among Professional Singers.

J Voice. 2021 May;35(3):500.e17-500.e24. doi: 10.1016/j.jvoice.2019.10.003. Epub 2019 Oct 31.

3 directional Inception-ResUNet: Deep spatial feature learning for multichannel singing voice separation with distortion.

PLoS One. 2024 Jan 29;19(1):e0289453. doi: 10.1371/journal.pone.0289453. eCollection 2024.

The singing voice is special: Persistence of superior memory for vocal melodies despite vocal-motor distractions.

Cognition. 2021 Aug;213:104514. doi: 10.1016/j.cognition.2020.104514. Epub 2020 Nov 24.

Contemporary Commercial Music Singing Students-Voice Quality and Vocal Function at the Beginning of Singing Training.

J Voice. 2018 Nov;32(6):668-672. doi: 10.1016/j.jvoice.2017.08.027. Epub 2017 Oct 3.

Quality of the singing voice in sopranos. Influence of the text and the musical accompaniment in the opera singing.

Acta Otorrinolaringol Esp (Engl Ed). 2023 May-Jun;74(3):160-168. doi: 10.1016/j.otoeng.2023.05.002. Epub 2023 May 5.

Perspectives on the impact on vocal function of heavy vocal load among working professional music theater performers.

J Voice. 2013 May;27(3):390.e31-9. doi: 10.1016/j.jvoice.2012.12.003. Epub 2013 Feb 13.

Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.

Pro Fono. 2008 Jul-Sep;20(3):195-200. doi: 10.1590/s0104-56872008000300010.

Towards a Singing Voice Multi-Sensor Analysis Tool: System Design, and Assessment Based on Vocal Breathiness.

Sensors (Basel). 2021 Nov 30;21(23):8006. doi: 10.3390/s21238006.

Normative voice range profiles in vocally trained and untrained children aged between 7 and 10 years.

J Voice. 2010 Mar;24(2):153-60. doi: 10.1016/j.jvoice.2008.07.007. Epub 2009 Mar 20.
