Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy.

Author information

Xu Yanze, Wang Weiqing, Cui Huahua, Xu Mingyang, Li Ming

Affiliations

Data Science Research Center, Duke Kunshan University, Kunshan, China.

Advanced Computing East China Sub-Center, Suzhou, China.

Publication information

EURASIP J Audio Speech Music Process. 2022;2022(1):8. doi: 10.1186/s13636-022-00240-z. Epub 2022 Apr 15.

DOI: 10.1186/s13636-022-00240-z
PMID: 35440938
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9011380/
Abstract

Humans can recognize someone's identity through their voice and describe the timbral phenomena of voices. Likewise, the singing voice also has timbral phenomena. In vocal pedagogy, vocal teachers listen to and then describe the timbral phenomena of their students' singing voices. In this study, to enable machines to describe the singing voice from the vocal pedagogy point of view, we perform a task called paralinguistic singing attribute recognition. To achieve this goal, we first construct and publish an open-source dataset named Singing Voice Quality and Technique Database (SVQTD) for supervised learning. All the audio clips in SVQTD are downloaded from YouTube and processed by music source separation and silence detection. For annotation, seven paralinguistic singing attributes commonly used in vocal pedagogy are adopted as the labels. Furthermore, to explore different supervised machine learning algorithms for classifying each paralinguistic singing attribute, we adopt three main frameworks, namely openSMILE features with support vector machine (SF-SVM), end-to-end deep learning (E2EDL), and deep embedding with support vector machine (DE-SVM). Our methods are based on existing frameworks commonly employed in other paralinguistic speech attribute recognition tasks. In SF-SVM, we separately use the feature set of the INTERSPEECH 2009 Challenge and that of the INTERSPEECH 2016 Challenge as the SVM classifier's input. In E2EDL, the end-to-end framework separately utilizes the ResNet and transformer encoder as feature extractors. In particular, to handle two-dimensional spectrogram input for a transformer, we adopt a sliced multi-head self-attention (SMSA) mechanism. In the DE-SVM, we use the representation extracted from the E2EDL model as the input of the SVM classifier. Experimental results on SVQTD show no absolute winner between E2EDL and the DE-SVM, which means that the back-end SVM classifier with the representation learned by the E2EDL model as input does not necessarily improve the performance. However, the DE-SVM that utilizes the ResNet as the feature extractor achieves the best average unweighted average recall (UAR), with an average 16% improvement over that of the SF-SVM with INTERSPEECH's hand-crafted feature set.
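
The abstract reports results as UAR (unweighted average recall) without defining it. As a minimal sketch, assuming the usual definition (the mean of per-class recalls, so that every class counts equally regardless of its size), the metric can be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper or from SVQTD.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls (assumed definition of UAR)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def accuracy(y_true, y_pred):
    """Plain accuracy, shown only for comparison."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced toy data: 8 clips of class "a", 2 clips of class "b".
y_true = ["a"] * 8 + ["b"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["a"] * 10

uar = unweighted_average_recall(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f"UAR = {uar:.2f}, accuracy = {acc:.2f}")
```

On this toy data the majority-class predictor looks acceptable by accuracy but scores only chance level by UAR, which is presumably why a class-balanced metric is preferred when attribute labels are unevenly distributed.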

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d6f/9011380/f05580e17c42/13636_2022_240_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d6f/9011380/9d033ce2f683/13636_2022_240_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d6f/9011380/736e5cd9c98e/13636_2022_240_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d6f/9011380/4d0e573609f8/13636_2022_240_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d6f/9011380/81a09fb8c4ab/13636_2022_240_Fig4_HTML.jpg

Similar articles

1
Paralinguistic singing attribute recognition using supervised machine learning for describing the classical tenor solo singing voice in vocal pedagogy.
EURASIP J Audio Speech Music Process. 2022;2022(1):8. doi: 10.1186/s13636-022-00240-z. Epub 2022 Apr 15.
2
Singing Voice Detection: A Survey.
Entropy (Basel). 2022 Jan 12;24(1):114. doi: 10.3390/e24010114.
3
Associations of Education and Training with Perceived Singing Voice Function Among Professional Singers.
J Voice. 2021 May;35(3):500.e17-500.e24. doi: 10.1016/j.jvoice.2019.10.003. Epub 2019 Oct 31.
4
Towards Automated Vocal Mode Classification in Healthy Singing Voice-An XGBoost Decision Tree-Based Machine Learning Classifier.
J Voice. 2023 Nov 10. doi: 10.1016/j.jvoice.2023.09.006.
5
The Filtering Effect of Face Masks in their Detection from Speech.
Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2079-2082. doi: 10.1109/EMBC46164.2021.9630634.
6
Investigation on the Extraction Methods of Timbre Features in Vocal Singing Based on Machine Learning.
Comput Intell Neurosci. 2022 Sep 17;2022:5074829. doi: 10.1155/2022/5074829. eCollection 2022.
7
Towards a Singing Voice Multi-Sensor Analysis Tool: System Design, and Assessment Based on Vocal Breathiness.
Sensors (Basel). 2021 Nov 30;21(23):8006. doi: 10.3390/s21238006.
8
Exercise Science Principles and the Vocal Warm-up: Implications for Singing Voice Pedagogy.
J Voice. 2018 Jan;32(1):79-84. doi: 10.1016/j.jvoice.2017.03.018. Epub 2017 May 19.
9
Immediate effects of the semi-occluded vocal tract exercise with LaxVox® tube in singers.
Codas. 2016;28(5):618-624. doi: 10.1590/2317-1782/20162015168.
10
Evaluation of Singer's Voice Quality by Means of Visual Pattern Recognition.
J Voice. 2016 Jan;30(1):127.e21-30. doi: 10.1016/j.jvoice.2015.03.001. Epub 2015 Apr 30.

Cited by

1
Dense dynamic convolutional network for Bel canto vocal technique assessment.
Sci Rep. 2025 May 5;15(1):15666. doi: 10.1038/s41598-025-98726-1.
2
3 directional Inception-ResUNet: Deep spatial feature learning for multichannel singing voice separation with distortion.
PLoS One. 2024 Jan 29;19(1):e0289453. doi: 10.1371/journal.pone.0289453. eCollection 2024.
3
Unsupervised Single-Channel Singing Voice Separation with Weighted Robust Principal Component Analysis Based on Gammatone Auditory Filterbank and Vocal Activity Detection.
Sensors (Basel). 2023 Mar 10;23(6):3015. doi: 10.3390/s23063015.
4
Environment-Friendly Vocal Music Ecological Education: Sustainable Development of Vocal Music Education from the Perspective of Building.
J Environ Public Health. 2022 Aug 23;2022:5168389. doi: 10.1155/2022/5168389. eCollection 2022.
5
The New Media Environment Presents Challenges and Opportunities for Music Education in Higher Education.
J Environ Public Health. 2022 Jul 13;2022:9261521. doi: 10.1155/2022/9261521. eCollection 2022.

References

1
A Nasoendoscopic Study of "Head Resonance" and "Imposto" in Classical Singing.
J Voice. 2022 Jan;36(1):83-90. doi: 10.1016/j.jvoice.2020.04.013. Epub 2020 Jun 6.
2
Assessment of voice quality: Current state-of-the-art.
Auris Nasus Larynx. 2015 Jun;42(3):183-8. doi: 10.1016/j.anl.2014.11.001. Epub 2014 Nov 28.
3
On the Acoustics of Emotion in Audio: What Speech, Music, and Sound have in Common.
Front Psychol. 2013 May 27;4:292. doi: 10.3389/fpsyg.2013.00292. eCollection 2013.
4
Acoustic structure of the five perceptual dimensions of timbre in orchestral instrument tones.
J Acoust Soc Am. 2013 Jan;133(1):389-404. doi: 10.1121/1.4770244.
5
Evaluation of hypernasality in vowels using voice low tone to high tone ratio.
Cleft Palate Craniofac J. 2009 Jan;46(1):47-52. doi: 10.1597/07-184.1. Epub 2008 May 15.
6
Acoustic correlates of timbre space dimensions: a confirmatory study using synthetic tones.
J Acoust Soc Am. 2005 Jul;118(1):471-82. doi: 10.1121/1.1929229.
7
Voice low tone to high tone ratio--a new index for nasal airway assessment.
Chin J Physiol. 2003 Sep 30;46(3):123-7.
8
The perception of 'forward' and 'backward placement' of the singing voice.
Logoped Phoniatr Vocol. 2003;28(1):19-28. doi: 10.1080/14015430310010854.
9
Where is a singer's voice if it is placed "forward"?
J Voice. 2002 Sep;16(3):383-91. doi: 10.1016/s0892-1997(02)00109-1.
10
Movement of the velum during speech and singing in classically trained singers.
J Voice. 1997 Jun;11(2):212-21. doi: 10.1016/s0892-1997(97)80080-x.