
Modal and non-modal voice quality classification using acoustic and electroglottographic features.

Author Information

Borsky Michal, Mehta Daryush D, Van Stan Jarrad H, Gudnason Jon

Publication Information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Dec;25(12):2281-2291. doi: 10.1109/taslp.2017.2759002. Epub 2017 Nov 27.

Abstract

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set (which includes glottal source features, frequency-warped cepstrum, and harmonic model features) against mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. Classification was performed with support vector machine, random forest, deep neural network, and Gaussian mixture model classifiers, each built to be speaker-independent using a leave-one-speaker-out strategy. The best classification accuracy of 79.97% was achieved with the full COVAREP set. The harmonic model features were the best-performing subset, with 78.47% accuracy, and the static+dynamic MFCCs reached 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify the modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms, whereas the EGG waveform exhibited reduced classification performance.
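
The classification setup described in the abstract (static+dynamic MFCC features, speaker-independent evaluation via a leave-one-speaker-out strategy) can be illustrated with a minimal sketch. This is not the authors' COVAREP-based implementation: the choice of librosa and scikit-learn, the SVM settings, and the metadata layout (`recordings`, `speaker`, `label`) are assumptions made purely for illustration.

```python
# Minimal sketch of MFCC-based voice quality classification with
# leave-one-speaker-out evaluation. Library choices (librosa, scikit-learn)
# and all data handling are illustrative assumptions, not the study's code.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def mfcc_features(path, n_mfcc=13):
    """Static + dynamic (delta) MFCCs averaged over a sustained vowel."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    feats = np.vstack([mfcc, delta])   # shape: (2 * n_mfcc, frames)
    return feats.mean(axis=1)          # one feature vector per recording

# Hypothetical metadata: fill with actual dataset entries.
# 'label' would be the dichotomous CAPE-V-derived rating (e.g., breathy vs. not).
recordings = [...]  # [{"path": ..., "speaker": ..., "label": ...}, ...]
X = np.array([mfcc_features(r["path"]) for r in recordings])
y = np.array([r["label"] for r in recordings])
groups = np.array([r["speaker"] for r in recordings])  # enforces speaker independence

# Leave-one-speaker-out cross-validation, as in the study design.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
print(f"Mean accuracy across held-out speakers: {scores.mean():.2%}")
```

The same evaluation loop would apply to any of the classifiers named in the abstract (random forest, deep neural network, Gaussian mixture model) by swapping the estimator, and to features computed from the GIF or EGG waveforms by changing the input signal.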



