
Modal and non-modal voice quality classification using acoustic and electroglottographic features.

Author Information

Borsky Michal, Mehta Daryush D, Van Stan Jarrad H, Gudnason Jon

Publication Information

IEEE/ACM Trans Audio Speech Lang Process. 2017 Dec;25(12):2281-2291. doi: 10.1109/taslp.2017.2759002. Epub 2017 Nov 27.

Abstract

The goal of this study was to investigate the performance of different feature types for voice quality classification using multiple classifiers. The study compared the COVAREP feature set (which includes glottal source features, frequency-warped cepstrum, and harmonic model features) against mel-frequency cepstral coefficients (MFCCs) computed from the acoustic voice signal, the acoustic-based glottal inverse filtered (GIF) waveform, and the electroglottographic (EGG) waveform. Our hypothesis was that MFCCs can capture the perceived voice quality from any of these three voice signals. Experiments were carried out on recordings from 28 participants with normal vocal status who were prompted to sustain vowels with modal and non-modal voice qualities. Recordings were rated by an expert listener using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), and the ratings were transformed into a dichotomous label (presence or absence) for the prompted voice qualities of modal voice, breathiness, strain, and roughness. Classification was performed with support vector machine, random forest, deep neural network, and Gaussian mixture model classifiers, each built to be speaker-independent using a leave-one-speaker-out strategy. The best classification accuracy of 79.97% was achieved with the full COVAREP set. The harmonic model features were the best-performing subset, with 78.47% accuracy, and the static+dynamic MFCCs reached 74.52%. A closer analysis showed that MFCC and dynamic MFCC features were able to classify the modal, breathy, and strained voice quality dimensions from the acoustic and GIF waveforms, whereas the EGG waveform exhibited reduced classification performance.
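
The classification setup described in the abstract (static+dynamic MFCC features, speaker-independent evaluation via a leave-one-speaker-out strategy) can be illustrated with a minimal sketch. This is not the authors' COVAREP-based implementation: the choice of librosa and scikit-learn, the SVM settings, and the metadata layout (`recordings`, `speaker`, `label`) are assumptions made purely for illustration.

```python
# Minimal sketch of MFCC-based voice quality classification with
# leave-one-speaker-out evaluation. Library choices (librosa, scikit-learn)
# and all data handling are illustrative assumptions, not the study's code.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def mfcc_features(path, n_mfcc=13):
    """Static + dynamic (delta) MFCCs averaged over a sustained vowel."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    feats = np.vstack([mfcc, delta])   # shape: (2 * n_mfcc, frames)
    return feats.mean(axis=1)          # one feature vector per recording

# Hypothetical metadata: fill with actual dataset entries.
# 'label' would be the dichotomous CAPE-V-derived rating (e.g., breathy vs. not).
recordings = [...]  # [{"path": ..., "speaker": ..., "label": ...}, ...]
X = np.array([mfcc_features(r["path"]) for r in recordings])
y = np.array([r["label"] for r in recordings])
groups = np.array([r["speaker"] for r in recordings])  # enforces speaker independence

# Leave-one-speaker-out cross-validation, as in the study design.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=groups)
print(f"Mean accuracy across held-out speakers: {scores.mean():.2%}")
```

The same evaluation loop would apply to any of the classifiers named in the abstract (random forest, deep neural network, Gaussian mixture model) by swapping the estimator, and to features computed from the GIF or EGG waveforms by changing the input signal.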



