Yousef Ahmed M, Castillo-Allendes Adrián, Berardi Mark L, Codino Juliana, Rubin Adam D, Hunter Eric J
Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts, USA.
Department of Surgery, Harvard Medical School, Boston, Massachusetts, USA.
Folia Phoniatr Logop. 2025 Feb 21:1-15. doi: 10.1159/000544852.
The Acoustic Voice Quality Index (AVQI) and Smoothed Cepstral Peak Prominence (CPPs) have been reported to effectively support the assessment of voice quality in persons seeking voice care across many languages. This study aimed to evaluate the diagnostic accuracy of these two measures in detecting voice disorders in American English speakers, comparing their performance to machine learning (ML) models.
This retrospective study included a cohort of 187 participants: 138 patients with clinically diagnosed voice disorders and 49 vocally healthy individuals. Each participant completed two voicing tasks, a sustained [a:] vowel and a running speech sample, and the two recordings were then concatenated. The concatenated samples were analyzed with VOXplot software to obtain AVQI-3 (version 03.01) and CPPs. Additionally, four ML models (random forest, k-nearest neighbors, support vector machine, and decision tree) were trained for comparison. The diagnostic accuracy of the two measures and the models was assessed using several evaluation metrics, including receiver operating characteristic (ROC) curve analysis and the Youden index.
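As an illustration of how a cutoff can be derived from the Youden index, the following is a minimal sketch and not the authors' analysis pipeline. It scans candidate thresholds and keeps the one that maximizes J = sensitivity + specificity - 1. The function name `youden_cutoff`, its parameters, and the toy values are all hypothetical. The sketch assumes that higher AVQI scores and lower CPPs values point toward a disordered voice.

```python
def youden_cutoff(scores, labels, higher_is_positive=True):
    """Return (cutoff, sensitivity, specificity, J) for the threshold
    that maximizes Youden's J. labels: 1 = disordered, 0 = healthy.
    Ties keep the first (lowest) candidate cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        # Classify each sample as "disordered" relative to this cutoff.
        if higher_is_positive:
            predicted = [s >= cutoff for s in scores]
        else:
            predicted = [s <= cutoff for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        tn = sum(1 for p, y in zip(predicted, labels) if not p and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Made-up toy data for illustration only (not study data).
toy_avqi = [1.0, 1.2, 1.4, 1.6, 2.0, 2.5, 3.0, 3.5]
toy_avqi_labels = [0, 0, 0, 1, 0, 1, 1, 1]
avqi_result = youden_cutoff(toy_avqi, toy_avqi_labels, higher_is_positive=True)

toy_cpps = [10.0, 12.0, 13.0, 15.0, 16.0, 17.0]
toy_cpps_labels = [1, 1, 1, 0, 0, 0]
cpps_result = youden_cutoff(toy_cpps, toy_cpps_labels, higher_is_positive=False)

print(avqi_result)
print(cpps_result)
```

The direction flag matters because the two measures run in opposite directions under the stated assumption, so the same scan serves both by flipping the comparison.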
Cutoff scores of 1.54 for the AVQI-3 (55% sensitivity, 80% specificity) and 14.35 dB for CPPs (65% sensitivity, 78% specificity) were identified for detecting voice disorders. Compared with the ML models, which averaged 89% sensitivity and 55% specificity, CPPs offered a better balance between sensitivity and specificity, outperforming AVQI-3 and nearly matching the average ML performance.
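One way to read these figures side by side is to compute the Youden index, J = sensitivity + specificity - 1, from the reported percentages. The short sketch below does only that arithmetic. The J values and the sensitivity-specificity gaps it prints are derived here from the rounded figures above and are not values reported by the authors. The names `reported`, `youden_j`, and `summary` are illustrative.

```python
# Sensitivity and specificity as reported in the abstract (as proportions).
reported = {
    "AVQI-3 (cutoff 1.54)": (0.55, 0.80),
    "CPPs (cutoff 14.35 dB)": (0.65, 0.78),
    "ML models (average)": (0.89, 0.55),
}


def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 means chance-level, 1 means perfect."""
    return sensitivity + specificity - 1


summary = {}
for name, (sens, spec) in reported.items():
    j = youden_j(sens, spec)
    gap = abs(sens - spec)  # smaller gap = more balanced trade-off
    summary[name] = (round(j, 2), round(gap, 2))
    print(f"{name}: J = {j:.2f}, |sens - spec| = {gap:.2f}")
```

Under this calculation J is about 0.35 for AVQI-3, 0.43 for CPPs, and 0.44 for the ML average, while the gap between sensitivity and specificity is smallest for CPPs (0.13, against 0.25 for AVQI-3 and 0.34 for the ML average). This is consistent with the statement that CPPs is the better balanced measure and comes close to the average ML performance.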
ML shows great potential for supporting voice disorder diagnostics, especially as models become more generalizable and easier to interpret. However, current tools like AVQI-3 and CPPs remain more practical and accessible for clinical use in evaluating voice quality than commonly implemented models. CPPs, in particular, offers distinct advantages for identifying voice disorders, making it a recommended and feasible choice for clinics with limited resources.