Screening Voice Disorders: Acoustic Voice Quality Index, Cepstral Peak Prominence, and Machine Learning.

Author information

Yousef Ahmed M, Castillo-Allendes Adrián, Berardi Mark L, Codino Juliana, Rubin Adam D, Hunter Eric J

Affiliations

Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, Massachusetts, USA.

Department of Surgery, Harvard Medical School, Boston, Massachusetts, USA.

Publication information

Folia Phoniatr Logop. 2025 Feb 21:1-15. doi: 10.1159/000544852.

DOI:10.1159/000544852

PMID:39987907

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12353333/

Abstract

INTRODUCTION

The Acoustic Voice Quality Index (AVQI) and Smoothed Cepstral Peak Prominence (CPPs) have been reported to effectively support the assessment of voice quality in persons seeking voice care across many languages. This study aimed to evaluate the diagnostic accuracy of these two measures in detecting voice disorders in American English speakers, comparing their performance to machine learning (ML) models.

METHODS

This retrospective study included 187 participants: 138 patients with clinically diagnosed voice disorders and 49 vocally healthy individuals. Each participant completed two voicing tasks, sustaining the vowel [a:] and producing a running speech sample, and the two recordings were then concatenated. The concatenated samples were analyzed with VOXplot software for AVQI-3 (version 03.01) and CPPs. Additionally, four ML models (random forest, k-nearest neighbors, support vector machine, and decision tree) were trained for comparison. The diagnostic accuracy of the two measures and the models was assessed using several evaluation metrics, including the receiver operating characteristic (ROC) curve and the Youden index.
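As an illustration of how a Youden-index cutoff can be derived from a set of scores, the following is a minimal sketch, not the authors' analysis pipeline. The function name `youden_cutoff`, the `higher_is_disordered` flag, and the toy CPPs-like values are all hypothetical; the only facts taken from the abstract are that a ROC-based analysis and the Youden index were used. The sketch assumes that lower CPPs values indicate poorer voice quality and that higher AVQI values do.

```python
from typing import List, Tuple


def youden_cutoff(
    scores: List[float],
    labels: List[int],
    higher_is_disordered: bool = True,
) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) maximizing J = sens + spec - 1.

    labels: 1 = voice disorder, 0 = vocally healthy.
    higher_is_disordered: True for AVQI-like scores, False for CPPs-like scores
    (an assumption of this sketch about the direction of each measure).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best = None
    # Every observed score is a candidate threshold (one point on the ROC curve).
    for cutoff in sorted(set(scores)):
        if higher_is_disordered:
            predicted = [s >= cutoff for s in scores]
        else:
            predicted = [s <= cutoff for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, labels))
        tn = sum((not p) and y == 0 for p, y in zip(predicted, labels))
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        # Strict comparison keeps the first (lowest) cutoff when J is tied.
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Hypothetical CPPs-like values in dB, invented for illustration only.
toy_scores = [10.2, 12.8, 13.9, 15.1, 14.6, 15.8, 16.4, 17.0]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_result = youden_cutoff(toy_scores, toy_labels, higher_is_disordered=False)
```

On real data the same sweep would be run once per measure, with the direction flag set according to how that measure relates to voice quality.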

RESULTS

Cutoff scores of 1.54 for AVQI-3 (55% sensitivity, 80% specificity) and 14.35 dB for CPPs (65% sensitivity, 78% specificity) were identified for detecting voice disorders. Compared to an average ML sensitivity of 89% and specificity of 55%, CPPs offered a better balance between sensitivity and specificity, outperforming AVQI-3 and nearly matching the average ML performance.
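The comparison above can be made concrete with a short calculation on the reported figures. The abstract does not state how "balance" or "nearly matching" was quantified, so this sketch assumes one plausible reading: the Youden index J = sensitivity + specificity - 1 as the overall summary, and the absolute gap between sensitivity and specificity as the balance. The helper names are hypothetical.

```python
# Sensitivity and specificity as reported in the abstract.
reported = {
    "AVQI-3 (cutoff 1.54)": (0.55, 0.80),
    "CPPs (cutoff 14.35 dB)": (0.65, 0.78),
    "ML average": (0.89, 0.55),
}


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden index: 0 means chance-level, 1 means perfect separation."""
    return sensitivity + specificity - 1.0


def balance_gap(sensitivity: float, specificity: float) -> float:
    """Absolute difference between the two rates; smaller means more balanced."""
    return abs(sensitivity - specificity)


summary = {
    name: {
        "youden_j": round(youden_j(sens, spec), 2),
        "gap": round(balance_gap(sens, spec), 2),
    }
    for name, (sens, spec) in reported.items()
}

for name, values in summary.items():
    print(f"{name}: J = {values['youden_j']:.2f}, |sens - spec| = {values['gap']:.2f}")
```

Under these assumptions J is about 0.35 for AVQI-3, 0.43 for CPPs, and 0.44 for the ML average, while the sensitivity-specificity gap is 0.25, 0.13, and 0.34 respectively. This is consistent with CPPs being the most balanced of the three while coming close to the ML average overall.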

CONCLUSIONS

ML shows great potential for supporting voice disorder diagnostics, especially as models become more generalizable and easier to interpret. However, current tools like AVQI-3 and CPPs remain more practical and accessible for clinical use in evaluating voice quality than commonly implemented models. CPPs, in particular, offers distinct advantages for identifying voice disorders, making it a recommended and feasible choice for clinics with limited resources.

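Abstract (translated from Chinese)

Introduction

The Acoustic Voice Quality Index (AVQI) and smoothed cepstral peak prominence (CPPs) have been reported to effectively support voice quality assessment in speakers of many languages. This study aimed to evaluate the diagnostic accuracy of these two measures for detecting voice disorders in American English speakers and to compare their performance with that of machine learning (ML) models.

Methods

This retrospective study included 187 participants: 138 patients with clinically diagnosed voice disorders and 49 vocally healthy individuals. Each participant completed two voicing tasks, sustaining the vowel [a:] and producing a running speech sample, and the samples were then concatenated. The samples were analyzed with VOXplot software for AVQI-3 (version 03.01) and CPPs. In addition, four ML models (random forest, k-nearest neighbors, support vector machine, and decision tree) were trained for comparison. The diagnostic accuracy of the two measures and the models was assessed with several evaluation metrics, including the receiver operating characteristic curve and the Youden index.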