Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.
Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA.
J Acoust Soc Am. 2024 Jan 1;155(1):381-395. doi: 10.1121/10.0024341.
Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited by inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to estimating hoarseness severity that combines machine learning (ML) with recordings of sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/, together with subjective ratings on the roughness, breathiness, and hoarseness (RBH) scale, were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. A subset of relevant features was selected using variance and correlation analysis followed by backward elimination. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score ŷ∈[0,1]. Using only five acoustic features and logistic regression (LR), the model achieved an accuracy of 0.867 and a correlation of 0.805 between its predictions and the subjective ratings. Further examination of recordings made before and after treatment showed high qualitative agreement with the change in subjectively rated hoarseness levels; quantitatively, the correlation was moderate (0.567). This quantitative approach to hoarseness severity estimation shows promising results and the potential to improve the assessment of voice quality.
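The pipeline summarized above (variance and correlation filtering, backward elimination, a binary logistic-regression classifier for H<2 vs. H≥2 whose output probability serves as the continuous score ŷ, then accuracy and correlation against the perceptual ratings) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the filter thresholds or the elimination criterion, so `min_var`, `max_corr`, the use of training accuracy as the elimination score, and every function name (`prefilter`, `standardize`, `fit_logreg`, `predict_proba`, `backward_eliminate`) are hypothetical, and plain batch gradient descent stands in for whatever LR solver the study used.

```python
import math
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def prefilter(X, min_var=1e-6, max_corr=0.9):
    """Variance/correlation filter: skip near-constant columns and columns
    highly correlated with an already-kept one (hypothetical thresholds)."""
    cols = list(zip(*X))
    kept = []
    for j, col in enumerate(cols):
        if pvariance(col) < min_var:
            continue
        if all(abs(pearson(col, cols[k])) <= max_corr for k in kept):
            kept.append(j)
    return kept


def standardize(X):
    """z-score each column; constant columns become all zeros."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [math.sqrt(pvariance(c)) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    """Continuous score y_hat in [0, 1], read as P(H >= 2)."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def accuracy(p, y, threshold=0.5):
    """Fraction of recordings whose thresholded score matches the label."""
    return mean(1.0 if (pi >= threshold) == bool(yi) else 0.0 for pi, yi in zip(p, y))


def backward_eliminate(X, y, cols, n_keep):
    """Greedily drop the column whose removal costs the least training
    accuracy until n_keep remain (assumed criterion, no cross-validation)."""
    cols = list(cols)

    def score(subset):
        Xs = [[row[j] for j in subset] for row in X]
        return accuracy(predict_proba(fit_logreg(Xs, y), Xs), y)

    while len(cols) > n_keep:
        drop = max(cols, key=lambda j: score([k for k in cols if k != j]))
        cols.remove(drop)
    return cols
```

To mirror the setup described in the abstract, one would label each recording `y = 1` if its perceptual H rating is at least 2, run `prefilter` on the raw feature matrix, z-score the features, call `backward_eliminate` with `n_keep=5`, and compare the `predict_proba` outputs with the H ratings via `pearson`. Gradient descent is used here only to keep the sketch free of dependencies; a library solver would normally replace it.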