Machine learning based estimation of hoarseness severity using sustained vowels.

Affiliations

Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.

Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA.

Publication information

J Acoust Soc Am. 2024 Jan 1;155(1):381-395. doi: 10.1121/10.0024341.

Abstract

Auditory perceptual evaluation is considered the gold standard for assessing voice quality, but its reliability is limited by inter-rater variability and coarse rating scales. This study investigates a continuous, objective approach to evaluating hoarseness severity that combines machine learning (ML) and sustained phonation. For this purpose, 635 acoustic recordings of the sustained vowel /a/ and subjective ratings based on the roughness, breathiness, and hoarseness (RBH) scale were collected from 595 subjects. A total of 50 temporal, spectral, and cepstral features were extracted from each recording and used to identify suitable ML algorithms. A subset of relevant features was then selected using variance and correlation analysis followed by backward elimination. Recordings were classified into two levels of hoarseness, H<2 and H≥2, yielding a continuous probability score ŷ∈[0,1]. An accuracy of 0.867 and a correlation of 0.805 between the model's predictions and subjective ratings were obtained using only five acoustic features and logistic regression (LR). Further examination of recordings pre- and post-treatment revealed high qualitative agreement with the change in subjectively determined hoarseness levels. Quantitatively, a moderate correlation of 0.567 was obtained. This quantitative approach to hoarseness severity estimation shows promising results and potential for improving the assessment of voice quality.
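As a rough illustration of how a two-class logistic regression can yield a continuous score ŷ∈[0,1], the sketch below fits a small model on synthetic data. Everything in it is an assumption made for illustration: the abstract does not name the five selected features, so five random standardized values stand in for them, and the gradient-descent settings, the 0.5 decision threshold, and all function names are hypothetical rather than taken from the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_score(w, b, x):
    """Continuous score in [0, 1]: estimated probability of the H >= 2 class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_synthetic_data(n=200, d=5, seed=0):
    # Synthetic stand-in for five standardized acoustic features.
    # This is NOT the study's data; label 1 plays the role of H >= 2.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.0 if label == 1 else -1.0
        X.append([rng.gauss(shift, 1.0) for _ in range(d)])
        y.append(label)
    return X, y


X, y = make_synthetic_data()
w, b = fit_logistic_regression(X, y)
scores = [predict_score(w, b, x) for x in X]

# Assumed threshold of 0.5 to turn the continuous score into a class label.
accuracy = sum((s >= 0.5) == (yi == 1) for s, yi in zip(scores, y)) / len(y)
print(f"training accuracy on synthetic data: {accuracy:.3f}")
```

The point of the sketch is that the same fitted model gives both a binary decision (score at or above the threshold) and a graded value that can be compared against ordinal ratings, which is the kind of use the abstract describes for ŷ.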

