Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Laryngoscope. 2024 Aug;134(8):3537-3541. doi: 10.1002/lary.31303. Epub 2024 Jan 27.
This study aimed to evaluate how background noise affects machine learning models that rate voice disorders on the GRBAS scale.
A dataset of 1406 voice samples was collected from retrospective data, and a 5-layer 1D convolutional neural network (CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to test samples at various intensities to assess the model's noise resilience. The model's performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa score.
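The noise-injection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how noise intensity was parameterized, so the function name `add_gaussian_noise`, the use of a standard deviation as the intensity, the synthetic sine wave, and the values in `noise_levels` are all assumptions.

```python
import numpy as np

def add_gaussian_noise(waveform, noise_std, rng=None):
    """Return a copy of `waveform` with zero-mean Gaussian noise added.

    `noise_std` is the standard deviation of the noise (an assumed way of
    expressing "intensity"; the abstract does not define it).
    """
    rng = np.random.default_rng() if rng is None else rng
    waveform = np.asarray(waveform, dtype=np.float32)
    noise = rng.normal(0.0, noise_std, size=waveform.shape).astype(np.float32)
    return waveform + noise

# Synthetic stand-in for a voice sample: 1 second of a 220 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = (0.5 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)

# Hypothetical intensity grid; each noisy copy would be fed to the trained model.
noise_levels = [0.0, 0.01, 0.1, 1.0]
noisy_versions = {std: add_gaussian_noise(clean, std, rng) for std in noise_levels}
```

Only the test samples are perturbed here, which matches the design in the abstract: the model is trained on the original recordings and evaluated on progressively noisier copies.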
The model's performance on the GRBAS scale generally declined with increasing noise intensities. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components.
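For readers unfamiliar with the quadratic weighted Cohen's kappa reported above, the sketch below computes it with NumPy for ordinal grades 0 to 3, the range used by each GRBAS component. The function name `quadratic_weighted_kappa` and the toy labels are illustrative assumptions, not data from the study.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Quadratic weighted Cohen's kappa for integer grades 0..n_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1.0

    # Quadratic disagreement weights: 0 on the diagonal, 1 at the far corners.
    grades = np.arange(n_classes)
    weights = (grades[:, None] - grades[None, :]) ** 2 / (n_classes - 1) ** 2

    # Expected matrix under chance agreement, from the two marginal histograms.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

    denom = (weights * expected).sum()
    if denom == 0.0:
        return float("nan")  # undefined when both raters use a single grade
    return 1.0 - (weights * observed).sum() / denom

# Toy labels (hypothetical, for illustration only).
true_grades = [0, 1, 2, 3, 1, 2, 0, 3]
perfect = quadratic_weighted_kappa(true_grades, true_grades)
collapsed = quadratic_weighted_kappa(true_grades, [0] * len(true_grades))
```

In this toy case `perfect` is 1.0 and `collapsed` is 0.0. Predictions that collapse onto a single grade are one way a kappa of 0.0 can arise; the abstract does not state whether that is what happened at the highest noise level.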
The model's performance was affected by background noise, with substantial decreases in evaluation metrics as noise levels intensified. Future research should explore noise-tolerant techniques, such as data augmentation, to improve the model's noise resilience in real-world settings.
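One form of the data augmentation suggested above is to add noise of random strength to training samples, so the model sees both clean and noisy inputs. The sketch below is a hypothetical illustration, not a method evaluated in the study; the function name `augment_batch_with_noise`, the uniform sampling of noise levels, and `max_noise_std` are assumptions.

```python
import numpy as np

def augment_batch_with_noise(batch, max_noise_std, rng):
    """Add Gaussian noise to each waveform in `batch` (shape: samples x time).

    Each waveform gets its own noise level, drawn uniformly from
    [0, max_noise_std], so a batch mixes nearly clean and noisy examples.
    """
    batch = np.asarray(batch, dtype=np.float32)
    stds = rng.uniform(0.0, max_noise_std, size=(batch.shape[0], 1))
    noise = rng.standard_normal(batch.shape) * stds
    return batch + noise.astype(np.float32)

# Example: a placeholder batch of 4 silent waveforms, 1000 time steps each.
rng = np.random.default_rng(42)
batch = np.zeros((4, 1000), dtype=np.float32)
augmented = augment_batch_with_noise(batch, max_noise_std=0.1, rng=rng)
```

Applying the function anew at every training step gives each epoch a different noisy view of the same recordings. Whether this recovers performance on noisy test data is the open question the authors raise.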
This study evaluates a machine learning model using a single dataset without comparative controls. Given its non-comparative design and specific focus, it aligns with Level 4 evidence (case series) under the 2011 OCEBM guidelines. Laryngoscope, 134:3537-3541, 2024.