Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network.

Author information

Department of Otolaryngology-Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan.

Publication information

J Voice. 2022 Jan;36(1):15-20. doi: 10.1016/j.jvoice.2020.02.009. Epub 2020 Mar 13.

Abstract

OBJECTIVES

Auditory-perceptual voice analysis is a standard method for quantifying pathological voice quality, but perceptual ratings are based on subjective evaluations and therefore may vary among examiners. Although many acoustic metrics have been studied for potential use in the objective evaluation of pathological voices, the interpretation of acoustic metrics in individual cases is difficult and the technique is not widely used by clinicians. The aim of this study was to establish standardized methods to discriminate grade, roughness, breathiness, asthenia, strain (GRBAS) scale scores of pathological voices directly using one-dimensional convolutional neural network (1D-CNN) models.

METHODS

We constructed an original dataset of 1,377 voice samples of sustained phonation of the vowel /a/. Each voice sample was rated by three experts according to the GRBAS scale, and the median rating was used as the ground-truth label. We designed an end-to-end 1D-CNN model that takes a raw voice waveform as input, with a frame width of 9,600 samples. A separate model was trained on our dataset for each GRBAS category, and model performance was tested by five-fold cross-validation.
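The data preparation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the non-overlapping framing, the dropping of a trailing partial frame, and the recording-level fold split are all assumptions, since the abstract only specifies the median labeling, the 9,600-sample frame width, and the use of five folds.

```python
import numpy as np

FRAME_WIDTH = 9600  # frame width in samples, as stated in the abstract


def median_label(ratings):
    """Return the median of the expert ratings as the target label.

    With three integer ratings the median is always one of the ratings.
    """
    return int(np.median(np.asarray(ratings)))


def split_into_frames(waveform, frame_width=FRAME_WIDTH, hop=None):
    """Cut a 1-D waveform into fixed-width frames.

    Assumption: frames do not overlap unless `hop` is given, and a
    trailing partial frame is dropped.
    """
    hop = frame_width if hop is None else hop
    if len(waveform) < frame_width:
        return np.empty((0, frame_width), dtype=waveform.dtype)
    n_frames = 1 + (len(waveform) - frame_width) // hop
    return np.stack(
        [waveform[i * hop : i * hop + frame_width] for i in range(n_frames)]
    )


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle recording indices and split them into disjoint folds.

    Assumption: splitting is done per recording, so frames from one
    recording never appear in both the training and the test fold.
    """
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


# Synthetic stand-in data; the real recordings are not part of this sketch.
label = median_label([1, 2, 2])
frames = split_into_frames(np.arange(25000, dtype=np.float32))
folds = five_fold_indices(1377)
```

Each fold would serve once as the test set while a model for a single GRBAS category is trained on the remaining four.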

RESULTS

The accuracy, F1 score, and quadratic weighted Cohen's kappa for the testing dataset were determined. The metrics for the G scale showed the most balanced model performance, with high accuracy (0.771) and substantial agreement (kappa = 0.710). The model for the R scale had relatively high accuracy (0.765) and F1 score (0.743) with moderate agreement (kappa = 0.536). The accuracy (0.883) and the F1 score (0.865) for the S scale were the highest among the five categories, whereas the Cohen's kappa was the lowest (0.190).
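To see how high accuracy can coexist with a low kappa, the sketch below computes accuracy and quadratic weighted Cohen's kappa with NumPy on synthetic labels. The abstract does not say why the S scale shows this pattern; a skewed label distribution is one common cause and is used here purely as an assumed illustration. The function names and the four-level 0 to 3 scale are also assumptions.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Cohen's kappa with quadratic disagreement weights.

    kappa = 1 - sum(W * O) / sum(W * E), where O is the observed
    confusion matrix, E the chance-expected matrix built from the
    marginals, and W[i, j] = (i - j)^2 / (n_classes - 1)^2.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (y_true, y_pred), 1)
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(y_true)
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_classes - 1) ** 2
    denom = np.sum(weights * expected)
    if denom == 0:
        # Both raters used one and the same class; kappa is undefined.
        return float("nan")
    return float(1.0 - np.sum(weights * observed) / denom)


# Synthetic, heavily imbalanced labels: 90 samples of score 0, 10 of score 1.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # a model that always predicts score 0

acc = accuracy(y_true, y_pred)                    # 0.9
kappa = quadratic_weighted_kappa(y_true, y_pred)  # 0.0
```

A model that always predicts the majority score is right 90% of the time here, yet its agreement with the reference labels is no better than chance, which is what kappa measures.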

CONCLUSIONS

The end-to-end 1D-CNN models can evaluate overall pathological voice quality with a reliability comparable to human evaluations. The efficiency with which the machine learning models can be trained and evaluated is closely related to the dataset quality.
