


Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network.

Affiliations

Department of Otolaryngology-Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan.

Publication information

J Voice. 2022 Jan;36(1):15-20. doi: 10.1016/j.jvoice.2020.02.009. Epub 2020 Mar 13.

DOI: 10.1016/j.jvoice.2020.02.009
PMID: 32173149
Abstract

OBJECTIVES

Auditory-perceptual voice analysis is a standard method for quantifying pathological voice quality, but perceptual ratings are based on subjective evaluations and may therefore vary among examiners. Although many acoustic metrics have been studied for the objective evaluation of pathological voices, these metrics are difficult to interpret in individual cases and are not widely used by clinicians. The aim of this study was to establish a standardized method for classifying grade, roughness, breathiness, asthenia, and strain (GRBAS) scale scores of pathological voices directly with one-dimensional convolutional neural network (1D-CNN) models.
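A 1D-CNN applied to a raw waveform learns filters that slide directly over the sample sequence instead of working from hand-crafted acoustic metrics. The sketch below shows the basic building block in plain Python: an unpadded, strided 1-D convolution (computed as cross-correlation, as in most deep-learning libraries) followed by a ReLU. The function names, kernel size, and stride are hypothetical; the abstract does not report the network's layer configuration.

```python
def conv1d(signal, kernel, stride=1, bias=0.0):
    """Unpadded ("valid") 1-D convolution with a stride.

    As in typical CNN layers, this is cross-correlation: the kernel is
    not flipped before sliding it over the signal.
    """
    k = len(kernel)
    if k == 0 or stride < 1:
        raise ValueError("kernel must be non-empty and stride >= 1")
    # One output per kernel position; positions advance by `stride` samples.
    return [bias + sum(signal[start + i] * kernel[i] for i in range(k))
            for start in range(0, len(signal) - k + 1, stride)]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def conv_output_length(n_samples, kernel_size, stride):
    """Number of outputs produced by an unpadded strided convolution."""
    if n_samples < kernel_size:
        return 0
    return (n_samples - kernel_size) // stride + 1


if __name__ == "__main__":
    print(conv1d([1, 2, 3, 4, 5], [1, 1], stride=2))  # [3.0, 7.0]
    print(relu(conv1d([1, 2, 3, 4], [1, -1])))        # [0.0, 0.0, 0.0]
    # Hypothetical first layer: 80-sample kernel, stride 4, on one
    # 9,600-sample input frame.
    print(conv_output_length(9600, 80, 4))            # 2381
```

In a real network each layer holds many such kernels whose weights are learned, and several layers, usually combined with pooling, reduce a frame to a small feature vector before a final classification layer.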

METHODS

We constructed an original dataset of 1,377 voice samples of sustained phonation of the vowel /a/. Each voice sample was rated by three experts on the GRBAS scale, and the median rating was used as the ground-truth label. We designed an end-to-end 1D-CNN model that takes the raw voice waveform as input, with a frame width of 9,600 samples. A separate model was trained on our dataset for each GRBAS category, and model performance was evaluated by five-fold cross-validation.
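The data-preparation steps in the Methods can be sketched as follows, under stated assumptions. Each recording's label is the median of three integer ratings for one GRBAS category; each waveform is cut into non-overlapping 9,600-sample frames with any remainder discarded; and recordings (not frames) are shuffled into five folds so that frames from one recording never fall on both sides of a split. The abstract does not say how frames were extracted or how folds were formed, so the framing rule, the fold strategy, and all function names here are hypothetical.

```python
import random
import statistics

FRAME_WIDTH = 9600  # samples per model input, as reported in the abstract
NUM_FOLDS = 5


def consensus_label(ratings):
    """Median of three expert ratings (integer grades) for one GRBAS category."""
    if len(ratings) != 3:
        raise ValueError("expected exactly three ratings")
    return int(statistics.median(ratings))


def split_frames(waveform, width=FRAME_WIDTH):
    """Cut a 1-D waveform into non-overlapping frames of `width` samples.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    return [waveform[start:start + width]
            for start in range(0, len(waveform) - width + 1, width)]


def train_test_splits(num_recordings, num_folds=NUM_FOLDS, seed=0):
    """Yield (train, test) recording indices for k-fold cross-validation.

    Assumed design: folds are formed per recording, so every frame cut
    from one recording lands on the same side of the split.
    """
    order = list(range(num_recordings))
    random.Random(seed).shuffle(order)
    folds = [order[k::num_folds] for k in range(num_folds)]
    for k in range(num_folds):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    print(consensus_label([1, 2, 2]))        # 2
    print(len(split_frames([0.0] * 20000)))  # 2 (the last 800 samples are dropped)
    print(sorted(len(test) for _, test in train_test_splits(1377)))
    # [275, 275, 275, 276, 276]
```

Splitting by recording rather than by frame avoids optimistic scores from near-identical frames of the same voice appearing in both training and test sets. The folds here are not stratified by grade; a stratified split would be a reasonable alternative when some grades are rare.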

RESULTS

Accuracy, F1 score, and quadratic weighted Cohen's kappa were calculated on the test data. The G scale showed the most balanced model performance, with high accuracy (0.771) and substantial agreement (kappa = 0.710). The R scale model had relatively high accuracy (0.765) and F1 score (0.743) with moderate agreement (kappa = 0.536). The S scale had the highest accuracy (0.883) and F1 score (0.865) of the five categories but the lowest Cohen's kappa (0.190).
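Quadratic weighted kappa penalizes each disagreement by the squared distance between the predicted and reference grades and corrects for agreement expected by chance, so it can diverge sharply from accuracy when grades are imbalanced. The pure-Python sketch below computes accuracy and quadratic weighted kappa, assuming integer grades 0 to 3 (the usual GRBAS range). The F1 averaging scheme is not given in the abstract, so F1 is omitted. The toy labels are invented for illustration, not taken from the study: they show that a predictor that always outputs the majority grade can score high accuracy with a kappa of 0. This is one possible reading of the S-scale pattern; the abstract does not state the cause.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of recordings whose predicted grade equals the reference grade."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, num_classes=4):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (K - 1)^2.

    Assumes integer grades in 0..num_classes-1 (GRBAS grades are usually 0-3).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(not 0 <= g < num_classes for g in (*y_true, *y_pred)):
        raise ValueError("grade outside 0..num_classes-1")
    # Observed counts: row = reference grade, column = predicted grade.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    # Marginal histograms give the counts expected under chance agreement.
    hist_true = Counter(y_true)
    hist_pred = Counter(y_pred)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        return math.nan  # kappa is undefined when chance disagreement is zero
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Invented toy labels: 17 recordings graded 0, 2 graded 1, 1 graded 2.
    y_true = [0] * 17 + [1] * 2 + [2]
    y_pred = [0] * 20  # a model that always predicts the majority grade
    print(accuracy(y_true, y_pred))                  # 0.85
    print(quadratic_weighted_kappa(y_true, y_pred))  # 0.0
```

Because the weights grow with the squared grade difference, predicting 1 for a true 3 costs four times as much as predicting 2 for a true 3, which suits ordinal rating scales such as GRBAS.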

CONCLUSIONS

The end-to-end 1D-CNN models can evaluate overall pathological voice quality with reliability comparable to that of human raters. How efficiently the machine learning models can be trained and evaluated depends closely on dataset quality.



Cited by

1
Deep learning-based classification of speech disorder in stroke and hearing impairment.
PLoS One. 2025 May 28;20(5):e0315286. doi: 10.1371/journal.pone.0315286. eCollection 2025.
2
A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers.
BMC Med Inform Decis Mak. 2025 May 1;25(1):177. doi: 10.1186/s12911-025-02978-w.
3
Vowel segmentation impact on machine learning classification for chronic obstructive pulmonary disease.
Sci Rep. 2025 Mar 22;15(1):9930. doi: 10.1038/s41598-025-95320-3.
4
Determination of Wheat Heading Stage Using Convolutional Neural Networks on Multispectral UAV Imaging Data.
Comput Intell Neurosci. 2022 Nov 24;2022:3655804. doi: 10.1155/2022/3655804. eCollection 2022.
5
Predictions for Three-Month Postoperative Vocal Recovery after Thyroid Surgery from Spectrograms with Deep Neural Network.
Sensors (Basel). 2022 Aug 24;22(17):6387. doi: 10.3390/s22176387.