Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study.

Affiliations

Institute of Clinical Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.

Department of Otorhinolaryngology-Head and Neck Surgery, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan.

Publication information

J Med Internet Res. 2021 Jun 8;23(6):e25247. doi: 10.2196/25247.

DOI:10.2196/25247

PMID:34100770

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8241431/

Abstract

BACKGROUND

Dysphonia influences the quality of life by interfering with communication. However, a laryngoscopic examination is expensive and not readily accessible in primary care units. Experienced laryngologists are required to achieve an accurate diagnosis.

OBJECTIVE

This study sought to detect various vocal fold diseases through pathological voice recognition using artificial intelligence.

METHODS

We collected 189 normal voice samples and 552 samples of individuals with voice disorders, including vocal atrophy (n=224), unilateral vocal paralysis (n=50), organic vocal fold lesions (n=248), and adductor spasmodic dysphonia (n=30). The 741 samples were divided into 2 sets: 593 samples as the training set and 148 samples as the testing set. A convolutional neural network approach was applied to train the model, and findings were compared with those of human specialists.
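The sample counts above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys and variable names are assumptions introduced here, and the abstract does not state how the split was drawn (for example, whether it was stratified by class).

```python
# Class counts as reported in the abstract (key names are illustrative).
counts = {
    "normal": 189,
    "vocal_atrophy": 224,
    "unilateral_vocal_paralysis": 50,
    "organic_vocal_fold_lesions": 248,
    "adductor_spasmodic_dysphonia": 30,
}

total = sum(counts.values())              # all voice samples
disordered = total - counts["normal"]     # samples with a voice disorder

# Training and testing set sizes as reported in the abstract.
train_size, test_size = 593, 148
test_fraction = test_size / total         # roughly a 80/20 split

# Share of each class in the full data set, to make the imbalance visible.
class_share = {name: n / total for name, n in counts.items()}

print(f"total={total}, disordered={disordered}, test fraction={test_fraction:.3f}")
for name, share in class_share.items():
    print(f"{name}: {share:.1%}")
```

The smallest class (adductor spasmodic dysphonia, 30 samples) makes up about 4% of the data, so per-class metrics for it rest on only a handful of test samples under any split of this size.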

RESULTS

The convolutional neural network model achieved a sensitivity of 0.66, a specificity of 0.91, and an overall accuracy of 66.9% for distinguishing normal voice, vocal atrophy, unilateral vocal paralysis, organic vocal fold lesions, and adductor spasmodic dysphonia. By comparison, the human specialists achieved overall accuracy rates of 60.1% and 56.1% (the 2 laryngologists) and 51.4% and 43.2% (the 2 general ear, nose, and throat doctors).
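For readers unfamiliar with multiclass metrics, the following sketch shows one common way to obtain sensitivity, specificity, and overall accuracy from a confusion matrix: compute one-vs-rest values per class and macro-average them. The abstract does not say how its figures were aggregated, so the averaging scheme is an assumption, and `toy_cm` is a made-up 3-class matrix, not study data. The second part only checks that the reported accuracy percentages are consistent with whole numbers of correct answers on the 148 test samples. Those counts are inferred here, not reported in the abstract.

```python
def one_vs_rest_metrics(cm):
    """Return (macro sensitivity, macro specificity, overall accuracy).

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sens, spec = [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # others labelled as k
        tn = total - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return sum(sens) / n, sum(spec) / n, accuracy


# Hypothetical 3-class confusion matrix, for illustration only.
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [1, 1, 8],
]
macro_sens, macro_spec, acc = one_vs_rest_metrics(toy_cm)
print(f"sensitivity={macro_sens:.3f} specificity={macro_spec:.3f} accuracy={acc:.3f}")

# Consistency check on the reported accuracies (test set of 148 samples).
TEST_SIZE = 148
reported_pct = {
    "CNN": 66.9,
    "laryngologist_1": 60.1,
    "laryngologist_2": 56.1,
    "general_ENT_1": 51.4,
    "general_ENT_2": 43.2,
}
# Nearest whole number of correct answers implied by each percentage.
implied_correct = {k: round(p / 100 * TEST_SIZE) for k, p in reported_pct.items()}
for name, correct in implied_correct.items():
    print(f"{name}: about {correct}/{TEST_SIZE} = {correct / TEST_SIZE:.1%}")
```

Because specificity in a one-vs-rest setting counts every correctly rejected sample from the other classes, it tends to be much higher than sensitivity in multiclass problems, which matches the pattern of the reported 0.66 and 0.91.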

CONCLUSIONS

After training on our Mandarin pathological voice database, a deep learning model could recognize common vocal fold diseases from voice alone. This artificial intelligence approach could be clinically useful for voice-based screening of general vocal fold disease, for example in quick surveys and general health examinations. It can be applied during telemedicine in areas where primary care units lack laryngoscopic capabilities. It could also support physicians in prescreening, so that invasive examinations are performed only when automatic recognition or listening raises concerns, or when professional analysis of other clinical examination results raises doubts about the presence of pathology.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a89f/8241431/54c6cfcce025/jmir_v23i6e25247_fig1.jpg
