A Deep-Learning Model for Multi-class Audio Classification of Vocal Fold Pathologies in Office Stroboscopy.

Authors

Kim Yeo E, Dobko Maria, Li Haomiao, Shao Tianlan, Periyakoil Preethi, Tipton Courtney, Colasacco Christine, Serpedin Aisha, Elemento Olivier, Sabuncu Mert, Pitman Michael, Sulica Lucian, Rameau Anaïs

Affiliations

Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, U.S.A.

Cornell Tech, New York, New York, U.S.A.

Publication information

Laryngoscope. 2025 Jul;135(7):2428-2436. doi: 10.1002/lary.32036. Epub 2025 Feb 5.

DOI:10.1002/lary.32036
PMID:39907244
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12234251/

Abstract

OBJECTIVE

To develop and validate a deep-learning classifier trained on voice data extracted from videolaryngostroboscopy recordings, differentiating between three different vocal fold (VF) states: healthy (HVF), unilateral paralysis (UVFP), and VF lesions, including benign and malignant pathologies.

METHODS

Patients with UVFP (n = 105), VF lesions (n = 63), and HVF (n = 41) were retrospectively identified. Voice samples were extracted from stroboscopic videos (Pentax Laryngeal Strobe Model 9400), including sustained /i/ phonation, pitch glide, and /i/ sniff task. Extracted audio files were converted into Mel-spectrograms. Voice samples were independently divided by patient into training (80%), validation (10%), and test (10%) sets. Pretrained ResNet18 models were trained to classify (1) HVF and pathological VF (lesions and UVFP), and (2) HVF, UVFP, and VF lesions. Both classifiers were further validated on an external dataset consisting of 12 UVFP, 13 VF lesions, and 15 HVF patients. Model performance was evaluated by accuracy and F1-score.
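The patient-level 80/10/10 split described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `split_by_patient`, the file names, the fixed seed, and the assumption of exactly three recordings per patient are all hypothetical, and the abstract does not say whether the split was stratified by diagnosis. The point is only that whole patients, not individual recordings, are assigned to a split, so no patient's voice appears in both training and test data.

```python
import random
from collections import defaultdict


def split_by_patient(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Split (patient_id, sample) pairs so each patient lands in exactly one split.

    Hypothetical helper for illustration; fractions follow the 80/10/10
    scheme in the abstract, everything else is an assumption.
    """
    by_patient = defaultdict(list)
    for patient_id, sample in samples:
        by_patient[patient_id].append(sample)

    # Shuffle patient IDs (not samples) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(train_frac * len(patients))
    n_val = round(val_frac * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [(p, s) for p in ids for s in by_patient[p]]
        for name, ids in groups.items()
    }


# Made-up cohort: 209 patients (105 + 63 + 41), three assumed recordings each.
TASKS = ("sustained_i", "pitch_glide", "i_sniff")
samples = [
    (f"patient_{i:03d}", f"patient_{i:03d}_{task}.wav")
    for i in range(209)
    for task in TASKS
]
splits = split_by_patient(samples)
```

Splitting by recording instead of by patient would let the model see the same speaker during training and testing, which tends to inflate hold-out scores.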

RESULTS

When evaluated on a hold-out test set, the binary classifier demonstrated stronger performance compared to the multi-class classifier (accuracy 83% vs. 40%; F1-score 0.90 vs. 0.36). When evaluated on an external dataset, the binary classifier achieved an accuracy of 63% and F1-score of 0.48, compared to 35% and 0.25 for the multi-class classifier.
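To make the two reported metrics concrete, here is a minimal Python sketch that computes accuracy and F1-score from predicted labels. The labels below are made-up toy data, not the study's predictions, and the macro-averaging used for the multi-class case is an assumption: the abstract does not state which F1 averaging the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy three-class example using the abstract's class names (invented labels).
y_true = ["HVF", "HVF", "UVFP", "UVFP", "lesion", "lesion"]
y_pred = ["HVF", "UVFP", "UVFP", "UVFP", "lesion", "HVF"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Accuracy and F1 can diverge, as they do in the reported results, because F1 depends on the balance of false positives and false negatives per class rather than on the overall hit rate.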

CONCLUSIONS

Deep-learning classifiers differentiating HVF, UVFP, and VF lesions were developed using voice data from stroboscopic videos. Although healthy and pathological voices were differentiated with moderate accuracy, multi-class classification lowered model performance. The model performed poorly on an external dataset. Voice captured in stroboscopic videos may have limited diagnostic value, though further studies are needed.

LEVEL OF EVIDENCE

4

Laryngoscope, 135:2428-2436, 2025.
