
A Deep-Learning Model for Multi-class Audio Classification of Vocal Fold Pathologies in Office Stroboscopy.

Author Information

Kim Yeo E, Dobko Maria, Li Haomiao, Shao Tianlan, Periyakoil Preethi, Tipton Courtney, Colasacco Christine, Serpedin Aisha, Elemento Olivier, Sabuncu Mert, Pitman Michael, Sulica Lucian, Rameau Anaïs

Affiliations

Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, U.S.A.

Cornell Tech, New York, New York, U.S.A.

Publication Information

Laryngoscope. 2025 Jul;135(7):2428-2436. doi: 10.1002/lary.32036. Epub 2025 Feb 5.

Abstract

OBJECTIVE

To develop and validate a deep-learning classifier trained on voice data extracted from videolaryngostroboscopy recordings, differentiating between three different vocal fold (VF) states: healthy (HVF), unilateral paralysis (UVFP), and VF lesions, including benign and malignant pathologies.

METHODS

Patients with UVFP (n = 105), VF lesions (n = 63), and HVF (n = 41) were retrospectively identified. Voice samples were extracted from stroboscopic videos (Pentax Laryngeal Strobe Model 9400), including sustained /i/ phonation, pitch glide, and /i/ sniff task. Extracted audio files were converted into Mel-spectrograms. Voice samples were independently divided into training (80%), validation (10%), and test (10%) by patient. Pretrained ResNet18 models were trained to classify (1) HVF and pathological VF (lesions and UVFP), and (2) HVF, UVFP, and VF lesions. Both classifiers were further validated on an external dataset consisting of 12 UVFP, 13 VF lesions, and 15 HVF patients. Model performances were evaluated by accuracy and F1-score.
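As a rough illustration of the Mel-spectrogram preprocessing step described above, the sketch below converts a 1-D audio signal into a log-mel spectrogram using only NumPy. The frame size, hop length, and number of mel bands are illustrative assumptions; the paper's exact settings are not stated in this abstract.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (HTK formula)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters mapping an FFT power spectrum to n_mels bands."""
    # Equally spaced points on the mel scale between 0 Hz and Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising slope of the triangle
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope of the triangle
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(signal, sr, n_fft=512, hop=128, n_mels=64):
    """Log-mel spectrogram of a 1-D signal via framed FFT + mel filterbank."""
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), n_fft)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)  # log compression, standard before CNN input
```

In a pipeline like the one described, the resulting 2-D log-mel array would be rendered as an image and fed to a pretrained image classifier such as ResNet18.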

RESULTS

When evaluated on a hold-out test set, the binary classifier demonstrated stronger performance compared to the multi-class classifier (accuracy 83% vs. 40%; F1-score 0.90 vs. 0.36). When evaluated on an external dataset, the binary classifier achieved an accuracy of 63% and F1-score of 0.48, compared to 35% and 0.25 for the multi-class classifier.
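The abstract does not state how the F1-score was averaged across the three classes; the sketch below computes accuracy and a macro-averaged F1 (unweighted mean of per-class F1) from raw label lists, one common choice for multi-class evaluation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (one-vs-rest per class)."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

With imbalanced classes such as those here (105 UVFP vs. 41 HVF), macro averaging weights each class equally, so poor performance on a small class lowers the score as much as on a large one.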

CONCLUSIONS

Deep-learning classifiers differentiating HVF, UVFP, and VF lesions were developed using voice data from stroboscopic videos. Although healthy and pathological voices were differentiated with moderate accuracy, multi-class classification lowered model performance. The model performed poorly on an external dataset. Voice captured in stroboscopic videos may have limited diagnostic value, though further studies are needed.

LEVEL OF EVIDENCE

4 Laryngoscope, 135:2428-2436, 2025.



