Comparison of Convolutional Neural Network Models for Determination of Vocal Fold Normality in Laryngoscopic Images.

Affiliations

Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.


Publication Information

J Voice. 2022 Sep;36(5):590-598. doi: 10.1016/j.jvoice.2020.08.003. Epub 2020 Aug 30.

Abstract

OBJECTIVES

Deep learning using convolutional neural networks (CNNs) is widely used in medical imaging research. This study was performed to investigate whether vocal fold normality in laryngoscopic images can be determined by CNN-based deep learning, to compare the accuracy of different CNN models, and to explore the feasibility of applying deep learning to laryngoscopy.

METHODS

Laryngoscopy videos were screen-captured, and each image was cropped to include the abducted vocal fold region. A total of 2216 images (899 normal, 1317 abnormal) were allocated to training, validation, and test sets. The augmented training set was used to train a custom six-layer CNN model (CNN6) as well as VGG16, Inception V3, and Xception models. The trained models were applied to the test set; for each model, a receiver operating characteristic (ROC) curve and cutoff value were obtained, and sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated. The best-performing model was applied to video streams, and localization of discriminative features was attempted using Grad-CAM.
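The abstract does not publish the authors' code, so the following is only a minimal sketch of the kind of training setup it describes, assuming a TensorFlow/Keras implementation; the input resolution, layer sizes, augmentation choices, and optimizer below are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: framework (TensorFlow/Keras), input size, layer sizes,
# and augmentation settings are assumptions, not the paper's actual configuration.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

IMG_SIZE = (224, 224)  # assumed input resolution

# Augmentation applied to the training set (flips, rotations, and zooms are assumed choices).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

def build_cnn6():
    """A simple stacked CNN standing in for the 'CNN6' model described in the abstract."""
    return models.Sequential([
        layers.Input(shape=IMG_SIZE + (3,)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # predicted probability of "normal"
    ])

def build_vgg16_transfer():
    """VGG16 backbone pre-trained on ImageNet with a new binary classification head."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
    base.trainable = False  # transfer learning: freeze the convolutional features
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

model = build_vgg16_transfer()  # or build_cnn6(); Inception V3 / Xception swap in the same way
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(train_ds.map(lambda x, y: (augment(x, training=True), y)),
#           validation_data=val_ds, epochs=20)
```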

RESULTS

All of the trained models showed a high area under the receiver operating characteristic curve, and the most discriminative cutoff levels for the probability of normality were determined to be 35.6%, 61.8%, 13.5%, and 39.7% for the CNN6, VGG16, Inception V3, and Xception models, respectively. The accuracy of the CNN models in classifying normal and abnormal vocal folds in the test set was 82.3%, 99.7%, 99.1%, and 83.8%, respectively.
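The abstract does not state how the "most discriminative" cutoff was chosen; a common approach, assumed here purely for illustration, is to maximize Youden's J statistic on the ROC curve and then derive sensitivity, specificity, PPV, NPV, and accuracy from the resulting confusion matrix (scikit-learn assumed).

```python
# Sketch of reading a "most discriminative" cutoff off the ROC curve and scoring a model.
# The abstract does not name the criterion; Youden's J statistic is assumed here.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(y_true, p_normal):
    """y_true: 1 = normal, 0 = abnormal; p_normal: predicted probability of normality."""
    y_true = np.asarray(y_true)
    p_normal = np.asarray(p_normal)

    fpr, tpr, thresholds = roc_curve(y_true, p_normal)
    cutoff = thresholds[np.argmax(tpr - fpr)]  # Youden's J = sensitivity + specificity - 1

    pred = (p_normal >= cutoff).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))

    return {
        "auc": roc_auc_score(y_true, p_normal),
        "cutoff": float(cutoff),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }
```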

CONCLUSION

All four models showed acceptable diagnostic accuracy. The performance of VGG16 and Inception V3 was better than that of the simple CNN6 model and the more recently published Xception model. Real-time classification on a video stream using a combination of the VGG16 model, OpenCV, and Grad-CAM demonstrated the potential clinical applicability of deep learning models in laryngoscopy.
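A minimal sketch of how such real-time classification on a video stream might look, assuming a saved Keras VGG16 model and OpenCV for frame capture and on-screen labeling; the model file name, preprocessing, and overlay layout are hypothetical, and the Grad-CAM heatmap overlay is omitted for brevity. The 0.618 cutoff is the VGG16 value reported in the Results.

```python
# Sketch of real-time normal/abnormal classification on a laryngoscopy video stream.
# File names, preprocessing, and display details are assumptions, not the authors' code.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("vgg16_vocal_fold.h5")  # hypothetical saved model
CUTOFF = 0.618  # VGG16 cutoff reported in the Results

cap = cv2.VideoCapture("laryngoscopy.mp4")  # or 0 for a live camera feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize and scale the frame the same way the training images were preprocessed.
    x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    p_normal = float(model.predict(x[np.newaxis], verbose=0)[0, 0])

    label = "normal" if p_normal >= CUTOFF else "abnormal"
    cv2.putText(frame, f"{label} ({p_normal:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("laryngoscopy", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```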

