Comparison of Convolutional Neural Network Models for Determination of Vocal Fold Normality in Laryngoscopic Images.

Affiliations

Departments of Otorhinolaryngology-Head and Neck Surgery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, South Korea.


Publication Information

J Voice. 2022 Sep;36(5):590-598. doi: 10.1016/j.jvoice.2020.08.003. Epub 2020 Aug 30.

Abstract

OBJECTIVES

Deep learning using convolutional neural networks (CNNs) is widely used in medical imaging research. This study was performed to investigate whether vocal fold normality in laryngoscopic images can be determined by CNN-based deep learning, to compare the accuracy of different CNN models, and to explore the feasibility of applying deep learning to laryngoscopy.

METHODS

Laryngoscopy videos were screen-captured, and each image was cropped to include the abducted vocal fold region. A total of 2216 images (899 normal, 1317 abnormal) were allocated to training, validation, and test sets. The augmented training set was used to train a custom six-layer CNN model (CNN6) as well as VGG16, Inception V3, and Xception models. The trained models were applied to the test set; for each model, a receiver operating characteristic (ROC) curve and cutoff value were obtained, and sensitivity, specificity, positive predictive value, negative predictive value, and accuracy were calculated. The best-performing model was applied to video streams, and localization of discriminative features was attempted using Grad-CAM.
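The abstract does not publish the authors' code, so the following is only a minimal sketch of the kind of training setup it describes, assuming a TensorFlow/Keras implementation; the input resolution, layer sizes, augmentation choices, and optimizer below are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: framework (TensorFlow/Keras), input size, layer sizes,
# and augmentation settings are assumptions, not the paper's actual configuration.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

IMG_SIZE = (224, 224)  # assumed input resolution

# Augmentation applied to the training set (flips, rotations, and zooms are assumed choices).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])

def build_cnn6():
    """A simple stacked CNN standing in for the 'CNN6' model described in the abstract."""
    return models.Sequential([
        layers.Input(shape=IMG_SIZE + (3,)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # predicted probability of "normal"
    ])

def build_vgg16_transfer():
    """VGG16 backbone pre-trained on ImageNet with a new binary classification head."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
    base.trainable = False  # transfer learning: freeze the convolutional features
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])

model = build_vgg16_transfer()  # or build_cnn6(); Inception V3 / Xception swap in the same way
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(train_ds.map(lambda x, y: (augment(x, training=True), y)),
#           validation_data=val_ds, epochs=20)
```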

RESULTS

All of the trained models showed a high area under the receiver operating characteristic curve, and the most discriminative cutoff levels for the probability of normality were determined to be 35.6%, 61.8%, 13.5%, and 39.7% for the CNN6, VGG16, Inception V3, and Xception models, respectively. The accuracy of the CNN models in classifying normal and abnormal vocal folds in the test set was 82.3%, 99.7%, 99.1%, and 83.8%, respectively.
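The abstract does not state how the "most discriminative" cutoff was chosen; a common approach, assumed here purely for illustration, is to maximize Youden's J statistic on the ROC curve and then derive sensitivity, specificity, PPV, NPV, and accuracy from the resulting confusion matrix (scikit-learn assumed).

```python
# Sketch of reading a "most discriminative" cutoff off the ROC curve and scoring a model.
# The abstract does not name the criterion; Youden's J statistic is assumed here.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate(y_true, p_normal):
    """y_true: 1 = normal, 0 = abnormal; p_normal: predicted probability of normality."""
    y_true = np.asarray(y_true)
    p_normal = np.asarray(p_normal)

    fpr, tpr, thresholds = roc_curve(y_true, p_normal)
    cutoff = thresholds[np.argmax(tpr - fpr)]  # Youden's J = sensitivity + specificity - 1

    pred = (p_normal >= cutoff).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))

    return {
        "auc": roc_auc_score(y_true, p_normal),
        "cutoff": float(cutoff),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }
```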

CONCLUSION

All four models showed acceptable diagnostic accuracy. The performance of VGG16 and Inception V3 was better than that of the simple CNN6 model and the more recently published Xception model. Real-time classification on a video stream using a combination of the VGG16 model, OpenCV, and Grad-CAM demonstrated the potential clinical applicability of deep learning models in laryngoscopy.
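A minimal sketch of how such real-time classification on a video stream might look, assuming a saved Keras VGG16 model and OpenCV for frame capture and on-screen labeling; the model file name, preprocessing, and overlay layout are hypothetical, and the Grad-CAM heatmap overlay is omitted for brevity. The 0.618 cutoff is the VGG16 value reported in the Results.

```python
# Sketch of real-time normal/abnormal classification on a laryngoscopy video stream.
# File names, preprocessing, and display details are assumptions, not the authors' code.
import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("vgg16_vocal_fold.h5")  # hypothetical saved model
CUTOFF = 0.618  # VGG16 cutoff reported in the Results

cap = cv2.VideoCapture("laryngoscopy.mp4")  # or 0 for a live camera feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Resize and scale the frame the same way the training images were preprocessed.
    x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    p_normal = float(model.predict(x[np.newaxis], verbose=0)[0, 0])

    label = "normal" if p_normal >= CUTOFF else "abnormal"
    cv2.putText(frame, f"{label} ({p_normal:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("laryngoscopy", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```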

