

Automatic Pharyngeal Phase Recognition in Untrimmed Videofluoroscopic Swallowing Study Using Transfer Learning with Deep Convolutional Neural Networks.

Author Information

Lee Ki-Sun, Lee Eunyoung, Choi Bareun, Pyun Sung-Bom

Affiliations

Medical Science Research Center, Ansan Hospital, Korea University College of Medicine, Ansan-si 15355, Korea.

Department of Physical Medicine and Rehabilitation, Anam Hospital, Korea University College of Medicine, Seoul 02841, Korea.

Publication Information

Diagnostics (Basel). 2021 Feb 13;11(2):300. doi: 10.3390/diagnostics11020300.

Abstract

BACKGROUND

Videofluoroscopic swallowing study (VFSS) is considered the gold standard diagnostic tool for evaluating dysphagia. However, manually searching a long recorded video frame by frame to identify instantaneous swallowing abnormalities in VFSS images is time consuming and labor intensive for the clinician. Therefore, this study aims to present a deep learning-based approach using transfer learning with a convolutional neural network (CNN) that automatically annotates pharyngeal-phase frames in untrimmed VFSS videos, so that frames need not be searched manually.

METHODS

To determine whether an image frame in a VFSS video belongs to the pharyngeal phase, a single-frame baseline architecture based on a deep CNN framework is used, and a transfer learning technique with fine-tuning is applied.
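The paper itself does not include code; the following is a minimal sketch of what such a transfer-learning setup could look like in Keras, assuming an ImageNet-pretrained VGG-16 backbone with its last two convolutional blocks unfrozen for fine-tuning and a binary (pharyngeal phase vs. other) output head. Input size, which blocks are unfrozen, head layers, and optimizer settings are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch (not the authors' code): VGG-16 transfer learning with
# partial fine-tuning for binary pharyngeal-phase frame classification.
import tensorflow as tf
from tensorflow.keras import layers, models

# Load ImageNet-pretrained VGG-16 without its classification head.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)

# Freeze the backbone, then unfreeze the last two convolutional blocks
# (assumed interpretation of "fine-tuned with two blocks of the VGG-16").
base.trainable = True
for layer in base.layers:
    layer.trainable = layer.name.startswith(("block4", "block5"))

# Binary head: pharyngeal-phase frame (1) vs. non-pharyngeal frame (0).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # small LR for fine-tuning
    loss="binary_crossentropy",
    metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
)
model.summary()
```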

RESULTS

Among all experimental CNN models, the model fine-tuned with two blocks of VGG-16 (VGG16-FT5) achieved the highest performance in recognizing pharyngeal-phase frames: accuracy of 93.20 (±1.25)%, sensitivity of 84.57 (±5.19)%, specificity of 94.36 (±1.21)%, AUC of 0.8947 (±0.0269), and Kappa of 0.7093 (±0.0488).
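As a worked illustration of how the reported frame-level metrics (accuracy, sensitivity, specificity, AUC, Kappa) can be derived from per-frame predictions, the sketch below uses scikit-learn; the 0.5 decision threshold and the toy arrays are assumptions for demonstration only and do not reproduce the paper's numbers.

```python
# Illustrative metric computation (assumed 0.5 threshold, toy data).
import numpy as np
from sklearn.metrics import (
    accuracy_score, roc_auc_score, cohen_kappa_score, confusion_matrix
)

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])                   # 1 = pharyngeal-phase frame
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.2, 0.6, 0.3, 0.9])   # model probabilities
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy   :", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))   # recall for the pharyngeal class
print("specificity:", tn / (tn + fp))
print("AUC        :", roc_auc_score(y_true, y_prob))
print("kappa      :", cohen_kappa_score(y_true, y_pred))
```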

CONCLUSIONS

Using appropriate transfer learning and fine-tuning techniques, together with explainable deep learning techniques such as Grad-CAM, this study shows that the proposed single-frame-baseline-architecture-based deep CNN framework can yield high performance toward the full automation of VFSS video analysis.
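Grad-CAM is a standard technique for visualizing which image regions drive a CNN's prediction. The sketch below shows one common Keras-style implementation applied to the model sketched in the Methods section; the choice of target conv layer ("block5_conv3") and the heatmap post-processing are assumptions, not details taken from the paper.

```python
# Illustrative Grad-CAM sketch (not from the paper): highlights the frame
# regions that most increase the pharyngeal-phase score of the model above.
import numpy as np
import tensorflow as tf

def grad_cam(model, frame, conv_layer_name="block5_conv3"):
    """frame: float32 array of shape (224, 224, 3), preprocessed like the training data."""
    backbone = model.layers[0]                      # nested VGG-16 base from the sketch above
    conv_layer = backbone.get_layer(conv_layer_name)
    # Map the input image to the chosen conv feature maps and the backbone output.
    feat_model = tf.keras.Model(backbone.input, [conv_layer.output, backbone.output])

    with tf.GradientTape() as tape:
        conv_maps, features = feat_model(frame[None, ...])
        x = features
        for head_layer in model.layers[1:]:         # pooling + dense head
            x = head_layer(x, training=False)
        score = x[:, 0]                             # sigmoid pharyngeal-phase score

    grads = tape.gradient(score, conv_maps)                 # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))            # global-average-pool the gradients
    cam = tf.einsum("bhwc,bc->bhw", conv_maps, weights)     # channel-weighted feature maps
    cam = tf.nn.relu(cam)[0]
    cam = cam / (tf.reduce_max(cam) + 1e-8)                 # normalize to [0, 1]
    return cam.numpy()                                      # upsample/overlay on the frame as needed
```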


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8717/7918932/7f51d93546a1/diagnostics-11-00300-g001.jpg
