Kwak Soyoung, Kim Jeoung Kun, Moon Jun Sung, Lee Gun Woo, Kim Sungho, Chang Min Cheol
Department of Physical Medicine and Rehabilitation, College of Medicine, Yeungnam University, Daegu, Republic of Korea.
Department of Business Administration, School of Business, Yeungnam University, Gyeongsan-si, Republic of Korea.
Sci Rep. 2025 Jul 7;15(1):24296. doi: 10.1038/s41598-025-10397-0.
The videofluoroscopic swallowing study (VFSS) is the gold standard for diagnosing dysphagia, but its interpretation is time-consuming and requires expertise. This study developed a deep learning model to automatically detect penetration and aspiration in VFSS and assessed its diagnostic accuracy. Images corresponding to the highest and lowest positions of the hyoid bone were automatically extracted from VFSS videos. The highest position represents the moment of upper esophageal sphincter opening during the swallow, and the lowest positions represent the pre-swallow and post-swallow phases. This yielded a total of 18,145 images from 1,467 patients. The model was based on a convolutional neural network (CNN) architecture and was trained with techniques to address class imbalance and optimize performance. At the patient level, the model achieved high diagnostic accuracy, with areas under the receiver operating characteristic curve (AUCs) of 0.935 for normal swallowing, 0.889 for penetration, and 0.845 for aspiration. However, although the model performed well in identifying normal swallowing, its sensitivity for detecting penetration and aspiration was low. These findings suggest that the proposed model may shorten interpretation time by reducing the need to review videos repeatedly for penetration or aspiration, which would let clinicians focus on other clinically relevant VFSS findings. Future studies should address the model's limitations by analyzing full-frame VFSS data and incorporating multicenter datasets.
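The abstract does not specify how the key frames are located or how image-level outputs are combined into a patient-level result. What follows is a minimal sketch under stated assumptions and is not the authors' implementation. It assumes a hypothetical per-frame vertical hyoid coordinate (for example, from some hyoid tracker) in image coordinates, where y grows downward, so the highest hyoid position has the smallest y. It takes that frame as the upper esophageal sphincter opening moment and the lowest hyoid positions before and after it as the pre-swallow and post-swallow frames. For evaluation, it aggregates image probabilities per patient by taking the maximum, which is a hypothetical choice. It then computes a one-vs-rest AUC using the pairwise (Mann-Whitney) formulation. All function names are illustrative.

```python
from collections import defaultdict


def select_key_frames(hyoid_y):
    """Return (pre, peak, post) frame indices for one swallow.

    hyoid_y[i] is the hyoid's vertical pixel coordinate in frame i
    (hypothetical tracker output; image y grows downward, so the
    highest hyoid position is the smallest y).
    """
    if len(hyoid_y) < 3:
        raise ValueError("need at least three frames")
    # Highest hyoid position: treated as the moment of UES opening.
    peak = min(range(len(hyoid_y)), key=lambda i: hyoid_y[i])
    if peak == 0 or peak == len(hyoid_y) - 1:
        raise ValueError("peak must have frames on both sides")
    # Lowest hyoid positions before and after the peak: pre- and post-swallow.
    pre = max(range(peak), key=lambda i: hyoid_y[i])
    post = max(range(peak + 1, len(hyoid_y)), key=lambda i: hyoid_y[i])
    return pre, peak, post


def patient_level_scores(image_probs, patient_ids):
    """Hypothetical aggregation: a patient's score is their highest image probability.

    Assumes probabilities lie in [0, 1], so the 0.0 default never wins.
    """
    scores = defaultdict(float)
    for pid, p in zip(patient_ids, image_probs):
        scores[pid] = max(scores[pid], p)
    return dict(scores)


def roc_auc(labels, scores):
    """One-vs-rest AUC: chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values invented for illustration only, not study data.
    print(select_key_frames([310, 315, 300, 260, 240, 255, 290, 320, 318]))  # (1, 4, 7)
    scores = patient_level_scores([0.1, 0.7, 0.2, 0.3, 0.9, 0.4],
                                  ["A", "A", "B", "B", "C", "C"])
    print(scores)  # {'A': 0.7, 'B': 0.3, 'C': 0.9}
    # Toy labels: 1 = aspiration present for that patient, 0 = absent.
    print(roc_auc([1, 0, 1], [scores["A"], scores["B"], scores["C"]]))  # 1.0
```

Max aggregation flags a patient when any single frame looks abnormal, which tends to favor sensitivity. Averaging probabilities across frames would be an equally plausible assumption.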