Nguyen Huu Phong, Khairnar Shekhar Madhav, Palacios Sofia Garces, Al-Abbas Amr, Hogg Melissa E, Zureikat Amer H, Polanco Patricio M, Zeh Herbert J, Sankaranarayanan Ganesh
Department of Surgery, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Department of Surgery, NorthShore University HealthSystem, Evanston, IL 60201, USA.
IEEE Access. 2025;13:101681-101697. doi: 10.1109/access.2025.3573264. Epub 2025 May 23.
Interest in leveraging Artificial Intelligence (AI) to automate the analysis of surgical procedures has surged in recent years. Video is one of the primary means of recording surgical procedures for subsequent analyses such as performance assessment. However, operative videos tend to be notably long compared with videos in other fields, spanning from thirty minutes to several hours, which makes it difficult for AI models to learn from them effectively. The foreseeable increase in the volume of such videos in the near future makes techniques that tackle this issue all the more necessary. In this article, we propose a novel technique called Kinematics Adaptive Frame Recognition (KAFR) that efficiently eliminates redundant frames to reduce dataset size and computation time while retaining informative frames to improve accuracy. Specifically, we compute the similarity between consecutive frames by tracking the movement of surgical tools. Our approach follows three steps: 1) Tracking phase: a YOLOv8 model detects the tools present in the scene; 2) Similarity phase: similarities between consecutive frames are computed by estimating the variation in the spatial positions and velocities of the tools; 3) Classification phase: an X3D CNN is trained to classify the surgical phases. We evaluate the effectiveness of our approach on datasets obtained through retrospective reviews of cases at two referral centers. The newly annotated Gastrojejunostomy (GJ) dataset covers procedures performed between 2017 and 2021, while the previously annotated Pancreaticojejunostomy (PJ) dataset spans 2011 to 2022 at the same centers. In the GJ dataset, each robotic GJ video is segmented into six distinct phases. By adaptively selecting relevant frames, we reduce the number of frames while improving accuracy by 4.32% (from 0.749 to 0.7814) and the F1 score by 0.16%. On the PJ dataset, our approach achieves a fivefold reduction in data alongside a 2.05% accuracy improvement (from 0.8801 to 0.8982) and a 2.54% increase in F1 score (from 0.8534 to 0.8751). We also compare our approach with state-of-the-art methods to highlight its competitiveness in performance and efficiency. Although we evaluated our approach on the GJ and PJ datasets for phase segmentation, it could also be applied to broader, more general surgical datasets. Furthermore, KAFR can complement existing approaches, enhancing their performance by reducing redundant data while retaining key information, making it a valuable addition to other AI models.
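To make the three-stage pipeline described above concrete, the following is a minimal Python sketch of KAFR-style adaptive frame selection. It assumes the ultralytics YOLOv8 API for detection; the centroid-based kinematics, the index-order matching of detections, the placeholder weights, and the `pos_thresh`/`vel_thresh` defaults are illustrative assumptions rather than the paper's exact similarity computation, and the downstream X3D phase classifier is omitted.

```python
# A minimal sketch of KAFR-style adaptive frame selection, assuming the
# ultralytics YOLOv8 API. The centroid-based kinematics, the index-order
# matching of detections, and the thresholds below are illustrative
# simplifications, not the authors' exact similarity measure.
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder weights; a real run would use tool-detection weights


def tool_centroids(frame):
    """Detect tools in a frame and return their box centers, sorted for a stable order."""
    result = model(frame, verbose=False)[0]
    if result.boxes is None or len(result.boxes) == 0:
        return np.empty((0, 2))
    xy = result.boxes.xywh[:, :2].cpu().numpy()  # (x_center, y_center) per detection
    return xy[np.lexsort((xy[:, 1], xy[:, 0]))]  # crude stand-in for true tool tracking


def select_frames(video_path, pos_thresh=5.0, vel_thresh=2.0):
    """Keep a frame only when tool positions or velocities change enough (in pixels)."""
    cap = cv2.VideoCapture(video_path)
    kept, prev_pos, prev_vel, idx = [], None, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pos = tool_centroids(frame)
        if prev_pos is None or len(pos) != len(prev_pos):
            kept.append(idx)  # a tool entered or left the scene: keep the frame
            prev_vel = None
        else:
            vel = pos - prev_pos  # per-frame displacement as a velocity proxy
            dp = float(np.linalg.norm(pos - prev_pos, axis=1).max(initial=0.0))
            dv = (float(np.linalg.norm(vel - prev_vel, axis=1).max(initial=0.0))
                  if prev_vel is not None else 0.0)
            if dp > pos_thresh or dv > vel_thresh:
                kept.append(idx)  # enough kinematic change: frame is not redundant
            prev_vel = vel
        prev_pos = pos
        idx += 1
    cap.release()
    return kept  # indices of retained frames for downstream phase classification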