Ju Shujun, Jiang Penglin, Jin Yutong, Fu Yaoyu, Wang Xiandi, Tan Xiaomei, Han Ying, Yin Rong, Pu Dan, Li Kang
West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China.
Department of Industrial Engineering, Sichuan University, Chengdu, China.
Surg Endosc. 2025 Jun;39(6):3749-3759. doi: 10.1007/s00464-025-11730-4. Epub 2025 May 2.
Laparoscopic surgery training is gaining increasing importance. To relieve doctors of the burden of manually annotating videos, we proposed an automatic surgical gesture recognition model based on the Fundamentals of Laparoscopic Surgery (FLS) and the Chinese Laparoscopic Skills Testing and Assessment (CLSTA) tools. Furthermore, statistical analysis was conducted on a gesture vocabulary designed to examine differences between groups at different skill levels.
Based on the CLSTA, the training process of peg transfer can be represented by a standard sequence of seven surgical gestures defined in our gesture vocabulary. The dataset used for model training and testing comprised eighty videos recorded at 30 fps, all rated by senior medical professionals from our medical training center. The dataset was processed with cross-validation to ensure robust model performance. The model applied was a 3D ResNet-18, a convolutional neural network (CNN), and an LSTM neural network was used to refine the output sequence.
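As a minimal sketch of the two-stage architecture described above, the snippet below wires a 3D ResNet-18 clip encoder to an LSTM that refines the per-clip gesture sequence. The layer sizes, clip length, and class head are illustrative assumptions; the paper's exact training configuration is not reproduced here.

```python
# Hedged sketch: 3D ResNet-18 clip encoder + LSTM sequence refinement.
# Hidden size, clip length, and head design are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_GESTURES = 7  # G1..G7 as defined in the gesture vocabulary

class GestureRecognizer(nn.Module):
    def __init__(self, hidden_size=256):
        super().__init__()
        backbone = r3d_18(weights=None)   # 3D ResNet-18 clip encoder
        backbone.fc = nn.Identity()       # expose the 512-d clip feature
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, NUM_GESTURES)

    def forward(self, clips):
        # clips: (batch, num_clips, 3, frames, height, width)
        b, n = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1))  # (b*n, 512) per-clip features
        feats = feats.view(b, n, -1)                # restore the clip sequence
        refined, _ = self.lstm(feats)               # temporal refinement across clips
        return self.head(refined)                   # (b, n, NUM_GESTURES) gesture logits

# Example: one video split into 10 clips of 16 frames at 112x112 resolution.
logits = GestureRecognizer()(torch.randn(1, 10, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([1, 10, 7])
```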
The overall accuracy of the recognition model was 83.8% and the F1 score was 84%. The LSTM network improved performance to 85.84% accuracy and an 85% F1 score. Every operative process starts with Gesture 1 (G1) and ends with G5, with wrong placement labeled as G6. The average training time was 92 s (SD = 36). Between-group variance was observed for G1, G3, and G6, indicating that trainees may benefit from focusing their efforts on these operations, and that these gestures may also help doctors analyze training outcomes more effectively.
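To illustrate how the predicted gesture sequence can be turned into the trial-level statistics reported above (start/end structure, G6 error events, duration), the sketch below collapses per-clip predictions into segments. The clip length and helper names are assumptions, not the study's code.

```python
# Hedged sketch: per-clip gesture predictions -> segments -> simple trial statistics.
from itertools import groupby

CLIP_SECONDS = 16 / 30  # assumed clip length: 16 frames at 30 fps

def summarize(pred_labels):
    segments = [(g, sum(1 for _ in run)) for g, run in groupby(pred_labels)]
    return {
        "starts_with_G1": segments[0][0] == 1,
        "ends_with_G5": segments[-1][0] == 5,
        "wrong_placements": sum(1 for g, _ in segments if g == 6),  # G6 events
        "duration_s": round(len(pred_labels) * CLIP_SECONDS, 1),
    }

# Example: a trial that begins with G1, includes one G6 error, and ends with G5.
print(summarize([1, 1, 2, 3, 6, 3, 4, 5, 5]))
```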
An automatic surgical gesture recognition model was developed for the peg transfer task. We also defined a gesture vocabulary, alongside the artificial intelligence model, to describe the training operation sequentially. This provides an opportunity for artificial intelligence-enabled objective and automatic evaluation based on the CLSTA in clinical implementation.