Pan Xiaoying, Gao Xuanrong, Wang Hongyu, Zhang Wuxia, Mu Yuanzhen, He Xianli
School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, GuoDu, Xi'an, 710121, Shaanxi, China.
Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.
Int J Comput Assist Radiol Surg. 2023 Jan;18(1):139-147. doi: 10.1007/s11548-022-02785-y. Epub 2022 Nov 4.
Surgical workflow recognition has emerged as an important part of computer-assisted intervention systems for the modern operating room, but it remains a very challenging problem. Although CNN-based approaches achieve excellent performance, they do not learn global and long-range semantic interactions well because of the inductive bias inherent in convolution.
In this paper, we propose a temporal-based Swin Transformer network (TSTNet) for the surgical video workflow recognition task. TSTNet contains two main parts: the Swin Transformer and the LSTM. The Swin Transformer incorporates the attention mechanism to encode long-range dependencies and learn highly expressive representations. The LSTM is likewise capable of learning long-range dependencies and is used to extract temporal information. TSTNet organically combines the two components to extract spatiotemporal features that contain more contextual information. In particular, based on a full understanding of the natural characteristics of surgical video, we propose a priori revision algorithm (PRA) that uses prior information about the order of surgical phases. This strategy optimizes the output of TSTNet and further improves recognition performance.
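The abstract does not spell out how the PRA revises predictions, but the idea of using the known phase order to correct noisy per-frame outputs can be sketched as follows. This is a hypothetical minimal version, assuming phases are indexed 0-6 (as in Cholec80) and progress in a roughly monotone order; the function name, the sliding-window majority vote, and the backward-jump rejection rule are illustrative assumptions, not the paper's actual algorithm.

```python
def revise_with_prior(preds, window=2):
    """Revise per-frame phase predictions using a phase-order prior.

    preds: list of int phase indices (e.g. 0..6 for Cholec80), one per frame.
    Each frame's label is replaced by the majority label in a local window,
    and revisions that jump backward in the phase order are rejected in
    favor of the previously accepted label (assumed monotone progression).
    """
    revised = []
    for i in range(len(preds)):
        lo = max(0, i - window)
        hi = min(len(preds), i + window + 1)
        neighborhood = preds[lo:hi]
        # Local majority vote suppresses isolated misclassified frames.
        majority = max(set(neighborhood), key=neighborhood.count)
        prev = revised[-1] if revised else majority
        # Reject backward phase jumps under the monotone-order assumption.
        revised.append(majority if majority >= prev else prev)
    return revised
```

For example, an isolated spike back to an earlier phase inside a stable run is smoothed away, while a genuine, sustained transition to the next phase is kept.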
We conduct extensive experiments on the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves excellent performance on Cholec80, reaching an accuracy of 92.8% and greatly exceeding state-of-the-art methods.
By modelling long-range temporal information and multi-scale visual information, we propose the TSTNet-PRA method. Evaluated on a large public dataset, it shows a high recognition capability superior to other spatiotemporal networks.