
Temporal-based Swin Transformer network for workflow recognition of surgical video.

Author Information

Pan Xiaoying, Gao Xuanrong, Wang Hongyu, Zhang Wuxia, Mu Yuanzhen, He Xianli

Affiliations

School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, GuoDu, Xi'an, 710121, Shaanxi, China.

Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an, 710121, Shaanxi, China.

Publication Information

Int J Comput Assist Radiol Surg. 2023 Jan;18(1):139-147. doi: 10.1007/s11548-022-02785-y. Epub 2022 Nov 4.

Abstract

PURPOSE

Surgical workflow recognition has emerged as an important component of computer-assisted intervention systems for the modern operating room, yet it remains a very challenging problem. Although CNN-based approaches achieve excellent performance, they do not learn global, long-range semantic interactions well because of the inductive bias inherent in convolution.

METHODS

In this paper, we propose a temporal-based Swin Transformer network (TSTNet) for the surgical video workflow recognition task. TSTNet contains two main parts: a Swin Transformer and an LSTM. The Swin Transformer incorporates the attention mechanism to encode long-range dependencies and learn highly expressive representations. The LSTM, which is also capable of learning long-range dependencies, is used to extract temporal information. TSTNet organically combines the two components to extract spatiotemporal features that carry richer contextual information. In particular, building on the natural characteristics of surgical video, we propose a prior revision algorithm (PRA) that exploits prior knowledge of the order of surgical phases. This strategy optimizes the output of TSTNet and further improves recognition performance.
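The abstract does not specify how the PRA works internally; the sketch below is a minimal, hypothetical illustration of how prior knowledge of a fixed surgical phase order could be used to revise per-frame predictions. The phase indices, the backward-jump rule, and the function name `prior_revision` are assumptions for illustration, not the paper's actual algorithm.

```python
# Hypothetical sketch of a prior-revision step: Cholec80 defines 7 surgical
# phases (indices 0-6) that normally occur in a fixed order, so a prediction
# that jumps backward to an already-completed phase is likely noise.

def prior_revision(pred: list) -> list:
    """Enforce monotone phase progression over a sequence of per-frame
    phase predictions: forward (or unchanged) transitions are kept,
    while backward jumps are overwritten with the last consistent phase."""
    revised = []
    current = pred[0]
    for p in pred:
        if p >= current:      # moving forward (or staying) respects the prior
            current = p
            revised.append(p)
        else:                 # backward jump violates the known phase order
            revised.append(current)
    return revised

# Example: isolated backward "blips" are smoothed away.
print(prior_revision([0, 0, 1, 0, 1, 2, 1, 3]))  # [0, 0, 1, 1, 1, 2, 2, 3]
```

A real implementation would likely operate on class probabilities rather than hard labels and tolerate longer ambiguous stretches, but the principle of exploiting the known phase sequence is the same.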

RESULTS

We conduct extensive experiments on the Cholec80 dataset to validate the effectiveness of the TSTNet-PRA method. Our method achieves excellent performance on Cholec80, with an accuracy of up to 92.8%, greatly exceeding state-of-the-art methods.

CONCLUSION

By modelling long-range temporal information together with multi-scale visual information, we propose the TSTNet-PRA method. Evaluated on a large public dataset, it demonstrates recognition capability superior to other spatiotemporal networks.
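The two-stage design described above (per-frame spatial features from a transformer backbone, then a recurrent temporal model producing per-frame phase scores) can be sketched with a toy NumPy LSTM. Here the Swin features are random placeholders and all weights are randomly initialised; the dimensions, variable names, and the single-layer LSTM are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H, C = 8, 16, 32, 7   # frames, feature dim, hidden size, 7 Cholec80 phases

# Stand-in for per-frame features; in the paper these would come from the
# Swin Transformer spatial backbone.
feats = rng.standard_normal((T, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised single-layer LSTM parameters (illustrative only).
Wx = rng.standard_normal((4 * H, D)) * 0.1   # input-to-gates weights
Wh = rng.standard_normal((4 * H, H)) * 0.1   # hidden-to-gates weights
b = np.zeros(4 * H)
Wo = rng.standard_normal((C, H)) * 0.1       # phase classifier head

h = np.zeros(H)
c = np.zeros(H)
logits = []
for t in range(T):
    z = Wx @ feats[t] + Wh @ h + b
    i = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])       # output gate
    c = f * c + i * g
    h = o * np.tanh(c)
    logits.append(Wo @ h)

logits = np.stack(logits)        # (T, C): one phase-score vector per frame
pred = logits.argmax(axis=1)     # per-frame phase prediction
```

In the actual system the per-frame predictions would then be passed through the PRA before evaluation; this sketch only shows how temporal context accumulates in the recurrent state across frames.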

