Cascade Multi-Level Transformer Network for Surgical Workflow Analysis.

Publication details

IEEE Trans Med Imaging. 2023 Oct;42(10):2817-2831. doi: 10.1109/TMI.2023.3265354. Epub 2023 Oct 2.

Abstract

Surgical workflow analysis aims to recognise surgical phases from untrimmed surgical videos. It is an integral component for enabling context-aware computer-aided surgical operating systems. Many deep learning-based methods have been developed for this task. However, most existing works aggregate homogeneous temporal context for all frames at a single level and neglect the fact that, for accurate phase prediction, each frame needs its own specific information at multiple levels. To fill this gap, in this paper we propose the Cascade Multi-Level Transformer Network (CMTNet), composed of cascaded Adaptive Multi-Level Context Aggregation (AMCA) modules. Each AMCA module first extracts temporal context at the frame level and the phase level, and then adaptively fuses, for each frame, its frame-specific spatial feature, frame-level temporal context, and phase-level temporal context. By cascading multiple AMCA modules, CMTNet gradually enriches the representation of each frame with the multi-level semantics that it specifically requires, achieving better phase prediction in a frame-adaptive manner. In addition, we propose a novel refinement loss for CMTNet, which explicitly guides each AMCA module to focus on extracting the key context for refining the prediction of the previous stage in terms of both prediction confidence and smoothness. This further improves the quality of the extracted context. Extensive experiments on the Cholec80 and M2CAI datasets demonstrate that CMTNet achieves state-of-the-art performance.
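The abstract describes the per-frame, multi-level fusion and the cascade only at a high level. The following is a minimal, dependency-free Python sketch of that general idea under stated assumptions; it is not the authors' implementation. CMTNet is, as its name indicates, Transformer-based, whereas here a local window average stands in for frame-level context, an average over frames sharing the previous stage's phase label stands in for phase-level context, a dot-product softmax gate stands in for the learned adaptive fusion, and a nearest-prototype classifier stands in for the phase head. All function names (`adaptive_fuse`, `cascade`, `predict`, etc.) are hypothetical, and the refinement loss is not modeled.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> Vector:
    # Element-wise average of a non-empty list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_level_context(feats: List[Vector], t: int, radius: int = 1) -> Vector:
    # Hypothetical stand-in: average of the features in a local window around frame t.
    lo, hi = max(0, t - radius), min(len(feats), t + radius + 1)
    return mean(feats[lo:hi])


def phase_level_context(feats: List[Vector], phases: List[int], t: int) -> Vector:
    # Hypothetical stand-in: average of all frames the previous stage assigned
    # to the same phase as frame t (this set always includes frame t itself).
    return mean([f for f, p in zip(feats, phases) if p == phases[t]])


def adaptive_fuse(feats: List[Vector], phases: List[int], radius: int = 1) -> List[Vector]:
    fused = []
    for t, spatial in enumerate(feats):
        sources = [
            spatial,                                # frame-specific spatial feature
            frame_level_context(feats, t, radius),  # frame-level temporal context
            phase_level_context(feats, phases, t),  # phase-level temporal context
        ]
        # Per-frame weights: each frame scores the three sources against its own
        # feature, so different frames can mix the levels in different proportions.
        weights = softmax([dot(spatial, s) for s in sources])
        fused.append([sum(w * s[d] for w, s in zip(weights, sources))
                      for d in range(len(spatial))])
    return fused


def predict(feats: List[Vector], prototypes: List[Vector]) -> List[int]:
    # Hypothetical phase classifier: pick the prototype with the largest dot product.
    return [max(range(len(prototypes)), key=lambda k: dot(f, prototypes[k]))
            for f in feats]


def cascade(feats: List[Vector], prototypes: List[Vector], num_stages: int = 3) -> List[int]:
    phases = predict(feats, prototypes)  # initial prediction before any fusion stage
    for _ in range(num_stages):
        # Each stage re-fuses the features using the previous stage's phase labels,
        # then re-predicts; this mimics only the cascade structure, not its training.
        feats = adaptive_fuse(feats, phases)
        phases = predict(feats, prototypes)
    return phases
```

For example, `cascade([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3, [[1.0, 0.0], [0.0, 1.0]])` returns `[0, 0, 0, 1, 1, 1]`. The softmax gate is used here only because it is the simplest way to make the mixing weights frame-specific and sum to one; the paper's actual fusion mechanism is learned and may differ.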

Similar articles

Temporal-based Swin Transformer network for workflow recognition of surgical video.

Int J Comput Assist Radiol Surg. 2023 Jan;18(1):139-147. doi: 10.1007/s11548-022-02785-y. Epub 2022 Nov 4.

LoViT: Long Video Transformer for surgical phase recognition.

Med Image Anal. 2025 Jan;99:103366. doi: 10.1016/j.media.2024.103366. Epub 2024 Oct 5.

Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis.

Int J Comput Assist Radiol Surg. 2022 Dec;17(12):2193-2202. doi: 10.1007/s11548-022-02743-8. Epub 2022 Sep 21.

Temporal Memory Relation Network for Workflow Recognition From Surgical Video.

IEEE Trans Med Imaging. 2021 Jul;40(7):1911-1923. doi: 10.1109/TMI.2021.3069471. Epub 2021 Jun 30.

Surgical workflow recognition with temporal convolution and transformer for action segmentation.

Int J Comput Assist Radiol Surg. 2023 Apr;18(4):785-794. doi: 10.1007/s11548-022-02811-z. Epub 2022 Dec 21.

Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition.

Med Image Anal. 2021 Oct;73:102158. doi: 10.1016/j.media.2021.102158. Epub 2021 Jul 8.

Anticipation for surgical workflow through instrument interaction and recognized Signals.

Med Image Anal. 2022 Nov;82:102611. doi: 10.1016/j.media.2022.102611. Epub 2022 Sep 6.

Confidence-Guided Self Refinement for Action Prediction in Untrimmed Videos.

IEEE Trans Image Process. 2020 Apr 17. doi: 10.1109/TIP.2020.2987425.

Against spatial-temporal discrepancy: contrastive learning-based network for surgical workflow recognition.

Int J Comput Assist Radiol Surg. 2021 May;16(5):839-848. doi: 10.1007/s11548-021-02382-5. Epub 2021 May 5.
