Cascade Multi-Level Transformer Network for Surgical Workflow Analysis.

Publication information

IEEE Trans Med Imaging. 2023 Oct;42(10):2817-2831. doi: 10.1109/TMI.2023.3265354. Epub 2023 Oct 2.

Abstract

Surgical workflow analysis aims to recognise surgical phases from untrimmed surgical videos. It is an integral component for enabling context-aware computer-aided surgical operating systems. Many deep learning-based methods have been developed for this task. However, most existing works aggregate homogeneous temporal context for all frames at a single level, neglecting the fact that each frame has its own specific need for information at multiple levels for accurate phase prediction. To fill this gap, in this paper we propose the Cascade Multi-Level Transformer Network (CMTNet), composed of cascaded Adaptive Multi-Level Context Aggregation (AMCA) modules. Each AMCA module first extracts temporal context at the frame level and the phase level, and then, for each frame, adaptively fuses the frame-specific spatial features, the frame-level temporal context, and the phase-level temporal context. By cascading multiple AMCA modules, CMTNet gradually enriches the representation of each frame with the multi-level semantics that it specifically requires, achieving better phase prediction in a frame-adaptive manner. In addition, we propose a novel refinement loss for CMTNet, which explicitly guides each AMCA module to focus on extracting the key context for refining the prediction of the previous stage in terms of both prediction confidence and smoothness. This further improves the quality of the extracted context. Extensive experiments on the Cholec80 and M2CAI datasets demonstrate that CMTNet achieves state-of-the-art performance.
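The abstract does not give equations for the AMCA fusion step or the refinement loss. As a rough illustration only, the NumPy sketch below shows one generic way to fuse three per-frame feature sources with frame-specific weights: a softmax gate over per-source scores, so each frame mixes spatial features, frame-level context, and phase-level context in its own proportions. The smoothness function is a swapped-in technique, the truncated MSE on adjacent-frame log-probabilities used in MS-TCN, and is not the paper's refinement loss. The names `adaptive_fuse` and `truncated_smoothness`, the projection vector `w`, and the threshold `tau` are all hypothetical; CMTNet itself is transformer-based and its actual design may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_fuse(spatial, frame_ctx, phase_ctx, w):
    """Frame-adaptive fusion of three (T, D) feature sources.

    Hypothetical gating (assumption, not the paper's AMCA): each source
    gets a per-frame score from a shared projection vector w of shape
    (D,); a softmax over the three scores yields per-frame weights.
    """
    sources = np.stack([spatial, frame_ctx, phase_ctx], axis=1)  # (T, 3, D)
    scores = sources @ w                                         # (T, 3)
    weights = softmax(scores, axis=1)                            # (T, 3), each row sums to 1
    fused = (weights[..., None] * sources).sum(axis=1)           # (T, D)
    return fused, weights


def truncated_smoothness(probs, tau=4.0, eps=1e-8):
    """MS-TCN-style truncated MSE between adjacent frames' log-probabilities.

    probs: (T, C) per-frame phase probabilities. Used here only as a
    stand-in for the smoothness part of a refinement loss (assumption).
    """
    logp = np.log(probs + eps)
    delta = np.abs(logp[1:] - logp[:-1])          # (T-1, C) frame-to-frame change
    return float(np.mean(np.minimum(delta, tau) ** 2))


# Usage on random features for a 6-frame clip with 8-dim features.
rng = np.random.default_rng(0)
T, D = 6, 8
s, f, p = (rng.standard_normal((T, D)) for _ in range(3))
fused, weights = adaptive_fuse(s, f, p, rng.standard_normal(D))
print(fused.shape, weights.sum(axis=1))
```

Because the weights are computed per frame, two frames with identical spatial features can still draw on the temporal contexts in different proportions, which is the intuition behind the "frame-adaptive" fusion described above.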
