Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis.

Affiliations

Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College, London, UK.

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, HK, China.

Publication information

Int J Comput Assist Radiol Surg. 2022 Dec;17(12):2193-2202. doi: 10.1007/s11548-022-02743-8. Epub 2022 Sep 21.

DOI: 10.1007/s11548-022-02743-8
PMID: 36129573
Abstract

PURPOSE

Real-time surgical workflow analysis is a key component of computer-assisted intervention systems, improving cognitive assistance. Most existing methods rely solely on conventional temporal models and encode features in a successive spatial-temporal arrangement. As a result, the supportive benefits of intermediate features are partially lost in both the visual and temporal aspects. In this paper, we rethink feature encoding to attend to and preserve the information critical for accurate workflow recognition and anticipation.

METHODS

We introduce the Transformer to surgical workflow analysis to reconsider the complementary effects of spatial and temporal representations. We propose a hybrid embedding aggregation Transformer, named Trans-SVNet, that lets the designed spatial and temporal embeddings interact effectively by using the spatial embedding to query the temporal embedding sequence. The model is jointly optimized with loss objectives from both analysis tasks to leverage their high correlation.
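The querying idea in the METHODS paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain single-head scaled dot-product attention without learned projections, and it assumes a cross-entropy term for recognition and an absolute-error term for anticipation, none of which the abstract specifies. The names `cross_attend` and `joint_loss` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(query, keys, values):
    """One spatial embedding (query) attends over a temporal sequence.

    query:  list of d floats (spatial embedding of the current frame)
    keys:   n lists of d floats (temporal embedding sequence)
    values: n lists of d_v floats (here usually the same sequence)
    Learned projection matrices are omitted for brevity (assumption).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    fused = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return fused, weights


def joint_loss(phase_logits, phase_label, predicted_time, true_time, weight=1.0):
    """Hypothetical multi-task objective: recognition plus anticipation.

    Cross-entropy and absolute error are assumed choices, not taken
    from the abstract.
    """
    recognition = -math.log(softmax(phase_logits)[phase_label])
    anticipation = abs(predicted_time - true_time)
    return recognition + weight * anticipation


# Toy example: a 2-D spatial embedding queries three temporal embeddings.
spatial = [1.0, 0.0]
temporal_seq = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
fused, weights = cross_attend(spatial, temporal_seq, temporal_seq)
print("attention weights:", weights)
print("fused embedding:", fused)
print("joint loss:", joint_loss([2.0, 0.5, -1.0], 0, 3.5, 4.0))
```

In this toy setup the temporal step most aligned with the spatial query receives the largest weight, which is the sense in which the spatial embedding selects relevant temporal context.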

RESULTS

We extensively evaluate our method on three large surgical video datasets. It consistently outperforms state-of-the-art methods on the workflow recognition task across all three datasets. Learning jointly with anticipation yields a large improvement in recognition results. Our approach is also effective on anticipation, achieving promising performance. Our model runs in real time, with an inference speed of 0.0134 seconds per frame.
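As a quick sanity check on the reported latency, the per-frame time can be converted to throughput. Assuming frames are processed one at a time at the stated 0.0134 s each:

\[
\text{throughput} = \frac{1}{0.0134\ \text{s/frame}} \approx 74.6\ \text{frames/s}
\]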

CONCLUSION

Experimental results demonstrate the efficacy of our hybrid embedding integration, which rediscovers crucial cues from complementary spatial-temporal embeddings. The better performance obtained with multi-task learning indicates that the anticipation task brings additional knowledge to the recognition task. The effectiveness and efficiency of our method also show its promising potential for use in the operating room.

Similar articles

1. Trans-SVNet: hybrid embedding aggregation Transformer for surgical workflow analysis.
   Int J Comput Assist Radiol Surg. 2022 Dec;17(12):2193-2202. doi: 10.1007/s11548-022-02743-8. Epub 2022 Sep 21.
2. Temporal-based Swin Transformer network for workflow recognition of surgical video.
   Int J Comput Assist Radiol Surg. 2023 Jan;18(1):139-147. doi: 10.1007/s11548-022-02785-y. Epub 2022 Nov 4.
3. Against spatial-temporal discrepancy: contrastive learning-based network for surgical workflow recognition.
   Int J Comput Assist Radiol Surg. 2021 May;16(5):839-848. doi: 10.1007/s11548-021-02382-5. Epub 2021 May 5.
4. Temporal Memory Relation Network for Workflow Recognition From Surgical Video.
   IEEE Trans Med Imaging. 2021 Jul;40(7):1911-1923. doi: 10.1109/TMI.2021.3069471. Epub 2021 Jun 30.
5. Surgical workflow recognition with temporal convolution and transformer for action segmentation.
   Int J Comput Assist Radiol Surg. 2023 Apr;18(4):785-794. doi: 10.1007/s11548-022-02811-z. Epub 2022 Dec 21.
6. Semi-supervised learning with progressive unlabeled data excavation for label-efficient surgical workflow recognition.
   Med Image Anal. 2021 Oct;73:102158. doi: 10.1016/j.media.2021.102158. Epub 2021 Jul 8.
7. Cascade Multi-Level Transformer Network for Surgical Workflow Analysis.
   IEEE Trans Med Imaging. 2023 Oct;42(10):2817-2831. doi: 10.1109/TMI.2023.3265354. Epub 2023 Oct 2.
8. LoViT: Long Video Transformer for surgical phase recognition.
   Med Image Anal. 2025 Jan;99:103366. doi: 10.1016/j.media.2024.103366. Epub 2024 Oct 5.
9. ISTR: Mask-Embedding-Based Instance Segmentation Transformer.
   IEEE Trans Image Process. 2024;33:2895-2907. doi: 10.1109/TIP.2024.3385980. Epub 2024 Apr 16.
10. Towards multimodal graph neural networks for surgical instrument anticipation.
    Int J Comput Assist Radiol Surg. 2024 Oct;19(10):1929-1937. doi: 10.1007/s11548-024-03226-8. Epub 2024 Jul 10.

Cited by

1. Advances of surgical robotics: image-guided classification and application.
   Natl Sci Rev. 2024 Jun 6;11(9):nwae186. doi: 10.1093/nsr/nwae186. eCollection 2024 Sep.
2. Surgical phase and instrument recognition: how to identify appropriate dataset splits.
   Int J Comput Assist Radiol Surg. 2024 Apr;19(4):699-711. doi: 10.1007/s11548-024-03063-9. Epub 2024 Jan 29.
