Augmenting efficient real-time surgical instrument segmentation in video with point tracking and Segment Anything.

Authors

Zijian Wu, Adam Schmidt, Peter Kazanzides, Septimiu E. Salcudean

Affiliations

Robotics and Control Laboratory, Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, Canada.

Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.

Publication Information

Healthc Technol Lett. 2024 Dec 30;12(1):e12111. doi: 10.1049/htl2.12111. eCollection 2025 Jan-Dec.

Abstract

The Segment Anything Model (SAM) is a powerful vision foundation model that is revolutionizing the traditional segmentation paradigm. Even so, its reliance on prompts for every frame and its large computational cost limit its use in robot-assisted surgery. Applications such as augmented reality guidance require minimal user intervention and efficient inference to be clinically usable. This study addresses these limitations by adopting lightweight SAM variants to meet the efficiency requirement and by employing fine-tuning techniques to improve their generalization to surgical scenes. Recent advances in tracking any point have shown promising accuracy and efficiency, particularly when points are occluded or leave the field of view. Inspired by this progress, a novel framework is presented that combines an online point tracker with a lightweight SAM model fine-tuned for surgical instrument segmentation. Sparse points within the region of interest are tracked and used to prompt SAM throughout the video sequence, providing temporal consistency. The quantitative results surpass the state-of-the-art semi-supervised video object segmentation method XMem on the EndoVis 2015 dataset, with 84.8 IoU and 91.0 Dice. On the ex vivo UCL dVRK and in vivo CholecSeg8k datasets, the method achieves promising performance comparable to XMem and to transformer-based fully supervised segmentation methods. In addition, the proposed method shows promising zero-shot generalization on the label-free STIR dataset. In terms of efficiency, the method was tested on a single GeForce RTX 4060 and a single RTX 4090 GPU, achieving inference speeds of over 25 and 90 FPS, respectively. Code is available at: https://github.com/zijianwu1231/SIS-PT-SAM.
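The pipeline the abstract describes reduces to a simple per-frame loop: propagate the sparse prompt points into the new frame with an online point tracker, then pass the surviving points to SAM as foreground point prompts. Below is a minimal Python sketch of that loop. It is not the paper's implementation (that lives in the SIS-PT-SAM repository): pyramidal Lucas-Kanade optical flow stands in for the paper's online point tracker, the stock SAM ViT-B predictor stands in for the fine-tuned lightweight variant, and the checkpoint path and function names are illustrative assumptions.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM backbone. The paper fine-tunes a lightweight SAM variant;
# the stock ViT-B checkpoint here is only a placeholder.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def segment_sequence(frames, init_points):
    """Segment an instrument across `frames` (list of HxWx3 BGR uint8 arrays),
    prompted by `init_points` (Nx2 float32 (x, y) points inside the instrument)."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = init_points.reshape(-1, 1, 2).astype(np.float32)
    masks = []
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Tracking step: propagate the sparse prompt points into the current
        # frame (pyramidal Lucas-Kanade as a stand-in for the online tracker).
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        pts = pts[status.flatten() == 1].reshape(-1, 1, 2)
        if len(pts) == 0:  # all points lost (e.g. occlusion): stop or re-prompt
            break
        # Prompting step: the tracked points become SAM point prompts,
        # all labeled as foreground (1).
        predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        mask, _, _ = predictor.predict(
            point_coords=pts.reshape(-1, 2),
            point_labels=np.ones(len(pts), dtype=np.int32),
            multimask_output=False,
        )
        masks.append(mask[0])
        prev_gray = gray
    return masks
```

Note that the paper's tracker is chosen precisely because it handles points that are occluded or leave the field of view; the plain optical-flow stand-in above simply drops lost points, so it only illustrates the structure of the loop, not the robustness of the full method.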

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5891/11730982/185809822cad/HTL2-12-e12111-g002.jpg
