School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, 215123, P.R. China.
School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu, 215123, P.R. China; Key Laboratory of Precision and Intelligent Chemistry, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China; Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, P.R. China.
Med Image Anal. 2025 Jan;99:103387. doi: 10.1016/j.media.2024.103387. Epub 2024 Nov 12.
Surgical instrument segmentation is instrumental to minimally invasive surgeries and related applications. Most previous methods formulate this task as single-frame instance segmentation and ignore the natural temporal and stereo attributes of a surgical video. As a result, these methods are less robust to appearance variation caused by temporal motion and view changes. In this work, we propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation. With a query-based segmentation model at its core, we design three performance-enhancing modules. First, we design a disparity-guided feature propagation module to explicitly enhance depth-aware features. To generalize even when only a monocular video is available, we apply a pseudo-stereo scheme to generate complementary right images. Second, we propose a stereo-temporal set classifier, which aggregates stereo-temporal contexts in a universal manner to make a consolidated prediction and mitigate transient failures. Finally, we propose a location-agnostic classifier to decouple the location bias from mask prediction and enhance the feature semantics. We extensively validate our approach on three public surgical video datasets, including two benchmarks from the EndoVis Challenges and one real radical prostatectomy surgery dataset, GraSP. Experimental results demonstrate the promising performance of our method, which consistently achieves results comparable to or better than those of previous state-of-the-art approaches.
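The idea of a consolidated prediction over a stereo-temporal set can be illustrated with a minimal sketch. This is not the LACOSTE classifier itself: it assumes that each view of the same instrument (for example, adjacent frames and the left/right images) yields a vector of class logits, and it swaps in plain probability averaging as the aggregation step. The function names `softmax` and `consolidate` and the example logits are hypothetical, chosen only for illustration.

```python
from math import exp


def softmax(logits):
    """Convert one vector of class logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consolidate(per_view_logits):
    """Average class probabilities over a set of views of one instrument.

    Assumption: simple mean pooling stands in for the learned
    stereo-temporal aggregation described in the abstract.
    Returns (predicted_class_index, mean_probabilities).
    """
    probs = [softmax(view) for view in per_view_logits]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    label = max(range(n_classes), key=mean.__getitem__)
    return label, mean


# Hypothetical logits for one instrument seen in three views.
# The second view is a transient failure: taken alone it favors class 1.
views = [
    [4.0, 0.5, 0.1],  # left image, previous frame
    [0.9, 1.0, 0.2],  # left image, current frame (ambiguous)
    [3.5, 0.4, 0.3],  # right image, current frame
]
label, mean = consolidate(views)
single_label, _ = consolidate([views[1]])
```

In this toy setting, the ambiguous view on its own is classified differently from the set as a whole, which is the kind of transient single-frame error that aggregating over stereo and temporal context is meant to suppress.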