Khajarian Serouj, Schwimmbeck Michael, Holzapfel Konstantin, Schmidt Johannes, Auer Christopher, Remmele Stefanie, Amft Oliver
Research Group Medical Technologies, University of Applied Sciences Landshut, 84036, Landshut, Germany.
Intelligent Embedded Systems Lab., University of Freiburg, 79085, Freiburg, Germany.
Int J Comput Assist Radiol Surg. 2025 May 14. doi: 10.1007/s11548-025-03381-6.
We introduce a multimodel, real-time semantic segmentation and tracking approach for Augmented Reality (AR)-guided open liver surgery. Our approach leverages foundation models and scene-aware re-prompting strategies to balance segmentation accuracy and inference time as required for real-time AR-assisted surgery applications.
Our approach integrates a domain-specific RGBD model (ESANet), a foundation model for semantic segmentation (SAM), and a semi-supervised video object segmentation model (DeAOT). Models were combined in an auto-promptable pipeline with a scene-aware re-prompting algorithm that adapts to surgical scene changes. We evaluated our approach on intraoperative RGBD videos from 10 open liver surgeries using a head-mounted AR device. Segmentation accuracy (IoU), temporal resolution (FPS), and the impact of re-prompting strategies were analyzed. Comparisons to individual models were performed.
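The pipeline above can be sketched as a simple control loop: a tracker propagates the mask frame to frame, and a coarse-segmentation model re-prompts the foundation model only when the scene changes. This is a minimal, illustrative sketch; the function names, the dict-based frame representation, and the IoU-drop heuristic used as the scene-change trigger are all assumptions, not the paper's actual algorithm.

```python
# Hedged sketch of a scene-aware re-prompting loop. Frames are modeled as
# dicts mapping pixel coordinates to intensities; all thresholds and the
# IoU-drop scene-change heuristic are illustrative assumptions.

def segment_coarse(frame):
    """Stand-in for the domain-specific RGBD model (ESANet): returns a
    coarse mask used to auto-prompt the foundation model."""
    return {p for p, v in frame.items() if v > 0.5}

def refine_with_prompt(frame, prompt_mask):
    """Stand-in for prompting the segmentation foundation model (SAM)
    with the coarse mask: keeps prompt pixels above a lower threshold."""
    return {p for p in prompt_mask if frame.get(p, 0.0) > 0.3}

def track(prev_mask, frame):
    """Stand-in for the video object segmentation tracker (DeAOT):
    propagates the previous mask to pixels still above threshold."""
    return {p for p in prev_mask if frame.get(p, 0.0) > 0.4}

def iou(a, b):
    """Intersection over union of two pixel sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def run_pipeline(frames, change_threshold=0.6):
    """Track frame to frame; re-prompt (coarse model -> foundation model)
    only when the tracked mask drifts, i.e. frame-to-frame IoU drops
    below the threshold. Returns per-frame masks and the re-prompt count."""
    mask = refine_with_prompt(frames[0], segment_coarse(frames[0]))
    masks, reprompts = [mask], 0
    for frame in frames[1:]:
        new_mask = track(mask, frame)
        if iou(new_mask, mask) < change_threshold:  # scene-change heuristic
            new_mask = refine_with_prompt(frame, segment_coarse(frame))
            reprompts += 1
        mask = new_mask
        masks.append(mask)
    return masks, reprompts
```

The design point this sketch captures is the trade-off discussed in the abstract: tracking alone is fast but drifts, while re-prompting on every frame is accurate but slow; triggering re-prompts only on detected scene changes balances the two.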
Our multimodel approach achieved a median IoU of 71% at 13.2 FPS without re-prompting. The multimodel approach surpasses the individual models, yielding better segmentation accuracy than ESANet and better temporal resolution than SAM. Our scene-aware re-prompting method matches the performance of DeAOT, with an IoU of 74.7% at 11.5 FPS, even when the DeAOT model uses an ideal reference frame.
Our scene-aware re-prompting strategy provides a trade-off between segmentation accuracy and temporal resolution, thus addressing the requirements of real-time AR-guided open liver surgery. The integration of complementary models resulted in robust and accurate segmentation in complex, real-world surgical settings.