Lou Ange, Noble Jack
Vanderbilt University, Department of Electrical and Computer Engineering, Nashville, Tennessee, United States.
J Med Imaging (Bellingham). 2025 Mar;12(2):025003. doi: 10.1117/1.JMI.12.2.025003. Epub 2025 Apr 30.
Accurate depth estimation in surgical videos is a pivotal component of many image-guided surgery procedures. However, creating ground-truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks, which eliminate the need for manually annotated depth maps, are gaining traction. Despite this progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system that jointly predicts depth maps, camera poses, and intrinsic parameters, providing a complete solution for depth estimation under these constraints.
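To make the joint prediction concrete: in frameworks of this kind, the photometric reprojection loss is the training signal, and it is differentiable with respect to the depth map, the camera pose, and the intrinsic matrix K alike, so all three can be optimised together. The PyTorch sketch below shows that warp with K treated as a predicted tensor rather than a fixed calibration; the abstract does not specify the networks or the exact loss formulation, so all names and shapes here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def reproject(depth, pose, K):
    """Back-project target pixels with predicted depth and intrinsics,
    apply the predicted relative pose, and project into the source view.
    depth: (B,1,H,W), pose: (B,4,4) target->source, K: (B,3,3).
    Returns a sampling grid for F.grid_sample, shape (B,H,W,2)."""
    B, _, H, W = depth.shape
    # Homogeneous pixel grid, shape (B,3,H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().view(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1).to(depth.device)
    # Back-project: X = depth * K^{-1} p, then homogenise to (B,4,H*W).
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)
    cam = torch.cat([cam, torch.ones(B, 1, H * W, device=depth.device)], dim=1)
    # Rigid transform into the source camera, then perspective projection.
    proj = K @ (pose @ cam)[:, :3]
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)          # (B,2,H*W)
    # Normalise pixel coordinates to [-1,1] for grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], dim=-1).view(B, H, W, 2)

def photometric_loss(target, source, depth, pose, K):
    """Warp the source frame into the target view and penalise the
    photometric difference; gradients flow to depth, pose, and K."""
    warped = F.grid_sample(source, reproject(depth, pose, K),
                           align_corners=True)
    return (target - warped).abs().mean()
```

Because the projection passes through K in both the back-projection and the re-projection, an inaccurate intrinsics prediction degrades the warp and raises the loss, which is what allows K to be learned without calibration.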
We developed a self-supervised depth and ego-motion estimation framework that incorporates a cost volume-based auxiliary supervision module. This module provides an additional supervision signal for predicting the camera intrinsic parameters, enabling robust estimation even without predefined intrinsics. The system was evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.
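The abstract does not describe how the cost volume is constructed, but a common construction that would supply such an auxiliary signal is a plane-sweep volume: source features are warped to the target view under a set of hypothesised depth planes, and because each warp goes through the intrinsics, the resulting matching costs carry gradient information about K. The sketch below reuses the reproject helper above; it is an assumption about the module's general form, not the paper's implementation.

```python
def plane_sweep_cost_volume(feat_t, feat_s, pose, K, depth_hyps):
    """Warp source features to the target view under each hypothesised
    fronto-parallel depth plane and score per-pixel feature correlation.
    Every warp depends on K, so the volume's consistency can serve as
    an auxiliary supervision signal for the predicted intrinsics.
    feat_t, feat_s: (B,C,H,W); depth_hyps: iterable of scalar depths.
    Returns a cost volume of shape (B, D, H, W)."""
    B, C, H, W = feat_t.shape
    costs = []
    for d in depth_hyps:
        plane = torch.full((B, 1, H, W), float(d), device=feat_t.device)
        warped = F.grid_sample(feat_s, reproject(plane, pose, K),
                               align_corners=True)
        costs.append((feat_t * warped).sum(dim=1) / C)  # (B,H,W) correlation
    return torch.stack(costs, dim=1)
```

An auxiliary loss could, for instance, encourage the best-matching plane at each pixel to agree with the depth network's prediction; the specific loss the authors use is not given in the abstract.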
The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods that use known camera intrinsics. In addition, integrating the cost volume-based supervision further improved the accuracy of camera parameter estimation, including the intrinsic parameters.
We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation that overcomes the limitations imposed by unknown or unrecorded camera intrinsics. The experimental results confirm that the proposed method outperforms baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.