Politecnico di Milano, Department of Electronics, Information and Bioengineering, Milano, 20133, Italy.
Comput Methods Programs Biomed. 2024 Feb;244:107937. doi: 10.1016/j.cmpb.2023.107937. Epub 2023 Nov 22.
The safety of robotic surgery can be enhanced through augmented vision or artificial constraints on the robot's motion. Intra-operative depth estimation is the cornerstone of these applications because it provides precise position information about the surgical scene in 3D space. High-quality depth estimation of endoscopic scenes remains an important open problem, and advances in deep learning offer new ways to address it.
In this paper, a deep learning-based approach is proposed to recover 3D information from intra-operative scenes. To this end, a fully 3D encoder-decoder network integrating spatio-temporal layers is designed; it adopts hierarchical prediction and progressive learning to improve prediction accuracy and shorten training time.
Our network achieves a depth estimation accuracy of MAE 2.55±1.51 mm and RMSE 5.23±1.40 mm on 8 surgical videos with a resolution of 1280×1024, outperforming six other state-of-the-art methods trained on the same data.
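For readers unfamiliar with the two reported metrics, the following minimal Python sketch shows how MAE and RMSE are commonly computed between a predicted and a reference depth map, both in millimetres. It is an illustration under stated assumptions, not the authors' evaluation code: the function names (`mae`, `rmse`, `summarize`), the flat-list representation of a depth map, and the use of `None` to mark pixels without ground truth are all hypothetical. The abstract does not say whether the "±" values are standard deviations taken over frames or over videos, so `summarize` simply reports the mean and sample standard deviation of whatever list of per-item scores it is given.

```python
import math
from statistics import mean, stdev


def _valid_pairs(pred, truth):
    """Keep only pixels that have a ground-truth depth (assumed convention: None = missing)."""
    pairs = [(p, t) for p, t in zip(pred, truth) if t is not None]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return pairs


def mae(pred, truth):
    """Mean absolute error, in the same unit as the inputs (e.g. mm)."""
    pairs = _valid_pairs(pred, truth)
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def rmse(pred, truth):
    """Root mean squared error, in the same unit as the inputs (e.g. mm)."""
    pairs = _valid_pairs(pred, truth)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


def summarize(scores):
    """Mean and sample standard deviation of per-item scores (0.0 if only one score)."""
    return mean(scores), (stdev(scores) if len(scores) > 1 else 0.0)


if __name__ == "__main__":
    # Toy depth values in mm; the third pixel has no ground truth and is skipped.
    predicted = [50.0, 62.0, 71.0, 80.0]
    reference = [52.0, 60.0, None, 84.0]
    print("MAE  (mm):", mae(predicted, reference))
    print("RMSE (mm):", rmse(predicted, reference))
```

Because RMSE squares each error before averaging, it weights large deviations more heavily than MAE, so an RMSE noticeably above the MAE is consistent with a small number of pixels carrying large depth errors.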
Our network achieves promising depth estimation performance in intra-operative scenes using stereo images, which allows its integration into robot-assisted surgery to enhance safety.