Xu Chi, Huang Baoru, Elson Daniel S
The Hamlyn Centre for Robotic Surgery, Department of Surgery and Cancer, Imperial College London, London SW7 2AZ, UK.
IEEE Trans Med Robot Bionics. 2022 May;4(2):331-334. doi: 10.1109/TMRB.2022.3170206.
We present a novel self-supervised training framework with a 3D displacement (3DD) module for accurately estimating per-pixel depth maps from single laparoscopic images. Recently, several self-supervised monocular depth estimation models have achieved good results on the KITTI dataset under the hypothesis that the camera is moving and the objects are stationary; in the surgical setting, however, this hypothesis is often reversed (the laparoscope is stationary while the surgical instruments and tissues move). We therefore propose a 3DD module that establishes the relation between frames in place of ego-motion estimation. In the 3DD module, a convolutional neural network (CNN) analyses source and target frames to predict the 3D displacement, in camera coordinates, of a 3D point cloud from the target frame to the source frame. Since the depth component of this displacement is difficult to constrain from two 2D images, a novel depth consistency module is proposed that keeps the displacement-updated depth consistent with the model-estimated depth, thereby constraining the 3D displacement effectively. Our method achieves remarkable performance for monocular depth estimation on the Hamlyn surgical dataset and acquired ground-truth depth maps, outperforming the Monodepth, Monodepth2 and PackNet models.
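The core idea of the depth consistency constraint can be illustrated with a minimal numpy sketch: back-project the target-frame depth map to a point cloud, apply the predicted per-pixel 3D displacement, and penalize the gap between the depth of the displaced points and the depth the model estimates at the corresponding source-frame pixels. This is an assumption-laden toy version, not the paper's implementation: the function names, the nearest-neighbour pixel sampling, and the L1 loss form are all illustrative choices.

```python
import numpy as np

def backproject(depth, K):
    # Lift each pixel to a 3D point in camera coordinates
    # using the pinhole intrinsics K (3x3).
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)          # (h, w, 3)

def depth_consistency_loss(depth_t, depth_s, disp_3d, K):
    """Toy depth-consistency term (hypothetical loss form).

    depth_t, depth_s: (h, w) depth maps for target/source frames
    disp_3d: (h, w, 3) predicted 3D displacement, target -> source
    """
    pts_t = backproject(depth_t, K)   # target-frame point cloud
    pts_s = pts_t + disp_3d           # apply predicted displacement
    # Project the displaced points into the source image plane.
    u = K[0, 0] * pts_s[..., 0] / pts_s[..., 2] + K[0, 2]
    v = K[1, 1] * pts_s[..., 1] / pts_s[..., 2] + K[1, 2]
    # Nearest-neighbour sampling (a differentiable model would
    # use bilinear interpolation instead).
    ui = np.clip(np.round(u).astype(int), 0, depth_s.shape[1] - 1)
    vi = np.clip(np.round(v).astype(int), 0, depth_s.shape[0] - 1)
    # L1 gap: displacement-updated depth vs. model-estimated depth.
    return np.abs(pts_s[..., 2] - depth_s[vi, ui]).mean()
```

With zero displacement and identical depth maps the loss vanishes, which is the sanity check the consistency term is built around: any depth change introduced by the predicted displacement must be matched by the depth network's own estimate in the source frame.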