Luo Huoling, Wang Congcong, Duan Xingguang, Liu Hao, Wang Ping, Hu Qingmao, Jia Fucang
Research Lab for Medical Imaging and Digital Surgery, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, China.
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China; Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway.
Comput Biol Med. 2022 Jan;140:105109. doi: 10.1016/j.compbiomed.2021.105109. Epub 2021 Dec 3.
Learning-based methods have achieved remarkable performance on depth estimation. However, most self-supervised and unsupervised learning methods presuppose rigorous, geometrically aligned stereo rectification, and their performance degrades when the rectification is inaccurate. We therefore explore an approach to unsupervised depth estimation from stereo images that can handle imperfect camera parameters.
We propose an unsupervised deep convolutional network that takes rectified stereo image pairs as input and outputs the corresponding dense disparity maps. First, a new vertical correction module predicts a correction map that compensates for the imperfect geometric alignment. Second, the left and right images, reconstructed from the input image pair using the predicted disparities and vertical correction maps, are treated as the outputs of the generator of a generative adversarial network (GAN). The GAN's discriminator then distinguishes the reconstructed images from the original inputs, forcing the generator to produce increasingly realistic images. In addition, a residual mask excludes from the loss calculation those pixels whose appearance conflicts with the original image.
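The reconstruction step described above can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: `warp_right_to_left`, `residual_mask`, and the fixed threshold are assumed names, and nearest-neighbour sampling stands in for the differentiable bilinear sampling an end-to-end network would require. Each left-image pixel is sampled from the right image at a position shifted horizontally by the disparity and vertically by the predicted correction map.

```python
import numpy as np

def warp_right_to_left(right, disparity, vertical_correction):
    """Reconstruct the left view by sampling the right image.

    For each left pixel (y, x), sample the right image at
    (y + dv(y, x), x - d(y, x)), where d is the disparity and dv is the
    vertical correction compensating for imperfect rectification.
    Nearest-neighbour sampling is used here for brevity.
    """
    h, w = right.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.round(xs - disparity).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + vertical_correction).astype(int), 0, h - 1)
    return right[src_y, src_x]

def residual_mask(reconstructed, original, thresh=0.2):
    """Binary mask excluding pixels whose reconstructed appearance
    conflicts with the original image (absolute error above a threshold)."""
    return (np.abs(reconstructed - original) < thresh).astype(float)
```

In a trained model the disparity and vertical-correction maps would come from the network, the warp would be differentiable, and the mask would gate the photometric term of the loss.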
The proposed model is validated on the publicly available Stereo Correspondence and Reconstruction of Endoscopic Data (SCARED) dataset, achieving an average mean absolute error (MAE) of 3.054 mm.
Our model can effectively handle imperfectly rectified stereo images for depth estimation.