Zhang Xudong, Zhao Baigan, Yao Jiannan, Wu Guoqing
School of Information Science and Technology, Nantong University, Nantong 226019, China.
School of Mechanical Engineering, Nantong University, Nantong 226019, China.
Sensors (Basel). 2023 Jun 4;23(11):5329. doi: 10.3390/s23115329.
This paper presents a novel unsupervised learning framework for estimating scene depth and camera pose from video sequences, a task fundamental to many high-level applications such as 3D reconstruction, visual navigation, and augmented reality. Although existing unsupervised methods have achieved promising results, their performance degrades in challenging scenes, such as those with dynamic objects and occluded regions. To mitigate these effects, this work adopts multiple masking techniques and geometric consistency constraints. First, the masking techniques identify numerous outliers in the scene, which are then excluded from the loss computation. In addition, the identified outliers serve as a supervisory signal for training a mask estimation network. The estimated mask is then used to preprocess the input to the pose estimation network, reducing the adverse effects of challenging scenes on pose estimation. Furthermore, we propose geometric consistency constraints that reduce sensitivity to illumination changes and act as additional supervisory signals for training the network. Experimental results on the KITTI dataset demonstrate that the proposed strategies effectively improve the model's performance, outperforming other unsupervised methods.
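The abstract does not specify how the masks or the loss terms are defined, so the following is only a minimal sketch of the general idea of excluding outlier pixels from a photometric loss and masking the pose network's input. The function names `masked_photometric_loss` and `mask_pose_input`, the L1 error, and the binary mask convention (1 = valid, 0 = outlier) are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def masked_photometric_loss(target, warped, valid_mask, eps=1e-7):
    """Mean absolute photometric error over pixels where valid_mask == 1.

    target, warped: H x W x C images (warped = source view reprojected
    into the target view). valid_mask: H x W, 1 for inliers, 0 for
    outliers such as dynamic objects or occluded regions (assumed convention).
    """
    target = np.asarray(target, dtype=np.float64)
    warped = np.asarray(warped, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=np.float64)
    # Per-pixel L1 error, averaged over the channel axis -> H x W.
    per_pixel = np.abs(target - warped).mean(axis=-1)
    # Outlier pixels contribute nothing; normalise by the inlier count.
    return float((per_pixel * valid_mask).sum() / (valid_mask.sum() + eps))

def mask_pose_input(image, valid_mask):
    """Zero out outlier pixels before the image is passed to a pose network."""
    image = np.asarray(image, dtype=np.float64)
    valid_mask = np.asarray(valid_mask, dtype=np.float64)
    return image * valid_mask[..., None]

# Toy example: one pixel is corrupted, e.g. by a moving object.
target = np.zeros((2, 2, 3))
warped = target.copy()
warped[0, 0] = 0.9
all_valid = np.ones((2, 2))
outlier_mask = all_valid.copy()
outlier_mask[0, 0] = 0.0

loss_unmasked = masked_photometric_loss(target, warped, all_valid)
loss_masked = masked_photometric_loss(target, warped, outlier_mask)
```

In this toy case the corrupted pixel dominates the unmasked loss, while the masked loss ignores it entirely. In a real pipeline the mask would come from the masking techniques or the mask estimation network rather than being set by hand.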