Ren Xinlin, Wei Xingkui, Li Zhuwen, Fu Yanwei, Zhang Yinda, Xue Xiangyang
IEEE Trans Pattern Anal Mach Intell. 2024 Jun;46(6):4058-4074. doi: 10.1109/TPAMI.2023.3307567. Epub 2024 May 7.
Structure from Motion (SfM) is a fundamental computer vision problem that deep learning has not yet handled well. One promising solution is to build explicit structural constraints, such as a 3D cost volume, into the neural network. Obtaining accurate camera poses from images alone can be challenging, especially under complicated environmental conditions. Existing methods usually assume accurate camera poses taken from ground truth or from other methods, which is unrealistic in practice and may require additional sensors. In this work, we design a physics-driven architecture, DeepSFM, inspired by traditional Bundle Adjustment (BA). It consists of two cost-volume-based architectures that iteratively refine depth and pose. The explicit constraints on both depth and pose, combined with the learning components, bring together the merits of traditional BA and emerging deep learning technology. To improve training and inference efficiency, we apply Gated Recurrent Unit (GRU)-based depth and pose update modules with coarse-to-fine cost volumes in the iterative refinement. In addition, with an extended residual depth prediction module, our model can be adapted effectively to dynamic scenes. Extensive experiments on various datasets show that our model achieves state-of-the-art performance with superior robustness against challenging inputs.
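To make the idea of a depth cost volume concrete, the following is a minimal sketch of a classical plane sweep, not the DeepSFM network itself. It assumes grayscale images, known intrinsics `K`, a relative pose `(R, t)` that maps reference-camera points into the source camera, nearest-neighbour sampling, and a hand-written absolute-difference cost in place of learned features and GRU updates. The function name `depth_cost_volume` and the synthetic scene are hypothetical and chosen only for illustration.

```python
import numpy as np

def depth_cost_volume(ref, src, K, R, t, depths):
    """Photometric cost per depth hypothesis; returns shape (D, H, W).

    Assumes positive depth hypotheses. Pixels that project outside the
    source image are left as NaN.
    """
    H, W = ref.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates of the reference view, shape (3, H*W).
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])
    rays = np.linalg.inv(K) @ pix
    volume = np.full((len(depths), H * W), np.nan)
    for i, d in enumerate(depths):
        # Back-project at depth d, move into the source camera, project.
        proj = K @ (R @ (rays * d) + t[:, None])
        us = np.rint(proj[0] / proj[2]).astype(int)
        vs = np.rint(proj[1] / proj[2]).astype(int)
        valid = (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
        volume[i, valid] = np.abs(ref.ravel()[valid] - src[vs[valid], us[valid]])
    return volume.reshape(len(depths), H, W)

# Synthetic scene (an assumption for this sketch): a fronto-parallel textured
# plane at depth 4 seen by a camera translated by 0.5 along x, which shifts
# the image by f * tx / depth = 40 * 0.5 / 4 = 5 pixels.
rng = np.random.default_rng(0)
ref = rng.random((32, 48))
src = np.roll(ref, 5, axis=1)
K = np.array([[40.0, 0.0, 24.0], [0.0, 40.0, 16.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.5, 0.0, 0.0])

depths = np.array([2.0, 2.5, 4.0, 5.0, 8.0, 10.0])
volume = depth_cost_volume(ref, src, K, R, t, depths)
mean_cost = np.nanmean(volume, axis=(1, 2))  # aggregate over valid pixels
best_depth = depths[np.argmin(mean_cost)]
print(best_depth)
```

A pose cost volume could be built in the same spirit by holding the depth fixed and sweeping over candidate poses around the current estimate; alternating the two sweeps is one simple way to picture the iterative refinement that the abstract describes.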