Liang Zhengfa, Guo Yulan, Feng Yiliu, Chen Wei, Qiao Linbo, Zhou Li, Zhang Jianfeng, Liu Hengzhu
IEEE Trans Pattern Anal Mach Intell. 2021 Jan;43(1):300-315. doi: 10.1109/TPAMI.2019.2928550. Epub 2020 Dec 4.
For CNN-based stereo matching methods, cost volumes play an important role in achieving good matching accuracy. In this paper, we present an end-to-end trainable convolutional neural network that makes full use of cost volumes for stereo matching. Our network consists of three sub-modules: shared feature extraction, initial disparity estimation, and disparity refinement. Cost volumes are calculated at multiple levels using the shared features and are used in both the initial disparity estimation and the disparity refinement sub-modules. To improve the efficiency of disparity refinement, multi-scale feature constancy is introduced to measure the correctness of the initial disparity in feature space. The sub-modules of our network are tightly coupled, making it compact and easy to train. Moreover, we investigate the problem of developing a robust model that performs well across multiple datasets with different characteristics. We achieve this by introducing a two-stage finetuning scheme that gently transfers the model to the target datasets. Specifically, in the first stage, the model is finetuned on both a large synthetic dataset and the target datasets with a relatively large learning rate, while in the second stage the model is trained on only the target datasets with a small learning rate. The proposed method is tested on several benchmarks, including the Middlebury 2014, KITTI 2015, ETH3D 2017, and SceneFlow datasets. Experimental results show that our method achieves state-of-the-art performance on all of these datasets. The proposed method also won first prize in the stereo task of the Robust Vision Challenge 2018.
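To make the two central quantities in the abstract concrete, the following is a minimal NumPy sketch, not the authors' network. It assumes a correlation-style matching cost, integer disparities, and a single feature scale; the function names `correlation_cost_volume` and `feature_constancy` are hypothetical and chosen only for illustration. The first function builds a cost volume from left and right feature maps, and the second measures how well a given disparity map explains the left features once the right features are warped by it.

```python
import numpy as np


def correlation_cost_volume(feat_left, feat_right, max_disp):
    """Build a (max_disp, H, W) volume from (C, H, W) feature maps.

    Assumption: the matching score is the channel-wise mean of the
    elementwise product (a correlation), so higher means a better match.
    """
    _, h, w = feat_left.shape
    volume = np.zeros((max_disp, h, w), dtype=feat_left.dtype)
    for d in range(max_disp):
        # Left pixel x is compared with right pixel x - d.
        # Columns x < d have no counterpart and keep a score of 0.
        volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, : w - d]).mean(axis=0)
    return volume


def feature_constancy(feat_left, feat_right, disparity):
    """Per-pixel mean absolute feature difference after warping.

    The right features are sampled at x - disparity (rounded to integers,
    a simplification) and compared with the left features. Pixels whose
    sample falls outside the image are assigned an error of 0.
    """
    _, h, w = feat_left.shape
    xs = np.arange(w)[None, :] - np.rint(disparity).astype(int)  # (H, W)
    valid = (xs >= 0) & (xs < w)
    rows = np.arange(h)[:, None]
    warped = feat_right[:, rows, np.clip(xs, 0, w - 1)]  # (C, H, W)
    error = np.abs(feat_left - warped).mean(axis=0)
    error[~valid] = 0.0
    return error


# Synthetic example: the left features are the right features shifted by 3 pixels.
rng = np.random.default_rng(0)
feat_right = rng.standard_normal((64, 4, 16))
feat_left = np.zeros_like(feat_right)
feat_left[:, :, 3:] = feat_right[:, :, :-3]

volume = correlation_cost_volume(feat_left, feat_right, max_disp=8)
initial_disparity = volume.argmax(axis=0)  # best-scoring disparity per pixel
error_true = feature_constancy(feat_left, feat_right, np.full((4, 16), 3.0))
error_zero = feature_constancy(feat_left, feat_right, np.zeros((4, 16)))
```

In this toy setting the constancy error vanishes when the disparity is correct and is clearly positive when it is not, which is the kind of signal a refinement stage could use to decide where the initial estimate needs correcting. A learned network would compute such quantities on multi-scale CNN features and with sub-pixel warping rather than on random arrays with integer shifts.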