Chang Jiahao, He Jianfeng, Zhang Tianzhu, Yu Jiyang, Wu Feng
IEEE Trans Image Process. 2024;33:753-766. doi: 10.1109/TIP.2023.3347929. Epub 2024 Jan 12.
Recent learning-based methods have demonstrated a strong ability to estimate depth for multi-view stereo (MVS) reconstruction. However, most of these methods extract features directly via regular or deformable convolutions, and few consider the alignment of receptive fields between views when constructing the cost volume. By analyzing the constraint and inference of previous MVS networks, we find that some shortcomings remain that hinder performance. To address these issues, we propose an Epipolar-Guided Multi-View Stereo Network with Interval-Aware Label (EI-MVSNet), which combines an epipolar-guided volume construction module and an interval-aware depth estimation module in a unified architecture for MVS. The proposed EI-MVSNet has several merits. First, in the epipolar-guided volume construction module, we construct the cost volume with features from aligned receptive fields between different pairs of reference and source images via epipolar-guided convolutions, which take rotation and scale changes into account. Second, in the interval-aware depth estimation module, we attempt to supervise the cost volume directly and make depth estimation independent of extraneous values by perceiving the upper and lower boundaries, which enables fine-grained predictions and enhances the reasoning ability of the network. Extensive experimental results on two standard benchmarks demonstrate that EI-MVSNet performs favorably against state-of-the-art MVS methods. Specifically, EI-MVSNet ranks first on both the intermediate and advanced subsets of the Tanks and Temples benchmark, which verifies the high precision and strong robustness of our model.