College of Computer, National University of Defense Technology, Changsha 410073, China.
Sensors (Basel). 2021 Feb 18;21(4):1430. doi: 10.3390/s21041430.
Stereo matching is an important research field of computer vision. Due to the dimensionality of cost aggregation, current neural network-based stereo methods struggle to balance speed and accuracy. To this end, we integrate fast 2D stereo methods with accurate 3D networks to improve performance and reduce running time. We leverage a 2D encoder-decoder network to generate a rough disparity map and construct a disparity range to guide the 3D aggregation network, which significantly improves accuracy and reduces computational cost. We use a stacked hourglass structure to refine the disparity from coarse to fine. We evaluated our method on three public datasets. According to the KITTI official website results, our network can generate an accurate result in 80 ms on a modern GPU. Compared with other 2D stereo networks (AANet, DeepPruner, FADNet, etc.), our network achieves a substantial improvement in accuracy. Meanwhile, it is significantly faster than other 3D stereo networks (5× faster than PSMNet, 7.5× faster than CSN, and 22.5× faster than GANet), demonstrating the effectiveness of our method.
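The core idea of the abstract, namely using a coarse disparity map from a fast 2D network to narrow the disparity search range over which the 3D aggregation network builds its cost volume, can be illustrated with the following minimal sketch. This is not the authors' implementation; the function name, the difference-based matching cost, and the `radius` parameter are illustrative assumptions, written in PyTorch.

```python
# Minimal sketch (assumed, not the paper's code): build a narrow cost volume
# centred on a coarse disparity estimate, so the 3D aggregation network only
# evaluates 2*radius+1 hypotheses instead of the full disparity range.
import torch
import torch.nn.functional as F


def build_guided_cost_volume(left_feat, right_feat, coarse_disp, radius=4):
    """left_feat, right_feat: (B, C, H, W) feature maps at the same resolution.
    coarse_disp:              (B, 1, H, W) disparity from the fast 2D network.
    Returns a (B, C, 2*radius+1, H, W) volume for 3D aggregation."""
    B, C, H, W = left_feat.shape
    device = left_feat.device

    # Disparity hypotheses: coarse estimate +/- radius, one plane per offset.
    offsets = torch.arange(-radius, radius + 1, device=device).view(1, -1, 1, 1)
    disp_hyp = coarse_disp + offsets  # (B, D, H, W), D = 2*radius + 1

    # Base pixel grid in normalized [-1, 1] coordinates for grid_sample.
    xs = torch.linspace(-1, 1, W, device=device)
    ys = torch.linspace(-1, 1, H, device=device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")

    volume = left_feat.new_zeros(B, C, disp_hyp.shape[1], H, W)
    for d in range(disp_hyp.shape[1]):
        # Shift x-coordinates by the hypothesised disparity (pixels -> normalized).
        shift = 2.0 * disp_hyp[:, d] / max(W - 1, 1)
        grid = torch.stack(
            (grid_x.unsqueeze(0).expand(B, -1, -1) - shift,
             grid_y.unsqueeze(0).expand(B, -1, -1)),
            dim=-1,
        )
        warped_right = F.grid_sample(right_feat, grid, align_corners=True)
        # A simple feature difference serves as the matching cost here;
        # correlation or concatenation are common alternatives.
        volume[:, :, d] = left_feat - warped_right
    return volume
```

The narrowed volume is then processed by a 3D aggregation network (in the paper, a stacked hourglass refining the disparity from coarse to fine), which is where the reduction in computational cost relative to full-range 3D networks comes from.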