College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 45000, China.
Math Biosci Eng. 2020 Nov 6;17(6):7787-7803. doi: 10.3934/mbe.2020396.
Deep end-to-end learning-based stereo matching methods have achieved great success, as shown by the leaderboards of several benchmark datasets. In a stereo vision system, depth information is obtained from a dense and accurate disparity map, which is computed by a robust stereo matching algorithm. However, previous works use network layers of the same size to train the feature parameters, which yields unsatisfactory efficiency, so existing methods cannot meet the requirements of real scenarios. In this paper, we present an end-to-end stereo matching algorithm based on a "downsize" convolutional neural network (CNN) for autonomous driving scenarios. First, the road images are fed into the designed CNN to obtain depth information. Then, a "downsize" fully connected layer, combined with subsequent network optimization, is employed to improve the accuracy of the algorithm. Finally, an improved loss function is used to approximate the similarity of positive and negative samples under a more relaxed constraint, which improves the matching quality of the output. On the KITTI 2012 and KITTI 2015 datasets, the loss function error of the proposed method is reduced to 2.62% and 3.26%, respectively, and the runtime of the algorithm is reduced as well. Experimental results show that the proposed end-to-end algorithm can obtain a dense disparity map and that the corresponding depth information can be used by the binocular vision system in autonomous driving scenarios. In addition, our method achieves better performance than previous methods when the network size is compressed.
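The abstract relies on the standard relation between a disparity map and depth for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B is the camera baseline, and d is the disparity in pixels. The following is a minimal sketch of that conversion only, not the paper's network or loss function; the function names and the numeric values (a 720 px focal length and a 0.54 m baseline) are illustrative assumptions and do not come from the paper.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert one disparity value (pixels) to depth (metres).

    Assumes a rectified stereo pair, so depth Z = f * B / d.
    A non-positive disparity has no finite depth and maps to infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def disparity_map_to_depth_map(disparity_map, focal_length_px, baseline_m):
    """Apply the conversion to every pixel of a 2D disparity map (list of rows)."""
    return [
        [disparity_to_depth(d, focal_length_px, baseline_m) for d in row]
        for row in disparity_map
    ]


if __name__ == "__main__":
    # Hypothetical camera parameters and a tiny 2x2 disparity map.
    focal_length_px = 720.0
    baseline_m = 0.54
    disparities = [[36.0, 72.0], [0.0, 18.0]]
    for row in disparity_map_to_depth_map(disparities, focal_length_px, baseline_m):
        print(row)
```

Because depth is inversely proportional to disparity, a fixed disparity error causes a larger depth error for distant objects than for near ones, which is one reason a dense and accurate disparity map matters in driving scenes.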