School of Computer Science and Technology, China University of Petroleum Huadong, Qingdao 266580, China.
Sensors (Basel). 2022 Apr 12;22(8):2936. doi: 10.3390/s22082936.
Distance estimation using a monocular camera is a classic task in computer vision. Current monocular distance estimation methods either require extensive data collection or produce imprecise results. In this paper, we propose a network for both object detection and distance estimation. A network based on ShuffleNet and YOLO is used to detect objects, and a self-supervised learning network is used to estimate distance. We calibrated the camera and integrated the calibrated parameters into the overall network. We also analyzed how variation in the camera pose parameters affects the results. Further, multi-scale resolution is applied to improve estimation accuracy by enriching the representation of depth information. We validated the object detection and distance estimation results on the KITTI dataset and demonstrated that our approach is efficient and accurate. Finally, we constructed our own dataset and conducted similar experiments to verify that the network generalizes to other scenarios. The results show that the proposed methods outperform alternative approaches on object-specific distance estimation.
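The abstract does not say how the detector output and the depth estimate are combined into one distance per object. As an illustration only, the following minimal sketch assumes a dense depth map in metres and a bounding box in pixel coordinates, and takes the median depth over the central part of the box. The function name `object_distance`, the `shrink` parameter, and the median rule are hypothetical choices for this example, not the authors' method.

```python
import numpy as np


def object_distance(depth_map, box, shrink=0.25):
    """Return the median depth inside the central region of a bounding box.

    Assumptions (not taken from the paper): depth_map is an HxW array of
    depths in metres, box is (x_min, y_min, x_max, y_max) in pixels, and
    shrink is the fraction of the box width/height trimmed in total.
    """
    x0, y0, x1, y1 = box
    height, width = depth_map.shape

    # Trim the box toward its centre so fewer background pixels are included.
    dx = (x1 - x0) * shrink / 2.0
    dy = (y1 - y0) * shrink / 2.0
    xa = max(int(round(x0 + dx)), 0)
    xb = min(int(round(x1 - dx)), width)
    ya = max(int(round(y0 + dy)), 0)
    yb = min(int(round(y1 - dy)), height)

    patch = depth_map[ya:yb, xa:xb]
    # Ignore invalid depths (NaN, inf, zero or negative values).
    valid = patch[np.isfinite(patch) & (patch > 0)]
    if valid.size == 0:
        raise ValueError("no valid depth values inside the box")

    # The median is less sensitive than the mean to stray background pixels.
    return float(np.median(valid))


# Synthetic example: a 12 m object in front of a 50 m background.
depth = np.full((100, 200), 50.0)
depth[40:60, 80:120] = 12.0
distance = object_distance(depth, (80, 40, 120, 60))
```

The median is used here because a rectangular box usually contains some background pixels, which would bias a mean toward larger distances.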