Hu Hanjiang, Qiao Zhijian, Cheng Ming, Liu Zhe, Wang Hesheng
IEEE Trans Image Process. 2021;30:1342-1353. doi: 10.1109/TIP.2020.3043875. Epub 2020 Dec 23.
Long-term visual localization in changing environments is a challenging problem in autonomous driving and mobile robotics because of seasonal and illumination changes, among other factors. Image retrieval is an efficient and effective approach to this localization problem. In this paper, we propose a novel multi-task architecture that fuses geometric and semantic information into a multi-scale latent embedding representation for visual place recognition. To exploit high-quality ground truth without human annotation effort, we propose a multi-scale feature discriminator for adversarial training, which achieves domain adaptation from the synthetic Virtual KITTI dataset to the real-world KITTI dataset. The proposed approach is validated on the Extended CMU-Seasons dataset and the Oxford RobotCar dataset through a series of comparison experiments, in which it outperforms state-of-the-art baselines for retrieval-based localization and large-scale place recognition under challenging environmental conditions.
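As a rough illustration of the retrieval-based localization that the abstract refers to, the sketch below ranks database images by cosine similarity to a query embedding and returns the poses attached to the best matches. It is not the authors' implementation: the function names (`l2_normalize`, `retrieve`), the 8-dimensional embeddings, and the 3-dimensional poses are all hypothetical, and the embeddings are random placeholders standing in for the output of a learned encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve(query_emb, db_embs, db_poses, top_k=1):
    """Return indices, similarities, and poses of the top_k closest database images.

    query_emb: (D,) embedding of the query image (hypothetical encoder output).
    db_embs:   (N, D) embeddings of the reference images.
    db_poses:  (N, P) pose attached to each reference image.
    """
    q = l2_normalize(query_emb[None, :])[0]
    db = l2_normalize(db_embs)
    sims = db @ q                      # cosine similarity to every reference image
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return order, sims[order], db_poses[order]


# Toy data: random vectors stand in for learned embeddings (assumption).
rng = np.random.default_rng(0)
db_embs = rng.normal(size=(5, 8))
db_poses = rng.normal(size=(5, 3))

# A query that is a slightly perturbed copy of reference image 3.
query_emb = db_embs[3] + 0.01 * rng.normal(size=8)

idx, sims, poses = retrieve(query_emb, db_embs, db_poses, top_k=2)
print("best match:", idx[0], "similarity:", float(sims[0]))
```

In this setup the pose of the top-ranked reference image serves as the coarse location estimate for the query; exhaustive comparison is used here only for brevity, and a large database would typically call for an approximate nearest-neighbor index.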