State Key Laboratory of Robotics and System, Harbin Institute of Technology, 92 Xidazhi Street, Harbin 150006, China.
MFIN, Faculty of Business and Economics, The University of Hong Kong, Pokfulam Road, Hong Kong 999077, China.
Sensors (Basel). 2020 Dec 4;20(23):6943. doi: 10.3390/s20236943.
A traditional CNN for 6D robot relocalization outputs pose estimates but gives no indication of whether the model is making sensible predictions or just guessing at random. We found that convnet representations trained on classification problems generalize well to other tasks. Thus, we propose a multi-task CNN for robot relocalization which simultaneously performs pose regression and scene recognition. Scene recognition determines whether the input image belongs to the scene in which the robot is currently located, not only reducing the relocalization error but also indicating how much confidence we can place in the prediction. We also found that pose accuracy degrades when there is a large visual difference between the testing and training images. Based on this, we present the dual-level image-similarity strategy (DLISS), which consists of two levels: an initial level and an iteration level. The initial level clusters the feature vectors of the training set and extracts feature vectors from the testing images. The iteration level, a particle swarm optimization (PSO)-based image-block selection algorithm, builds on the initial level to select the testing images most similar to the training images, yielding higher pose accuracy on the testing set. Our method considers both the accuracy and the robustness of relocalization, and it operates indoors and outdoors in real time, taking at most 27 ms per frame. Finally, we evaluated our method on the Microsoft 7Scenes dataset and the Cambridge Landmarks dataset, obtaining approximately 0.33 m and 7.51° accuracy on 7Scenes and approximately 1.44 m and 4.83° accuracy on Cambridge Landmarks. Compared with PoseNet, our CNN reduces the average positional error by 25% and the average angular error by 27.79% on 7Scenes, and reduces the average positional error by 40% and the average angular error by 28.55% on Cambridge Landmarks. We show that our multi-task CNN can localize from high-level features and is robust to images that do not belong to the current scene. Furthermore, the multi-task CNN achieves higher relocalization accuracy when using the testing images selected by DLISS.
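As a rough illustration of the multi-task architecture the abstract describes, here is a minimal PyTorch sketch with a shared backbone feeding a pose-regression head (3-D position plus a unit quaternion) and a scene-recognition head. The ResNet-50 backbone, head sizes, scene count, and loss weights `beta` and `gamma` are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a multi-task relocalization CNN: a shared backbone,
# a 6-DoF pose head (3-D position + 4-D quaternion), and a scene head.
import torch
import torch.nn as nn
from torchvision import models

class MultiTaskRelocNet(nn.Module):
    def __init__(self, num_scenes: int, feat_dim: int = 2048):
        super().__init__()
        backbone = models.resnet50(weights=None)  # stand-in for the paper's backbone
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.fc_xyz = nn.Linear(feat_dim, 3)            # position regression
        self.fc_quat = nn.Linear(feat_dim, 4)           # orientation (quaternion)
        self.fc_scene = nn.Linear(feat_dim, num_scenes)  # scene recognition

    def forward(self, x):
        f = self.features(x).flatten(1)
        xyz = self.fc_xyz(f)
        quat = nn.functional.normalize(self.fc_quat(f), dim=1)  # unit quaternion
        scene_logits = self.fc_scene(f)
        return xyz, quat, scene_logits

def multi_task_loss(xyz, quat, scene_logits, gt_xyz, gt_quat, gt_scene,
                    beta: float = 500.0, gamma: float = 1.0):
    # PoseNet-style weighted pose loss plus a cross-entropy scene term;
    # beta and gamma are hypothetical weights.
    pos = nn.functional.mse_loss(xyz, gt_xyz)
    rot = nn.functional.mse_loss(quat, gt_quat)
    cls = nn.functional.cross_entropy(scene_logits, gt_scene)
    return pos + beta * rot + gamma * cls
```

At test time, the softmax confidence of the scene head can serve as the gate on whether the regressed pose should be trusted at all, which is the interpretability benefit the abstract claims over a pure pose regressor.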
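The initial level of DLISS can be sketched, under simplifying assumptions, as k-means clustering over the training-set feature vectors followed by a similarity score for each testing image; the cluster count and the use of cosine similarity to the nearest centroid are assumptions made here for illustration.

```python
# Sketch of the DLISS initial level: cluster training features, then
# score a test image by cosine similarity to the nearest centroid.
# Higher-scoring test images are closer to the training distribution.
import numpy as np
from sklearn.cluster import KMeans

def cluster_training_features(train_feats: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    # train_feats: (N, D) feature vectors extracted from training images
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_feats)
    return km.cluster_centers_

def similarity_to_training(test_feat: np.ndarray, centers: np.ndarray) -> float:
    # Cosine similarity between one test feature and its nearest centroid.
    a = test_feat / np.linalg.norm(test_feat)
    b = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return float(np.max(b @ a))
```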
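The iteration level (the PSO-based image-block selection) might then look like the following sketch, which reuses `similarity_to_training` from the block above: each particle encodes the top-left corner of a fixed-size block in a testing image, and its fitness is the block's similarity to the training clusters. The swarm size, inertia and acceleration coefficients, block size, and the feature extractor `feat_fn` are all assumptions, not the paper's settings.

```python
# Sketch of the iteration level: standard PSO over block positions in a
# test image; fitness = similarity of the block's features to training
# clusters. Assumes the image is at least as large as the block.
def pso_select_block(image, feat_fn, centers, block=(224, 224),
                     n_particles=20, n_iters=30, w=0.7, c1=1.5, c2=1.5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    H, W = image.shape[:2]
    bounds = np.array([H - block[0], W - block[1]], dtype=float)
    pos = rng.uniform(0, bounds, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):
        r, c = int(p[0]), int(p[1])
        patch = image[r:r + block[0], c:c + block[1]]
        return similarity_to_training(feat_fn(patch), centers)

    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = pbest[np.argmax(pbest_val)].copy()
    g_val = pbest_val.max()

    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, 0, bounds)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        if vals.max() > g_val:
            g, g_val = pos[np.argmax(vals)].copy(), vals.max()
    return tuple(g.astype(int)), g_val  # best block corner and its similarity
```

Ranking testing images by the returned similarity, and keeping the most similar ones, mirrors the abstract's claim that pose accuracy improves on the testing images DLISS selects.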