Moon Sungho, Lee Myungho
Department of Information Convergence Engineering, Pusan National University, Busan 46241, Republic of Korea.
School of Computer Science and Engineering, Pusan National University, Busan 46241, Republic of Korea.
Sensors (Basel). 2024 Jan 26;24(3):816. doi: 10.3390/s24030816.
Visual localization is the process of determining an observer's pose by analyzing the spatial relationships between a query image and a pre-existing set of images. Visual features matched between the images are identified and used for pose estimation, so the accuracy of the estimate depends heavily on the precision of feature matching. Incorrect feature matches, such as those between different objects and/or between different points within the same object, should therefore be avoided. In this paper, we first evaluated how reliable each object class in the image datasets is with respect to pose estimation accuracy. This evaluation showed the building class to be reliable, whereas the human class was unreliable across diverse locations. We then examined the degradation of pose estimation accuracy more closely by artificially increasing the proportion of the unreliable object class, humans. A noteworthy decline began when the average proportion of humans in the images exceeded 20%. We discuss the results and their implications for dataset construction for visual localization.
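As an illustration of the 20% figure above, the following is a minimal sketch of how the average proportion of an object class could be measured over a set of images. It assumes the proportion is the fraction of pixels labeled with that class in a per-image semantic segmentation mask, which the abstract does not specify. The names `class_proportion`, `mean_class_proportion`, `exceeds_threshold`, and the label id `HUMAN_CLASS_ID` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence

# A mask is a 2-D grid of integer class labels, one label per pixel.
Mask = Sequence[Sequence[int]]

HUMAN_CLASS_ID = 1        # hypothetical label id; depends on the segmentation model
DECLINE_THRESHOLD = 0.20  # the 20% average proportion reported in the abstract


def class_proportion(mask: Mask, class_id: int) -> float:
    """Return the fraction of pixels in one mask labeled with class_id."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    hits = sum(1 for row in mask for label in row if label == class_id)
    return hits / total


def mean_class_proportion(masks: Iterable[Mask], class_id: int) -> float:
    """Return the per-image class proportion averaged over all masks."""
    proportions = [class_proportion(m, class_id) for m in masks]
    if not proportions:
        raise ValueError("at least one mask is required")
    return sum(proportions) / len(proportions)


def exceeds_threshold(
    masks: Iterable[Mask],
    class_id: int = HUMAN_CLASS_ID,
    threshold: float = DECLINE_THRESHOLD,
) -> bool:
    """Return True if the average class proportion is above the threshold."""
    return mean_class_proportion(masks, class_id) > threshold
```

A check like this could be run while assembling a dataset to flag image sets in which humans occupy a large share of the frame, under the pixel-fraction assumption stated above.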