
6DoF Object Pose and Focal Length Estimation from Single RGB Images in Uncontrolled Environments.

Authors

Manawadu Mayura, Park Soon-Yong

Affiliations

Graduate School of Electronic and Electrical Engineering, Kyungpook National University, Daegu 41566, Republic of Korea.

Publication

Sensors (Basel). 2024 Aug 23;24(17):5474. doi: 10.3390/s24175474.

Abstract

Accurate 6DoF (six degrees of freedom) pose and focal length estimation are important in extended reality (XR) applications, enabling precise object alignment and projection scaling and thereby enhancing user experience. This study focuses on improving 6DoF pose estimation from single RGB images with unknown camera metadata. Estimating the 6DoF pose and focal length from an uncontrolled RGB image, such as one obtained from the internet, is challenging because such images often lack crucial metadata. Existing methods such as FocalPose and FocalPose++ have made progress in this domain but still struggle with the projection scale ambiguity between an object's translation along the z-axis (tz) and the camera's focal length. To overcome this, we propose a two-stage strategy that decouples the projection scale ambiguity in the estimation of z-axis translation and focal length. In the first stage, tz is fixed to an arbitrary value, and all other pose parameters and the focal length are predicted relative to this fixed tz. In the second stage, the true value of tz is predicted, and the focal length is rescaled according to the tz update. The proposed two-stage method reduces projection scale ambiguity in RGB images and improves pose estimation accuracy. Iterative update rules constrained to the first stage, together with tailored loss functions including the Huber loss in the second stage, further improve the accuracy of both 6DoF pose and focal length estimation. Experimental results on benchmark datasets show significant improvements in median rotation and translation errors, as well as better projection accuracy, compared to existing state-of-the-art methods. In an evaluation across the Pix3D dataset (chair, sofa, table, and bed), the proposed two-stage method improves projection accuracy by approximately 7.19%. Additionally, incorporating the Huber loss reduces translation and focal length errors by 20.27% and 6.65%, respectively, compared to the FocalPose++ method.
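Under a pinhole model, a point (x, y, z) in the camera frame projects to (f·x/z, f·y/z), so for a compact object at depth tz the image scale is governed by the ratio f/tz; multiplying both f and tz by the same factor leaves the projection nearly unchanged, which is the ambiguity the two-stage strategy targets. Below is a minimal sketch of that idea in Python with NumPy; the constant TZ_FIXED, the Huber delta, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project(points_cam, f):
    """Pinhole projection of Nx3 camera-frame points with focal length f
    (principal point at the origin for simplicity)."""
    return f * points_cam[:, :2] / points_cam[:, 2:3]

# --- The f/tz projection-scale ambiguity ----------------------------------
# Scale both the z-translation and the focal length by the same factor k:
# the projection of a compact object barely moves.
rng = np.random.default_rng(0)
obj = rng.normal(scale=0.1, size=(100, 3))        # object points, object frame
tz, f, k = 2.0, 600.0, 1.5
p1 = project(obj + np.array([0.0, 0.0, tz]), f)
p2 = project(obj + np.array([0.0, 0.0, k * tz]), k * f)
print("max pixel gap under joint (tz, f) scaling:", np.abs(p1 - p2).max())

# --- Stage 1 (sketch): fix tz to an arbitrary constant ---------------------
# A learned regressor (not shown) predicts rotation, (tx, ty), and a focal
# length f_rel, all conditioned on tz == TZ_FIXED, so the scale ambiguity
# never enters stage 1.
TZ_FIXED = 1.0  # arbitrary constant; the actual value is an assumption

# --- Stage 2 (sketch): recover the true tz, rescale f ----------------------
def rescale_focal(f_rel, tz_true, tz_fixed=TZ_FIXED):
    """Update the stage-1 focal length when tz moves from tz_fixed to
    tz_true, preserving the projection scale f / tz."""
    return f_rel * (tz_true / tz_fixed)

def huber(residual, delta=1.0):
    """Huber loss (quadratic near zero, linear in the tails); the abstract
    credits it with reducing translation and focal length errors in stage 2.
    delta is an illustrative choice."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))
```

For points exactly at depth tz the two projections coincide; the small residual gap printed above comes from depth variation within the object, which is why the ambiguity is only approximate and why tz and f are so hard to separate from a single image.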

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/525f/11398113/e2cabc8fb895/sensors-24-05474-g001.jpg
