Weidenbach Maira, Laue Tim, Frese Udo
Faculty of Mathematics and Computer Science, University of Bremen, 28359 Bremen, Germany.
Sensors (Basel). 2024 Jan 10;24(2):432. doi: 10.3390/s24020432.
Robotic manipulation requires knowledge of the poses of the objects of interest. To perform typical household chores, a robot must be able to estimate 6D poses for objects such as water glasses or salad bowls. This is especially difficult for glass objects: their depth data are largely corrupted, and in RGB images, objects occluded by glass remain visible through it. We therefore propose to redefine the ground truth for training RGB-based pose estimators in two ways: (a) we apply a transparency-aware multisegmentation, in which an image pixel can belong to more than one object, and (b) we use transparency-aware bounding boxes, which always enclose the whole object, even if parts of it are nominally occluded by another object. The latter keeps an object's size and scale more consistent across different images. We train our pose estimator, originally designed for opaque objects, with three different ground-truth types on the ClearPose dataset. Merely changing the training data to our transparency-aware segmentation, with no additional glass-specific changes to the estimator, increases the ADD-S AUC value by 4.3%. Such a multisegmentation can be created for any dataset that provides a 3D model of each object and its ground-truth pose.
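To illustrate the two ground-truth redefinitions described above, the following is a minimal sketch (not the authors' implementation): it assumes each object's full, unoccluded silhouette mask has already been rendered from its 3D model at the ground-truth pose, and combines these into a multisegmentation, where a pixel may belong to several objects, plus transparency-aware bounding boxes that span the whole silhouette regardless of occlusion. All function and variable names are illustrative.

```python
import numpy as np

def multiseg_and_boxes(per_object_masks):
    """Build a transparency-aware multisegmentation and bounding boxes.

    per_object_masks: dict mapping object id -> HxW boolean mask of the
    object's FULL silhouette (rendered from its 3D model at the
    ground-truth pose, ignoring occluders).
    Returns (multiseg, boxes): multiseg is a KxHxW boolean array with
    one channel per object, so a pixel can be True in several channels
    where objects overlap (e.g. seen through glass); boxes maps each
    object id to (x_min, y_min, x_max, y_max) over the whole silhouette.
    """
    ids = sorted(per_object_masks)
    # One channel per object instead of a single label image, so that
    # overlapping objects do not compete for the same pixel.
    multiseg = np.stack([per_object_masks[i] for i in ids], axis=0)

    boxes = {}
    for i in ids:
        ys, xs = np.nonzero(per_object_masks[i])
        # Transparency-aware box: encloses the complete silhouette,
        # including parts nominally occluded by another object.
        boxes[i] = (int(xs.min()), int(ys.min()),
                    int(xs.max()), int(ys.max()))
    return multiseg, boxes
```

A conventional single-label segmentation would instead assign each overlap pixel to exactly one object (typically the front one), which is what the multisegmentation above avoids.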