Cao Hongpeng, Dirnberger Lukas, Bernardini Daniele, Piazza Cristina, Caccamo Marco
School of Engineering and Design, Technical University of Munich, Munich, Germany.
School of Computation, Information and Technology, Technical University of Munich, Munich, Germany.
Front Robot AI. 2023 Sep 27;10:1176492. doi: 10.3389/frobt.2023.1176492. eCollection 2023.
6D pose recognition has been a crucial factor in the success of robotic grasping, and recent deep-learning-based approaches have achieved remarkable results on benchmarks. However, their generalization capabilities in real-world applications remain unclear. To close this gap, we introduce 6IMPOSE, a novel framework for sim-to-real data generation and 6D pose estimation. 6IMPOSE consists of four modules: first, a data generation pipeline that employs the 3D software suite Blender to create synthetic RGBD image datasets with 6D pose annotations; second, an annotated RGBD dataset of five household objects generated using the proposed pipeline; third, a real-time two-stage 6D pose estimation approach that integrates the object detector YOLO-V4 with a streamlined, real-time version of the 6D pose estimation algorithm PVN3D, optimized for time-sensitive robotics applications; and fourth, a codebase designed to facilitate the integration of the vision system into robotic grasping experiments. Our approach demonstrates the efficient generation of large amounts of photo-realistic RGBD images and the successful transfer of the trained inference model to robotic grasping experiments, achieving an overall success rate of 87% when grasping five different household objects from cluttered backgrounds under varying lighting conditions. This is made possible by fine-tuning the data generation and domain randomization techniques and by optimizing the inference pipeline, thereby overcoming the generalization and performance shortcomings of the original PVN3D algorithm. Finally, we make the code, synthetic dataset, and all pre-trained models available on GitHub.