A Variable Photo-Model Method for Object Pose and Size Estimation with Stereo Vision in a Complex Home Scene.

Author information

Tian Hongzhi, Wang Jirong

Affiliations

College of Mechanical and Electrical Engineering, Qingdao University, Qingdao 266071, China.

Weihai Innovation Research Institute, Qingdao University, Weihai 264200, China.

Publication information

Sensors (Basel). 2023 Aug 3;23(15):6924. doi: 10.3390/s23156924.

DOI:10.3390/s23156924

PMID:37571707

Original link: https://pmc.ncbi.nlm.nih.gov/articles/PMC10422454/

Abstract

Model-based stereo vision methods can estimate the 6D poses of rigid objects, which helps robots grasp targets in complex home environments. This study presents a novel approach, called the variable photo-model method, that estimates the pose and size of an unknown object using a single photo of an object from the same category. Pre-trained You Only Look Once (YOLO) v4 weights are used for object detection and 2D model generation in the photo, and the method converts the segmented 2D photo-model into 3D flat photo-models with different assumed sizes and poses. Through perspective projection and model matching, the method finds the best match between the model and the actual object in the captured stereo images. The matching fitness function is optimized using a genetic algorithm (GA). Unlike data-driven approaches, this approach requires neither multiple photos nor pre-training time to recognize the pose of a single object, making it more versatile. Indoor experiments demonstrate the effectiveness of the variable photo-model method in estimating the pose and size of target objects within the same class. The findings have practical implications for object detection prior to robotic grasping, particularly because the method is easy to apply and requires little data.
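The projection-and-search loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The camera constants, the parameter bounds, the rectangular stand-in for the segmented photo-model, and all function names are assumptions made for illustration. The fitness used here is a simple corner reprojection error, swapped in for the paper's image-based matching fitness, which the abstract does not specify.

```python
import math
import random

# Assumed parallel stereo rig (hypothetical values, not from the paper).
FOCAL_PX, BASELINE_M, CX, CY = 800.0, 0.1, 320.0, 240.0
# Assumed search bounds: x, y, z, roll, pitch, yaw, width, height.
BOUNDS = [(-0.5, 0.5), (-0.5, 0.5), (0.5, 2.0), (-0.5, 0.5),
          (-0.5, 0.5), (-0.5, 0.5), (0.05, 0.4), (0.05, 0.4)]


def rotation_zyx(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]


def project_stereo(params):
    """Project the corners of a flat model into the left and right images.

    The pose is expressed in the left camera frame. Each result is
    (u_left, v_left, u_right, v_right) in pixels.
    """
    x, y, z, roll, pitch, yaw, width, height = params
    rot = rotation_zyx(roll, pitch, yaw)
    w, h = width / 2.0, height / 2.0
    corners = [(-w, -h, 0.0), (w, -h, 0.0), (w, h, 0.0), (-w, h, 0.0)]
    pixels = []
    for p in corners:
        cam = [sum(rot[i][j] * p[j] for j in range(3)) + t
               for i, t in enumerate((x, y, z))]
        u_left = FOCAL_PX * cam[0] / cam[2] + CX
        # The right camera sits BASELINE_M along +x of the left camera.
        u_right = FOCAL_PX * (cam[0] - BASELINE_M) / cam[2] + CX
        v = FOCAL_PX * cam[1] / cam[2] + CY
        pixels.append((u_left, v, u_right, v))
    return pixels


def fitness(params, observed):
    """Stand-in fitness: negative squared corner reprojection error."""
    err = 0.0
    for proj, obs in zip(project_stereo(params), observed):
        err += sum((a - b) ** 2 for a, b in zip(proj, obs))
    return -err


def run_ga(observed, generations=60, pop_size=40, seed=0):
    """Small real-valued GA: elitism, tournament selection, uniform
    crossover, and Gaussian mutation clipped to BOUNDS."""
    rng = random.Random(seed)
    score = lambda ind: fitness(ind, observed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        next_pop = [pop[0][:], pop[1][:]]  # keep the two best unchanged
        while len(next_pop) < pop_size:
            a, b = (max(rng.sample(pop, 3), key=score) for _ in range(2))
            child = []
            for (lo, hi), ga, gb in zip(BOUNDS, a, b):
                gene = ga if rng.random() < 0.5 else gb
                gene += rng.gauss(0.0, 0.02 * (hi - lo))
                child.append(min(hi, max(lo, gene)))
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=score), history


if __name__ == "__main__":
    true_params = (0.1, -0.05, 1.2, 0.1, -0.2, 0.3, 0.25, 0.15)
    observed = project_stereo(true_params)  # synthetic "measurement"
    best, history = run_ga(observed)
    print("best fitness:", fitness(best, observed))
    print("estimated size:", best[6], best[7])
```

Treating width and height as extra genes alongside the six pose parameters is what lets one search recover both pose and size. The stereo pair matters here because, in a single image, a small model close to the camera and a large model far away can project to nearly the same outline, while the disparity between the left and right views constrains depth.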
