MVTec Software GmbH, Neherstr. 1, 81675 München, Germany.
IEEE Trans Pattern Anal Mach Intell. 2012 Oct;34(10):1902-14. doi: 10.1109/TPAMI.2011.266.
This paper describes an approach for recognizing instances of a 3D object in a single camera image and for determining their 3D poses. A hierarchical model is generated solely from the geometry of a 3D CAD model of the object. The approach does not rely on texture or reflectance information of the object's surface, which makes it useful for a wide range of industrial and robotic applications, e.g., bin-picking. A hierarchical view-based approach is applied that addresses typical problems of previous methods: it handles true perspective; is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications; and is invariant to contrast changes. To generate the hierarchical model, a new model image generation technique is presented that takes scale-space effects into account. The necessary object views are derived using a similarity-based aspect graph. The high robustness of an exhaustive search is combined with an efficient hierarchical search. The 3D pose is refined by a least-squares adjustment that minimizes geometric distances in the image, yielding a position accuracy of up to 0.12 percent with respect to the object distance and an orientation accuracy of up to 0.35 degrees in our tests. The recognition time is largely independent of the complexity of the object and depends mainly on the range of poses within which the object may appear in front of the camera. For efficiency reasons, the approach allows the pose range to be restricted depending on the application. Typical runtimes are in the range of a few hundred milliseconds.
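To give the reported accuracy figures a physical scale, the following minimal sketch converts them into absolute errors. It is an illustration only: the 500 mm object distance and the 50 mm lever arm are assumed example values that do not come from the paper, the function names are hypothetical, and the 0.12 percent and 0.35 degree figures are the best-case ("up to") values quoted in the abstract.

```python
import math


def position_error(distance, relative_accuracy=0.0012):
    """Absolute position error, in the same unit as `distance`.

    relative_accuracy defaults to 0.12 percent of the object distance,
    the best-case figure quoted in the abstract.
    """
    return distance * relative_accuracy


def rotation_displacement(radius, angle_deg=0.35):
    """Chord length moved by a point `radius` away from the rotation
    center when the orientation is off by `angle_deg` degrees."""
    return 2.0 * radius * math.sin(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    # Assumed example values, not taken from the paper.
    distance_mm = 500.0  # camera-to-object distance
    radius_mm = 50.0     # distance of a feature from the object center

    print(f"position error: {position_error(distance_mm):.2f} mm")
    print(f"displacement from rotation error: "
          f"{rotation_displacement(radius_mm):.2f} mm")
```

Under these assumptions, the position error is 0.6 mm and a feature 50 mm from the object center shifts by roughly 0.3 mm. This helps judge whether the stated accuracy fits a given application, such as a gripper tolerance in bin-picking.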