Multi-view and 3D deformable part models.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2015 Nov;37(11):2232-45. doi: 10.1109/TPAMI.2015.2408347.

Abstract

Because objects are inherently 3D, they were modeled in 3D in the early days of computer vision. Owing to the ambiguities that arise when mapping 2D features to 3D models, 3D object representations have since been neglected, and 2D feature-based models are now the predominant paradigm in object detection. While such models have achieved outstanding bounding box detection performance, their expressiveness is limited: they have little capability to reason about 3D shape or viewpoint. In this work, we bring the worlds of 3D and 2D object representations closer by building an object detector that leverages the expressive power of 3D object representations while remaining robustly matchable to image evidence. To that end, we gradually extend the successful deformable part model [1] to include viewpoint information and part-level 3D geometry information, resulting in several models with different levels of expressiveness. We end up with a 3D object model consisting of multiple object parts represented in 3D and a continuous appearance model. We experimentally verify that our models, while providing richer object hypotheses than 2D object models, provide consistently better joint object localization and viewpoint estimation than state-of-the-art multi-view and 3D object detectors on various benchmarks (KITTI [2], 3D object classes [3], Pascal3D+ [4], Pascal VOC 2007 [5], EPFL multi-view cars [6]).

