Vázquez-Martín Ricardo, Bandera Antonio
Centro Andaluz de Innovación y Tecnologías de la Información y las Comunicaciones (CITIC), Málaga, Spain.
Cogn Process. 2012 Aug;13 Suppl 1:S351-4. doi: 10.1007/s10339-012-0496-2.
Monocular approaches to simultaneous localization and mapping (SLAM) have recently succeeded in addressing the challenging problem of quickly computing dense reconstructions from a single moving camera. Whereas these approaches initially relied on detecting a small set of interest points to estimate the camera position and the map, they can now reconstruct dense maps from a handheld camera while simultaneously computing the camera coordinates. However, these maps of 3-dimensional points usually remain meaningless: they contain no memorable items and provide no way of encoding spatial relationships between objects and paths. For both humans and mobile robots, landmarks play a key role in internalizing a spatial representation of an environment. They are memorable cues that can serve to define a region of space or the location of other objects. In a topological representation of space, landmarks can be identified and located according to their structural, perceptual or semantic significance and distinctiveness. In a metric representation of space, on the other hand, landmarks may be difficult to locate. Restricted to the domain of visual landmarks, this work describes an approach in which the map produced by a point-based, monocular SLAM is annotated with the semantic information provided by a set of distinctive landmarks. Both kinds of features, map points and landmarks, are obtained from the image. Hence, they can be linked by associating with each landmark all the point-based features that are superimposed on that landmark in a given image (key-frame). Visual landmarks are obtained by means of an object-based, bottom-up attention mechanism, which extracts a set of proto-objects from the image. These proto-objects cannot always be associated with natural objects, but they typically constitute significant parts of the scene objects and can be appropriately annotated with semantic information. Moreover, they are affine covariant regions, that is, they are detected consistently under affine transformations and hence under different viewing conditions (viewpoint angle, rotation, scale, etc.). Monocular SLAM is solved using the accurate parallel tracking and mapping (PTAM) framework of Klein and Murray (Proceedings of the IEEE/ACM International Symposium on Mixed and Augmented Reality, 2007).
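The linking step described in the abstract, associating with each landmark the point features that fall on it in a key-frame, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes, purely for illustration, that each landmark region is given as an ellipse with center (cx, cy) and coefficients a, b, c such that a*dx^2 + 2*b*dx*dy + c*dy^2 <= 1 holds inside the region, and that the map points have already been projected into the key-frame as pixel coordinates. The names Landmark and link_points_to_landmarks are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Landmark:
    """Hypothetical landmark: a semantic label plus an elliptical image region."""
    label: str
    cx: float  # ellipse center, pixel column
    cy: float  # ellipse center, pixel row
    a: float   # coefficients of a*dx^2 + 2*b*dx*dy + c*dy^2 <= 1 (assumed format)
    b: float
    c: float

    def contains(self, u: float, v: float) -> bool:
        """Return True if pixel (u, v) lies inside the ellipse."""
        dx, dy = u - self.cx, v - self.cy
        return self.a * dx * dx + 2.0 * self.b * dx * dy + self.c * dy * dy <= 1.0


def link_points_to_landmarks(
    keyframe_points: Dict[int, Tuple[float, float]],
    landmarks: List[Landmark],
) -> Dict[str, List[int]]:
    """Map each landmark label to the ids of map points superimposed on it.

    keyframe_points maps a map-point id to its projection (u, v) in the key-frame.
    A point covered by several overlapping landmarks is linked to all of them.
    """
    links: Dict[str, List[int]] = {lm.label: [] for lm in landmarks}
    for point_id, (u, v) in sorted(keyframe_points.items()):
        for lm in landmarks:
            if lm.contains(u, v):
                links[lm.label].append(point_id)
    return links


if __name__ == "__main__":
    # A circular region of radius 20 px centered at (100, 100).
    door = Landmark("door", 100.0, 100.0, 1.0 / 400.0, 0.0, 1.0 / 400.0)
    points = {1: (105.0, 110.0), 2: (200.0, 200.0), 3: (100.0, 115.0)}
    print(link_points_to_landmarks(points, [door]))
```

In this sketch the semantic label attached to a landmark is carried over to every linked map point id, which is one simple way an otherwise unlabeled point map could be annotated.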