A Neural-Dynamic Architecture for Concurrent Estimation of Object Pose and Identity.

Authors

Lomp Oliver, Faubel Christian, Schöner Gregor

Affiliation

Institut für Neuroinformatik, Ruhr-University Bochum, Bochum, Germany.

Publication

Front Neurorobot. 2017 Apr 28;11:23. doi: 10.3389/fnbot.2017.00023. eCollection 2017.

DOI:10.3389/fnbot.2017.00023

PMID:28503145

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5408094/

Abstract

Handling objects or interacting with a human user about objects on a shared tabletop requires that objects be identified after learning from a small number of views and that object pose be estimated. We present a neurally inspired architecture that learns object instances by storing features extracted from a single view of each object. Input features are color and edge histograms from a localized area that is updated during processing. The system finds the best-matching view for the object in a novel input image while concurrently estimating the object's pose, aligning the learned view with current input. The system is based on neural dynamics, computationally operating in real time, and can handle dynamic scenes directly off live video input. In a scenario with 30 everyday objects, the system achieves recognition rates of 87.2% from a single training view for each object, while also estimating pose quite precisely. We further demonstrate that the system can track moving objects, and that it can segment the visual array, selecting and recognizing one object while suppressing input from another known object in the immediate vicinity. Evaluation on the COIL-100 dataset, in which objects are depicted from different viewing angles, revealed recognition rates of 91.1% on the first 30 objects, each learned from four training views.
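The matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each learned view is stored as a color histogram, that similarity is measured by histogram intersection (a stand-in metric the abstract does not name), and that a generic winner-take-all competition among label nodes selects the best match. The function names, parameter values, and toy histograms are all hypothetical, and pose estimation is left out entirely.

```python
import math


def normalize(hist):
    """Scale a histogram so its bins sum to 1 (left unchanged if empty)."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else list(hist)


def intersection(a, b):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def sigmoid(u, beta=4.0):
    """Smooth threshold function applied to node activation."""
    return 1.0 / (1.0 + math.exp(-beta * u))


def compete(scores, steps=400, dt=1.0, tau=10.0, h=-1.0,
            gain=3.0, c_exc=1.0, c_inh=2.0):
    """Euler-integrate a generic competitive dynamics over label nodes.

    Each node relaxes toward resting level h plus its input, excites
    itself, and inhibits every other node (assumed form, not the
    equations of the paper).
    """
    u = [h] * len(scores)
    for _ in range(steps):
        f = [sigmoid(x) for x in u]
        total = sum(f)
        u = [x + dt / tau * (-x + h + gain * s + c_exc * fx
                             - c_inh * (total - fx))
             for x, s, fx in zip(u, scores, f)]
    return u


# Hypothetical stored views: one 3-bin color histogram per object.
stored = {"cup": [8, 1, 1], "box": [1, 8, 1], "ball": [1, 1, 8]}
query = normalize([7, 2, 1])  # histogram of a novel input image

labels = list(stored)
scores = [intersection(normalize(stored[name]), query) for name in labels]
activation = compete(scores)
best = labels[activation.index(max(activation))]
print(best, [round(a, 2) for a in activation])
```

In this toy setup only the node for the best-matching stored view ends with positive activation, while the other nodes are pushed below zero. That mirrors, in a very reduced form, the idea of selecting one object identity while suppressing competing candidates.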
