German I. Parisi, Cornelius Weber, Stefan Wermter
Department of Informatics, Knowledge Technology Institute, University of Hamburg, Hamburg, Germany.
Front Neurorobot. 2015 Jun 9;9:3. doi: 10.3389/fnbot.2015.00003. eCollection 2015.
The visual recognition of complex, articulated human movements is fundamental for a wide range of artificial systems oriented toward human-robot communication, action classification, and action-driven perception. These challenging tasks typically involve processing large amounts of visual information together with learning-based mechanisms that generalize over a set of training actions and classify new samples. To operate in natural environments, a crucial property is the efficient and robust recognition of actions even under noisy conditions caused by, for instance, systematic sensor errors and temporarily occluded persons. Studies of the mammalian visual system and its remarkable ability to process biological motion suggest separate neural pathways for the distinct processing of pose and motion features at multiple levels, and the subsequent integration of these visual cues for action perception. We present a neurobiologically motivated approach to noise-tolerant action recognition in real time. Our model consists of self-organizing Growing When Required (GWR) networks that obtain progressively generalized representations of sensory inputs and learn inherent spatio-temporal dependencies. During training, the GWR networks dynamically change their topological structure to better match the input space. We first extract pose and motion features from video sequences and then cluster actions in terms of prototypical pose-motion trajectories. Multi-cue trajectories from matching action frames are subsequently combined to provide action dynamics in the joint feature space. Experiments show that our approach outperforms previous results on a dataset of full-body actions captured with a depth sensor, and ranks among the best results on a public benchmark of domestic daily actions.
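The Growing When Required mechanism the abstract refers to can be sketched as follows. This is a minimal, illustrative implementation following the standard GWR algorithm (Marsland et al.), not the paper's own code: all parameter values, the Gaussian activity function, and the simplified exponential habituation rule are assumptions chosen for clarity.

```python
import numpy as np

class GWR:
    """Minimal Growing When Required network sketch: nodes are prototype
    vectors, edges link co-activated nodes, and a new node is inserted
    whenever the best-matching node represents the input poorly despite
    having already fired often (i.e., being well trained)."""

    def __init__(self, dim, a_T=0.85, h_T=0.1, eps_b=0.2, eps_n=0.05, max_age=50):
        rng = np.random.default_rng(0)
        self.W = [rng.standard_normal(dim), rng.standard_normal(dim)]  # prototypes
        self.h = [1.0, 1.0]            # firing counters (habituation), decay toward 0
        self.edges = {}                # (i, j) with i < j -> edge age
        self.a_T, self.h_T = a_T, h_T  # activity / firing thresholds for growth
        self.eps_b, self.eps_n = eps_b, eps_n
        self.max_age = max_age

    def _edge(self, i, j):
        return (min(i, j), max(i, j))

    def train_step(self, x):
        # 1. Find the two best-matching nodes and connect them.
        d = [np.linalg.norm(x - w) for w in self.W]
        b, s = (int(k) for k in np.argsort(d)[:2])
        self.edges[self._edge(b, s)] = 0
        # 2. Activity of the winner; grow a node if the input is matched
        #    poorly (low activity) by a habituated (well-trained) winner.
        a = np.exp(-d[b])
        if a < self.a_T and self.h[b] < self.h_T:
            r = len(self.W)
            self.W.append((self.W[b] + x) / 2.0)  # new node between winner and input
            self.h.append(1.0)
            del self.edges[self._edge(b, s)]
            self.edges[self._edge(b, r)] = 0
            self.edges[self._edge(s, r)] = 0
        else:
            # 3. Otherwise adapt the winner and its topological neighbors.
            self.W[b] = self.W[b] + self.eps_b * self.h[b] * (x - self.W[b])
            for (i, j) in list(self.edges):
                if b in (i, j):
                    n = j if i == b else i
                    self.W[n] = self.W[n] + self.eps_n * self.h[n] * (x - self.W[n])
        # 4. Habituate the winner (simplified exponential decay) and age its
        #    edges, pruning connections that have grown too old.
        self.h[b] = max(0.01, 0.9 * self.h[b])
        for e in list(self.edges):
            if b in e:
                self.edges[e] += 1
                if self.edges[e] > self.max_age:
                    del self.edges[e]
```

Fed a stream of feature vectors, such a network inserts nodes where the input distribution is dense, so its topology tracks the input space as the abstract describes; the model in the paper applies this learning to pose and motion features and to their combined trajectories.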