Imitation and mirror systems in robots through Deep Modality Blending Networks.

Affiliations

Bogazici University, Bebek, Istanbul, 34342, Turkey.

Publication information

Neural Netw. 2022 Feb;146:22-35. doi: 10.1016/j.neunet.2021.11.004. Epub 2021 Nov 16.

Abstract

Learning to interact with the environment not only gives an agent manipulation capability but also generates information that facilitates building action understanding and imitation capabilities. This seems to be a strategy adopted by biological systems, in particular primates, as evidenced by the existence of mirror neurons that seem to be involved in multi-modal action understanding. How robots can use their own interaction experience to understand the actions and goals of other agents is still a challenging question. In this study, we propose a novel method, deep modality blending networks (DMBN), that creates a common latent space from the multi-modal experience of a robot by blending multi-modal signals with a stochastic weighting mechanism. We show for the first time that deep learning, when combined with a novel modality blending scheme, can facilitate action recognition and produce structures to sustain anatomical and effect-based imitation capabilities. Our proposed system, which is based on conditional neural processes, can be conditioned on any desired sensory/motor value at any time step, and can generate a complete multi-modal trajectory consistent with the desired conditioning in one shot by querying the network for all sampled time points in parallel, which avoids the accumulation of prediction errors. Based on simulation experiments with an arm-gripper robot and an RGB camera, we showed that DMBN could make accurate predictions about any missing modality (camera or joint angles) given the available ones, outperforming recent multimodal variational autoencoder models in long-horizon, high-dimensional trajectory prediction. We further showed that given desired images from different perspectives, i.e., images obtained by observing other robots placed on different sides of the table, our system could generate image and joint angle sequences that correspond to either anatomical or effect-based imitation behavior.
To achieve this mirror-like behavior, our system does not perform pixel-based template matching but rather relies on the common latent space constructed from both joint and image modalities, as shown by additional experiments. Moreover, we showed that mirror learning (in our system) does not depend on visual experience alone and cannot be achieved without proprioceptive experience. Our experiments showed that out of ten training scenarios with different initial configurations, the proposed DMBN model could achieve mirror learning in all of the cases, whereas the model that uses only visual information failed in half of them. Overall, the proposed DMBN architecture not only serves as a computational model for sustaining mirror neuron-like capabilities, but also stands as a powerful machine learning architecture for high-dimensional multi-modal temporal data, with robust retrieval capabilities that operate on partial information in one or multiple modalities.
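
The idea of blending modalities with stochastic weights can be illustrated with a minimal Python sketch. The abstract does not specify the exact weighting scheme, so this assumes a random convex combination of per-modality latent vectors. The function name `blend_latents` and the example latent values are hypothetical and are not taken from the paper.

```python
import random

def blend_latents(latents, weights=None, rng=None):
    """Blend equal-length latent vectors (one per modality) into one vector.

    Assumption: blending is a convex combination. If no weights are given,
    random non-negative weights are drawn and normalised to sum to 1.
    """
    if weights is None:
        rng = rng or random.Random()
        raw = [rng.random() for _ in latents]
        total = sum(raw)
        if total > 0:
            weights = [r / total for r in raw]
        else:
            # Fall back to uniform weights in the very unlikely all-zero draw.
            weights = [1.0 / len(latents)] * len(latents)
    dim = len(latents[0])
    # Weighted sum of the modality latents, computed per dimension.
    blended = [sum(w * z[i] for w, z in zip(weights, latents)) for i in range(dim)]
    return blended, list(weights)

# Hypothetical encoder outputs for the two modalities named in the abstract.
image_latent = [0.2, -1.0, 0.5]
joint_latent = [1.0, 0.0, -0.5]

# Training-style call: random weights mix both modalities.
blended, weights = blend_latents([image_latent, joint_latent], rng=random.Random(0))

# Inference-style call: only the image modality is available.
image_only, _ = blend_latents([image_latent, joint_latent], weights=[1.0, 0.0])
```

Under this assumption, fixing the weights to `[1.0, 0.0]` corresponds to the case where only one modality is available. The blended vector then equals that modality's latent, which is one plausible way a shared latent space could support predicting a missing modality from the available one.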
