Van de Maele Toon, Verbelen Tim, Çatal Ozan, Dhoedt Bart
IDLab, Department of Information Technology, Ghent University - imec, Ghent, Belgium.
Front Neurorobot. 2022 Apr 14;16:840658. doi: 10.3389/fnbot.2022.840658. eCollection 2022.
Scene understanding and decomposition is a crucial challenge for intelligent systems, whether for object manipulation, navigation, or any other task. Although current machine and deep learning approaches for object detection and classification obtain high accuracy, they typically do not leverage interaction with the world and are limited to the set of objects seen during training. Humans, on the other hand, learn to recognize and classify different objects by actively engaging with them on first encounter. Moreover, recent theories in neuroscience suggest that cortical columns in the neocortex play an important role in this process by building predictive models of objects in their own reference frame. In this article, we present an enactive embodied agent that implements such a generative model for object interaction. For each object category, our system instantiates a deep neural network, called a Cortical Column Network (CCN), that represents the object in its own reference frame by learning a generative model that predicts the expected transform in pixel space, given an action. The model parameters are optimized through the active inference paradigm, i.e., the minimization of variational free energy. When provided with a visual observation, an ensemble of CCNs each vote on their belief of observing that specific object category, yielding a potential object classification. If the likelihood of the selected category is too low, the object is detected as an unknown category, and the agent can instantiate a novel CCN for this category. We validate our system in a simulated environment, where it needs to learn to discern multiple objects from the YCB dataset. We show that classification accuracy improves as the embodied agent gathers more evidence, and that it is able to learn about novel, previously unseen objects. Finally, we show that an agent driven by active inference can choose its actions to reach a preferred observation.
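To make the ensemble-voting and novel-category mechanism described in the abstract concrete, the sketch below illustrates the idea in Python. It is a minimal toy, not the authors' implementation: the class names (CorticalColumnNetwork, CCNEnsemble), the Gaussian log-evidence stand-in for the learned generative model, and the unknown_threshold value are all assumptions chosen only to show how per-category votes can yield a classification or trigger instantiation of a new CCN when evidence is too low.

```python
import numpy as np


class CorticalColumnNetwork:
    """Toy stand-in for a per-category CCN (hypothetical): scores how well an
    observation matches this category, as a proxy for model evidence."""

    def __init__(self, category, prototype):
        self.category = category
        self.prototype = np.asarray(prototype, dtype=float)

    def log_evidence(self, observation):
        # Higher is better; a simple Gaussian log-likelihood around a prototype
        # stands in for the learned generative model's evidence here.
        diff = np.asarray(observation, dtype=float) - self.prototype
        return -0.5 * float(diff @ diff)


class CCNEnsemble:
    """Ensemble of per-category CCNs that vote on the observed object category.
    If the best vote falls below a threshold, the object is treated as unknown
    and a new CCN is instantiated for it (assumed behavior, per the abstract)."""

    def __init__(self, unknown_threshold=-5.0):
        self.ccns = []
        self.unknown_threshold = unknown_threshold

    def classify(self, observation):
        # Each CCN votes with its evidence; the best-scoring category wins.
        if not self.ccns:
            return None, float("-inf")
        votes = [(c.category, c.log_evidence(observation)) for c in self.ccns]
        return max(votes, key=lambda v: v[1])

    def observe(self, observation):
        category, score = self.classify(observation)
        if category is None or score < self.unknown_threshold:
            # Low evidence on all known categories: instantiate a novel CCN.
            category = f"object_{len(self.ccns)}"
            self.ccns.append(CorticalColumnNetwork(category, observation))
        return category


if __name__ == "__main__":
    ensemble = CCNEnsemble(unknown_threshold=-5.0)
    print(ensemble.observe([0.0, 0.0]))   # unknown -> new CCN "object_0"
    print(ensemble.observe([0.1, -0.1]))  # close to object_0 -> classified as it
    print(ensemble.observe([8.0, 8.0]))   # far from all -> new CCN "object_1"
```

In the paper, the evidence each CCN contributes comes from its learned generative model (via variational free energy) rather than a fixed prototype, and the agent can additionally act to gather further evidence before committing to a classification.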