Ferraro Stefano, Van de Maele Toon, Verbelen Tim, Dhoedt Bart
IDLab, Department of Information Technology, Ghent University-imec, Ghent, Belgium.
Interface Focus. 2023 Apr 14;13(3):20220077. doi: 10.1098/rsfs.2022.0077. eCollection 2023 Jun 6.
Humans perceive and interact with hundreds of objects every day. In doing so, they need to employ mental models of these objects and often exploit symmetries in the object's shape and appearance in order to learn generalizable and transferable skills. Active inference is a first-principles approach to understanding and modelling sentient agents. It states that agents entertain a generative model of their environment, and learn and act by minimizing an upper bound on their surprisal, i.e. their free energy. The free energy decomposes into an accuracy term and a complexity term, meaning that agents favour the least complex model that can accurately explain their sensory observations. In this paper, we investigate how inherent symmetries of particular objects also emerge as symmetries in the latent state space of the generative model learnt under deep active inference. In particular, we focus on object-centric representations, which are trained from pixels to predict novel object views as the agent moves its viewpoint. First, we investigate the relation between model complexity and symmetry exploitation in the state space. Second, we perform a principal component analysis to demonstrate how the model encodes the principal axis of symmetry of the object in the latent space. Finally, we demonstrate how more symmetrical representations can be exploited for better generalization in the context of manipulation.
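The relation between free energy, surprisal, accuracy and complexity mentioned above can be illustrated on a toy problem. The following is a minimal sketch, assuming a discrete latent state with two values and a single fixed observation; it is not the deep generative model studied in the paper, and the function names (`free_energy`, `surprisal`, `exact_posterior`) and all numbers are hypothetical choices made for illustration. It uses the standard decomposition F = KL[q(s) || p(s)] - E_q[log p(o|s)], i.e. complexity minus accuracy.

```python
import math

def free_energy(q, prior, likelihood_o):
    """Variational free energy for a discrete latent state s.

    q            : approximate posterior q(s), one probability per state
    prior        : prior p(s), one probability per state
    likelihood_o : p(o | s) evaluated at the observed o, one value per state
    Returns (F, complexity, accuracy) with F = complexity - accuracy.
    """
    # Complexity: KL divergence between approximate posterior and prior.
    complexity = sum(qs * math.log(qs / ps)
                     for qs, ps in zip(q, prior) if qs > 0)
    # Accuracy: expected log-likelihood of the observation under q.
    accuracy = sum(qs * math.log(lk)
                   for qs, lk in zip(q, likelihood_o) if qs > 0)
    return complexity - accuracy, complexity, accuracy

def surprisal(prior, likelihood_o):
    """Negative log evidence, -log p(o), obtained by summing out s."""
    return -math.log(sum(ps * lk for ps, lk in zip(prior, likelihood_o)))

def exact_posterior(prior, likelihood_o):
    """Bayes posterior p(s | o); the q that makes the bound tight."""
    evidence = sum(ps * lk for ps, lk in zip(prior, likelihood_o))
    return [ps * lk / evidence for ps, lk in zip(prior, likelihood_o)]

# Hypothetical toy numbers: two latent states, one observed outcome.
prior = [0.5, 0.5]
likelihood_o = [0.9, 0.2]

# A q equal to the prior has zero complexity but poor accuracy.
F_prior, c_prior, a_prior = free_energy(prior, prior, likelihood_o)

# The exact posterior pays some complexity and attains the bound.
q_star = exact_posterior(prior, likelihood_o)
F_star, c_star, a_star = free_energy(q_star, prior, likelihood_o)

print(f"surprisal        = {surprisal(prior, likelihood_o):.4f}")
print(f"F (q = prior)    = {F_prior:.4f}")
print(f"F (q = posterior)= {F_star:.4f}")
```

In this toy setting the free energy is never below the surprisal, and the two coincide when q equals the exact posterior, which is the sense in which free energy is an upper bound on surprisal.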