Sharafeldin Abdelrahman, Imam Nabil, Choi Hannah
ML@GT, Georgia Institute of Technology, Atlanta, GA 30332, USA.
School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Patterns (N Y). 2024 May 3;5(6):100983. doi: 10.1016/j.patter.2024.100983. eCollection 2024 Jun 14.
We present an end-to-end architecture for embodied exploration inspired by two biological computations: predictive coding and uncertainty minimization. The architecture can be applied to any exploration setting in a task-independent and intrinsically driven manner. We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment. Second, we apply our model to a more complex active vision task, whereby an agent actively samples its visual environment to gather information. We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes. We further show that using these representations for downstream classification leads to superior data efficiency and learning speed compared to other baselines while maintaining lower parameter complexity. Finally, the modular structure of our model facilitates interpretability, allowing us to probe its internal mechanisms and representations during exploration.
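The intrinsic, uncertainty-driven exploration described above can be illustrated with a toy sketch. The snippet below is not the paper's architecture; it is a minimal count-based stand-in in which an agent in a small grid maze greedily takes the action it has sampled least in its current state (a simple proxy for uncertainty minimization), and coverage of the maze emerges without any task reward. All names (`SIZE`, `intrinsic_value`, etc.) are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

# Toy sketch of intrinsically driven exploration (NOT the paper's model):
# the agent prefers the least-tried action in each state, a crude
# stand-in for choosing actions that reduce model uncertainty most.

rng = np.random.default_rng(0)
SIZE = 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, a):
    """Deterministic grid transition with walls at the boundary."""
    r, c = state
    dr, dc = ACTIONS[a]
    return (min(max(r + dr, 0), SIZE - 1),
            min(max(c + dc, 0), SIZE - 1))

# Visit counts per (state, action): the agent's crude transition model.
counts = np.zeros((SIZE, SIZE, len(ACTIONS)))

def intrinsic_value(state):
    # Uncertainty bonus: less-sampled (state, action) pairs score higher.
    r, c = state
    return 1.0 / (1.0 + counts[r, c])

state = (0, 0)
for t in range(2000):
    # Greedy on the intrinsic bonus; tiny noise breaks ties.
    a = int(np.argmax(intrinsic_value(state) + 1e-6 * rng.random(len(ACTIONS))))
    counts[state[0], state[1], a] += 1
    state = step(state, a)

coverage = np.count_nonzero(counts.sum(axis=-1)) / (SIZE * SIZE)
print(f"fraction of maze states visited: {coverage:.0%}")
```

Even this crude count-based bonus drives the agent to cover the maze; the paper's architecture replaces the counts with learned predictive-coding representations and a principled uncertainty objective, which is what enables the richer active-vision results.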