Graduate School of Information Science and Technology, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-8656, Japan; International Research Center for Neurointelligence (WPI-IRCN), Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.
International Research Center for Neurointelligence (WPI-IRCN), Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan.
Neural Netw. 2024 Sep;177:106379. doi: 10.1016/j.neunet.2024.106379. Epub 2024 May 8.
Homeostasis is a self-regulatory process wherein an organism maintains a specific internal physiological state. Homeostatic reinforcement learning (RL) is a framework recently proposed in computational neuroscience to explain animal behavior. Homeostatic RL organizes the behaviors of autonomous embodied agents according to the demands of the internal dynamics of their bodies, coupled with the external environment. Thus, it provides a basis for real-world autonomous agents, such as robots, to continually acquire and learn integrated behaviors for survival. However, prior studies have generally been restricted to problems of limited scale, because the agent must handle observations of these coupled internal-external dynamics. To overcome this restriction, we developed a method that scales up homeostatic RL using deep RL. Furthermore, among the several homeostatic reward definitions proposed in the literature, we identified that the definition based on the difference in the drive function yields the best results. We created two benchmark environments for homeostasis and performed a behavioral analysis, which showed that the trained agents in each environment changed their behavior based on their internal physiological states. Finally, we extended our method to handle visual input using deep convolutional neural networks. The analysis of a trained agent revealed that it acquired visual saliency rooted in the survival environment, as well as internal representations arising from multimodal input.
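The abstract does not spell out the "difference in the drive function" reward. As a minimal sketch, assuming the drive function of the homeostatic RL literature (Keramati and Gutkin, 2014), where d(h) measures the deviation of the internal state h from a setpoint h* and the reward is the reduction in drive, r_t = d(h_t) - d(h_{t+1}), the computation could look as follows; the exponents m and n and the example setpoint are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def drive(h, h_star, m=3.0, n=4.0):
    """Drive function d(h): distance of the internal state h from the
    setpoint h*. The exponents m, n follow the convention of Keramati &
    Gutkin (2014); the paper's exact values are an assumption here."""
    return np.sum(np.abs(h_star - h) ** n) ** (1.0 / m)

def homeostatic_reward(h_t, h_next, h_star):
    """Reward as the reduction in drive: r_t = d(h_t) - d(h_{t+1}).
    Positive when an action moves the internal state toward the setpoint."""
    return drive(h_t, h_star) - drive(h_next, h_star)

# Example with a hypothetical 2-D internal state (e.g., energy, hydration):
h_star = np.array([0.5, 0.5])   # homeostatic setpoint
h_t    = np.array([0.2, 0.6])   # internal state before acting
h_next = np.array([0.3, 0.55])  # state after acting, closer to the setpoint
print(homeostatic_reward(h_t, h_next, h_star))  # > 0: approach is rewarded
```

Under this definition, an agent maximizing cumulative reward is driven to keep its internal state near the setpoint, which is what couples the RL objective to survival.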