Anti Ingel, Abdullah Makkeh, Oriol Corcoll, Raul Vicente
Institute of Computer Science, University of Tartu, Narva mnt 18, 51009 Tartu, Estonia.
Göttingen Campus Institute for Dynamics of Biological Networks, University of Göttingen, 37075 Göttingen, Germany.
Entropy (Basel). 2022 Mar 13;24(3):401. doi: 10.3390/e24030401.
Intuitively, the level of autonomy of an agent is related to the degree to which the agent's goals and behaviour are decoupled from the immediate control of the environment. Here, we capitalise on a recent information-theoretic formulation of autonomy and introduce an algorithm for calculating autonomy in the limit of the time step approaching infinity. We tackle the question of how the autonomy level of an agent changes during training. In particular, we use the partial information decomposition (PID) framework to monitor the levels of autonomy and environment internalisation of reinforcement-learning (RL) agents. We performed experiments in two environments: a grid world, in which the agent has to collect food, and a repeating-pattern environment, in which the agent has to learn to imitate a sequence of actions by memorising it. PID also allows us to quantify how much the agent relies on its internal memory, versus its observations, when transitioning to its next internal state. The experiments show that specific PID terms strongly correlate with the obtained reward and with the agent's behaviour under perturbations of the observations.
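To make the memory-versus-observation question concrete: a two-source PID splits the information that the agent's memory M and observation O jointly carry about the next internal state S' into a redundant term, two unique terms, and a synergistic term. The sketch below is a minimal illustration, not the estimator used in the paper; it assumes a small discrete joint distribution and swaps in the simple minimum-mutual-information (MMI) redundancy, whereas the paper relies on a specific PID measure from the literature.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as a 2-D array."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (px @ py)[nz])))

def pid_mmi(p):
    """Two-source PID of a joint pmf p[m, o, s] over (memory M,
    observation O, next internal state S'), using the MMI
    redundancy I_red = min(I(M;S'), I(O;S'))."""
    p = p / p.sum()                       # normalise defensively
    i_m = mutual_information(p.sum(axis=1))           # I(M; S')
    i_o = mutual_information(p.sum(axis=0))           # I(O; S')
    # I(M,O; S'): flatten the two sources into one variable
    i_joint = mutual_information(p.reshape(-1, p.shape[2]))
    red = min(i_m, i_o)                   # redundant information
    return {"redundant": red,
            "unique_mem": i_m - red,      # carried only by memory
            "unique_obs": i_o - red,      # carried only by observation
            "synergy": i_joint - i_m - i_o + red}

# Toy check: S' = M XOR O with uniform inputs -> purely synergistic.
p = np.zeros((2, 2, 2))
for m in range(2):
    for o in range(2):
        p[m, o, m ^ o] = 0.25
print(pid_mmi(p))   # synergy = 1 bit, all other terms = 0
```

In this reading, a large unique-memory term indicates reliance on internal state (higher autonomy), while a large unique-observation term indicates that the environment drives the next state; the joint pmf would in practice be estimated from (memory, observation, next-state) samples collected during training.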