Pazem Joséphine, Krumm Marius, Vining Alexander Q, Fiderer Lukas J, Briegel Hans J
Institut für Theoretische Physik, Universität Innsbruck, Innsbruck, Austria.
Department for the Ecology of Animal Societies, Max Planck Institute of Animal Behavior, Konstanz, Germany.
PLoS One. 2025 Sep 4;20(9):e0331047. doi: 10.1371/journal.pone.0331047. eCollection 2025.
In the last decade, the free energy principle (FEP) and active inference (AIF) have achieved considerable success in connecting conceptual models of learning and cognition to mathematical models of perception and action. This effort is driven by a multidisciplinary interest in understanding aspects of self-organizing complex adaptive systems, including elements of agency. Various reinforcement learning (RL) models performing active inference have been proposed and trained on standard RL tasks using deep neural networks. Recent work has focused on improving such agents' performance in complex environments by incorporating the latest machine learning techniques. In this paper, we build upon these techniques. Within the constraints imposed by the FEP and AIF, we attempt to model agents in an interpretable way, without deep neural networks, by introducing Free Energy Projective Simulation (FEPS). Using internal rewards only, FEPS agents build a representation of the partially observable environments with which they interact. Following AIF, the policy for a given task is derived from this world model by minimizing the expected free energy. Leveraging the interpretability of the model, we introduce techniques to handle long-term goals and to reduce prediction errors caused by erroneous hidden-state estimation. We test the FEPS model on two RL environments inspired by behavioral biology: a timed response task and a navigation task in a partially observable grid. Our results show that FEPS agents fully resolve the ambiguity of both environments by appropriately contextualizing their observations based on prediction accuracy alone. In addition, they flexibly infer optimal policies for any target observation in the environment.
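The policy-derivation step named in the abstract, deriving actions from a world model by minimizing expected free energy, can be illustrated with a minimal sketch. The code below assumes a generic discrete active-inference formulation (an observation-likelihood matrix A, per-action transition matrices B, a belief over hidden states, and a preference distribution over outcomes) with a one-step horizon; these names and this matrix parameterization are illustrative assumptions, not the paper's FEPS implementation, which is built on projective simulation rather than explicit generative matrices.

```python
import numpy as np

# Minimal one-step expected free energy (EFE) sketch for a discrete model.
# EFE decomposes into risk (divergence of predicted outcomes from preferred
# outcomes) plus ambiguity (expected entropy of the observation likelihood).
# All names here (A, B, q_s, prefs) are illustrative assumptions.

rng = np.random.default_rng(0)
n_states, n_obs, n_actions = 4, 3, 2

# A[o, s]: observation likelihood P(o | s); each column sums to 1.
A = rng.dirichlet(np.ones(n_obs), size=n_states).T

# B[a][s', s]: transition P(s' | s, a); each column sums to 1.
B = [rng.dirichlet(np.ones(n_states), size=n_states).T
     for _ in range(n_actions)]

q_s = np.full(n_states, 1.0 / n_states)  # current belief over hidden states
prefs = np.array([0.8, 0.1, 0.1])        # preferred outcome distribution

def expected_free_energy(a):
    """G(a) = risk + ambiguity for taking action a from the current belief."""
    q_s_next = B[a] @ q_s                # predicted state belief after a
    q_o = A @ q_s_next                   # predicted outcome distribution
    risk = np.sum(q_o * np.log((q_o + 1e-12) / (prefs + 1e-12)))
    obs_entropy = -np.sum(A * np.log(A + 1e-12), axis=0)  # H[P(o|s)] per state
    ambiguity = q_s_next @ obs_entropy
    return risk + ambiguity

G = np.array([expected_free_energy(a) for a in range(n_actions)])
action = int(np.argmin(G))               # act to minimize expected free energy
print("EFE per action:", G, "-> chosen action:", action)
```

Note that the risk term pulls the agent toward preferred observations (the task goal), while the ambiguity term pulls it toward states whose observations are informative, which is one way the observation-contextualization behavior described in the abstract can emerge under a longer planning horizon.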