Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith
Laureate Institute for Brain Research, Tulsa, OK, USA.
University of Cape Town, South Africa.
arXiv preprint arXiv:2308.08029v1, 2023 Aug 15.
Active Inference is a recently developed framework for modeling decision processes under uncertainty. Over the last several years, empirical and theoretical work has begun to evaluate the strengths and weaknesses of this approach and how it might be extended and improved. One recent extension is the "sophisticated inference" (SI) algorithm, which improves performance on multi-step planning problems through a recursive decision tree search. However, little work to date has been done to compare SI to other established planning algorithms in reinforcement learning (RL). In addition, SI was developed with a focus on inference as opposed to learning. The present paper therefore has two aims. First, we compare performance of SI to Bayesian RL schemes designed to solve similar problems. Second, we present and compare an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment that requires an optimal balance between goal-seeking and active learning, and which was designed to highlight the problem structure for which SL offers a unique solution. This setup requires an agent to continually search an open environment for available (but changing) resources in the presence of competing affordances for information gain. Our simulations demonstrate that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound (UCB) algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning about belief updates given different possible actions/observations). These results provide added support for the utility of Active Inference in solving this class of biologically relevant problems and offer additional tools for testing hypotheses about human cognition.
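As a rough illustration of the counterfactual belief-update idea summarized in the abstract (and not the authors' implementation of SI or SL), the following Python sketch shows how a planner with simple Beta-Bernoulli beliefs about an uncertain resource can score an action by combining its expected reward with the information it expects to gain from the observations that action could produce. The Beta-Bernoulli parameterization, the additive scoring rule, and all function names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's algorithm): for each candidate
# action, consider the counterfactual posterior beliefs under each observation
# it might yield, and score the action by expected reward plus expected
# information gain, loosely analogous to the pragmatic and epistemic terms
# weighed during planning in Active Inference.

def bern_entropy(q):
    """Entropy of a Bernoulli distribution with success probability q."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log(q) + (1 - q) * np.log(1 - q))

def expected_info_gain(alpha, beta):
    """Expected reduction in predictive entropy from one more observation,
    given a Beta(alpha, beta) belief about the success probability."""
    p = alpha / (alpha + beta)                      # predictive p(success)
    p_if_success = (alpha + 1) / (alpha + beta + 1) # counterfactual posterior mean if success observed
    p_if_failure = alpha / (alpha + beta + 1)       # counterfactual posterior mean if failure observed
    expected_posterior_entropy = (p * bern_entropy(p_if_success)
                                  + (1 - p) * bern_entropy(p_if_failure))
    return bern_entropy(p) - expected_posterior_entropy

def one_step_value(alpha, beta, reward_weight=1.0, info_weight=1.0):
    """Combine expected reward with expected information gain."""
    p = alpha / (alpha + beta)
    return reward_weight * p + info_weight * expected_info_gain(alpha, beta)

if __name__ == "__main__":
    # Two hypothetical options with the same expected reward but different uncertainty.
    options = {"well_known": (50.0, 50.0), "uncertain": (1.0, 1.0)}
    for name, (a, b) in options.items():
        print(name, round(one_step_value(a, b), 3))
    # The uncertain option scores higher because of its epistemic value,
    # illustrating directed (information-seeking) exploration.
```

In this toy setting the epistemic term is what breaks the tie between equally rewarding options; a recursive planner in the spirit of SI or SL would apply the same counterfactual reasoning at each step of a decision tree rather than over a single step, as described in the abstract.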