Reward Maximization Through Discrete Active Inference.

Author information

Da Costa Lancelot, Sajid Noor, Parr Thomas, Friston Karl, Smith Ryan

Affiliations

Department of Mathematics, Imperial College London, London SW7 2AZ, U.K.

Wellcome Centre for Human Neuroimaging, University College London, London, WC1N 3AR, U.K.

Publication information

Neural Comput. 2023 Apr 18;35(5):807-852. doi: 10.1162/neco_a_01574.

DOI:10.1162/neco_a_01574

PMID:36944240

Abstract

Active inference is a probabilistic framework for modeling the behavior of biological and artificial agents, which derives from the principle of minimizing free energy. In recent years, this framework has been applied successfully to a variety of situations where the goal was to maximize reward, often offering comparable and sometimes superior performance to alternative approaches. In this article, we clarify the connection between reward maximization and active inference by demonstrating how and when active inference agents execute actions that are optimal for maximizing reward. Precisely, we show the conditions under which active inference produces the optimal solution to the Bellman equation, a formulation that underlies several approaches to model-based reinforcement learning and control. On partially observed Markov decision processes, the standard active inference scheme can produce Bellman optimal actions for planning horizons of 1 but not beyond. In contrast, a recently developed recursive active inference scheme (sophisticated inference) can produce Bellman optimal actions on any finite temporal horizon. We append the analysis with a discussion of the broader relationship between active inference and reinforcement learning.
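Illustrative sketch

The Bellman-optimal action referred to in the abstract can depend on the planning horizon. The Python sketch below illustrates this with ordinary backward induction on a toy problem. It is not the authors' active inference scheme and is not taken from the article: the four-state environment, the reward values, the fully observed (MDP rather than POMDP) setting, and the names `backward_induction`, `P`, and `R` are all assumptions made for illustration.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Finite-horizon Bellman recursion (undiscounted).

    P: array (A, S, S), P[a, s, s2] = probability of s -> s2 under action a.
    R: array (S,), reward received on entering each state.
    Returns V of shape (horizon + 1, S) and policy of shape (horizon, S).
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros((horizon + 1, n_states))           # V[horizon] = 0 by convention
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in range(horizon - 1, -1, -1):
        # Q[a, s] = sum over s2 of P[a, s, s2] * (R[s2] + V[t + 1, s2])
        Q = P @ (R + V[t + 1])
        policy[t] = Q.argmax(axis=0)
        V[t] = Q.max(axis=0)
    return V, policy

# Hypothetical toy environment (assumed, not from the article).
# States: 0 = start, 1 = small-reward trap, 2 = corridor, 3 = large-reward goal.
R = np.array([0.0, 1.0, 0.0, 5.0])
P = np.zeros((2, 4, 4))
P[0, 0, 1] = 1.0   # action 0 from start: enter the trap
P[1, 0, 2] = 1.0   # action 1 from start: enter the corridor
for a in range(2):
    P[a, 1, 1] = 1.0   # trap is absorbing
    P[a, 2, 3] = 1.0   # corridor leads to the goal
    P[a, 3, 3] = 1.0   # goal is absorbing

V1, pi1 = backward_induction(P, R, horizon=1)
V2, pi2 = backward_induction(P, R, horizon=2)

print("horizon 1: first action from start =", pi1[0, 0], "value =", V1[0, 0])
print("horizon 2: first action from start =", pi2[0, 0], "value =", V2[0, 0])
```

In this toy setting, a one-step planner prefers the immediate small reward, while a two-step planner accepts a zero-reward step to reach the larger reward. This only shows what "Bellman optimal on a given horizon" means; it says nothing about how either active inference scheme computes its actions.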
