Thijs van de Laar, Magnus Koudahl, Bart van Erp, Bert de Vries
Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands.
Nested Minds Network Ltd., Liverpool, United Kingdom.
Front Robot AI. 2022 Apr 6;9:794464. doi: 10.3389/frobt.2022.794464. eCollection 2022.
The Free Energy Principle (FEP) postulates that biological agents perceive and interact with their environment in order to minimize a Variational Free Energy (VFE) with respect to a generative model of their environment. The inference of a policy (future control sequence) according to the FEP is known as Active Inference (AIF). The AIF literature describes multiple VFE objectives for policy planning that lead to epistemic (information-seeking) behavior. However, most objectives have limited modeling flexibility. This paper approaches epistemic behavior from a constrained Bethe Free Energy (CBFE) perspective. Crucially, variational optimization of the CBFE can be expressed in terms of message passing on free-form generative models. The key intuition behind the CBFE is that we impose a point-mass constraint on predicted outcomes, which explicitly encodes the assumption that the agent will make observations in the future. We interpret the CBFE objective in terms of its constituent behavioral drives. We then illustrate the resulting behavior of the CBFE agent by planning and interacting with a simulated T-maze environment. Simulations for the T-maze task illustrate how the CBFE agent exhibits an epistemic drive and actively plans ahead to account for the impact of predicted outcomes. Compared to an agent based on the Expected Free Energy (EFE), the CBFE agent attains expected reward in significantly more environmental scenarios. We conclude that CBFE optimization by message passing suggests a general mechanism for epistemic-aware AIF in free-form generative models.