Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith
Laureate Institute for Brain Research, Tulsa, OK, USA.
University of Cape Town, South Africa.
arXiv preprint arXiv:2308.08029v1, 2023 Aug 15.
Active Inference is a recently developed framework for modeling decision processes under uncertainty. Over the last several years, empirical and theoretical work has begun to evaluate the strengths and weaknesses of this approach and how it might be extended and improved. One recent extension is the "sophisticated inference" (SI) algorithm, which improves performance on multi-step planning problems through a recursive decision tree search. However, little work to date has been done to compare SI to other established planning algorithms in reinforcement learning (RL). In addition, SI was developed with a focus on inference as opposed to learning. The present paper therefore has two aims. First, we compare performance of SI to Bayesian RL schemes designed to solve similar problems. Second, we present and compare an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment that requires an optimal balance between goal-seeking and active learning, and which was designed to highlight the problem structure for which SL offers a unique solution. This setup requires an agent to continually search an open environment for available (but changing) resources in the presence of competing affordances for information gain. Our simulations demonstrate that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound (UCB) algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning about belief updates given different possible actions/observations). These results provide added support for the utility of Active Inference in solving this class of biologically relevant problems and offer additional tools for testing hypotheses about human cognition.
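As a rough illustration of the counterfactual belief-update idea summarized in the abstract (and not the authors' implementation of SI or SL), the following Python sketch shows how a planner with simple Beta-Bernoulli beliefs about an uncertain resource can score an action by combining its expected reward with the information it expects to gain from the observations that action could produce. The Beta-Bernoulli parameterization, the additive scoring rule, and all function names are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's algorithm): for each candidate
# action, consider the counterfactual posterior beliefs under each observation
# it might yield, and score the action by expected reward plus expected
# information gain, loosely analogous to the pragmatic and epistemic terms
# weighed during planning in Active Inference.

def bern_entropy(q):
    """Entropy of a Bernoulli distribution with success probability q."""
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -(q * np.log(q) + (1 - q) * np.log(1 - q))

def expected_info_gain(alpha, beta):
    """Expected reduction in predictive entropy from one more observation,
    given a Beta(alpha, beta) belief about the success probability."""
    p = alpha / (alpha + beta)                      # predictive p(success)
    p_if_success = (alpha + 1) / (alpha + beta + 1) # counterfactual posterior mean if success observed
    p_if_failure = alpha / (alpha + beta + 1)       # counterfactual posterior mean if failure observed
    expected_posterior_entropy = (p * bern_entropy(p_if_success)
                                  + (1 - p) * bern_entropy(p_if_failure))
    return bern_entropy(p) - expected_posterior_entropy

def one_step_value(alpha, beta, reward_weight=1.0, info_weight=1.0):
    """Combine expected reward with expected information gain."""
    p = alpha / (alpha + beta)
    return reward_weight * p + info_weight * expected_info_gain(alpha, beta)

if __name__ == "__main__":
    # Two hypothetical options with the same expected reward but different uncertainty.
    options = {"well_known": (50.0, 50.0), "uncertain": (1.0, 1.0)}
    for name, (a, b) in options.items():
        print(name, round(one_step_value(a, b), 3))
    # The uncertain option scores higher because of its epistemic value,
    # illustrating directed (information-seeking) exploration.
```

In this toy setting the epistemic term is what breaks the tie between equally rewarding options; a recursive planner in the spirit of SI or SL would apply the same counterfactual reasoning at each step of a decision tree rather than over a single step, as described in the abstract.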