A Unifying Framework for Reinforcement Learning and Planning.

Author information

Moerland Thomas M, Broekens Joost, Plaat Aske, Jonker Catholijn M

Affiliations

Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands.

Interactive Intelligence, Delft University of Technology, Delft, Netherlands.

Publication information

Front Artif Intell. 2022 Jul 11;5:908353. doi: 10.3389/frai.2022.908353. eCollection 2022.

Abstract

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are planning and reinforcement learning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.
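The premise behind FRAP is that both fields optimize the same underlying object. As a minimal illustrative sketch (not taken from the paper; the toy two-state MDP, the function names, and the hyperparameters below are assumptions), the following Python code solves one small MDP twice: once by planning with value iteration, which sweeps the known transition model, and once by model-free reinforcement learning with tabular Q-learning, which only samples transitions. Both converge to roughly the same state values.

```python
# A minimal sketch (not from the paper): a toy 2-state, 2-action MDP solved both by
# planning (value iteration, which reads the model) and by model-free RL
# (tabular Q-learning, which only samples transitions). All details are illustrative.
import random

# Tabular model: P[s][a] = list of (probability, next_state, reward)
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9

def value_iteration(P, gamma, tol=1e-8):
    """Planning: apply the Bellman optimality backup over the known model until convergence."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def sample_step(P, s, a):
    """Environment interface for the learner: sample a single transition."""
    outcomes = P[s][a]
    _, s2, r = random.choices(outcomes, weights=[p for p, _, _ in outcomes])[0]
    return s2, r

def q_learning(P, gamma, steps=20000, alpha=0.1, eps=0.1):
    """Model-free RL: learn Q from sampled transitions, never reading P directly."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    s = 0
    for _ in range(steps):
        a = random.choice(list(Q[s])) if random.random() < eps else max(Q[s], key=Q[s].get)
        s2, r = sample_step(P, s, a)
        Q[s][a] += alpha * (r + gamma * max(Q[s2].values()) - Q[s][a])
        s = s2
    return {s: max(Q[s].values()) for s in P}

print("planning (value iteration):", value_iteration(P, GAMMA))
print("model-free RL (Q-learning):", q_learning(P, GAMMA))
```

In the spirit of the framework, the two routines differ along dimensions such as whether value backups use an explicit model or sampled interaction, while the MDP objective they optimize is identical.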

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/8f894b1b5bf8/frai-05-908353-g0001.jpg
