

A Unifying Framework for Reinforcement Learning and Planning

Authors

Moerland Thomas M, Broekens Joost, Plaat Aske, Jonker Catholijn M

Affiliations

Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands.

Interactive Intelligence, Delft University of Technology, Delft, Netherlands.

Publication

Front Artif Intell. 2022 Jul 11;5:908353. doi: 10.3389/frai.2022.908353. eCollection 2022.

DOI: 10.3389/frai.2022.908353
PMID: 35898393
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9309375/
Abstract

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.
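The abstract's central point, that planning and reinforcement learning optimize the same MDP objective, can be illustrated with a minimal sketch (not taken from the paper). Here value iteration, a planning method that sweeps a known model, and tabular Q-learning, a model-free RL method that only samples transitions, converge to the same optimal values on a toy two-state MDP. The MDP, hyperparameters, and variable names are all illustrative assumptions.

```python
import random

# Toy deterministic MDP: P[state][action] = (next_state, reward)
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 0.5)}}
gamma = 0.9  # discount factor

# Planning (value iteration): repeated Bellman sweeps over the known model.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {s: max(r + gamma * V[s2] for (s2, r) in P[s].values()) for s in P}

# Model-free RL (Q-learning): updates from sampled transitions only;
# the agent never reads P directly, it only observes (s, a, r, s').
Q = {(s, a): 0.0 for s in P for a in P[s]}
random.seed(0)
s = 0
for _ in range(20000):
    a = random.choice([0, 1])   # exploratory behavior policy
    s2, r = P[s][a]             # environment step (model hidden from agent)
    target = r + gamma * max(Q[(s2, b)] for b in P[s2])
    Q[(s, a)] += 0.1 * (target - Q[(s, a)])
    s = s2

# Both methods approximate the same optimal state value.
print(V[0], max(Q[(0, a)] for a in P[0]))
```

Both printed numbers approach the same fixed point of the Bellman optimality equation, which is the shared problem the FRAP framework builds on; the algorithms differ along dimensions such as whether the model is available and how backups are ordered.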


Figures 1-8:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/8f894b1b5bf8/frai-05-908353-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/2dd2bb7428b1/frai-05-908353-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/21ac5344c8b1/frai-05-908353-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/131a42ee027b/frai-05-908353-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/dc35fdc595db/frai-05-908353-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/92765e470ad4/frai-05-908353-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/e8a4ee4c2d11/frai-05-908353-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/832609a9ea8b/frai-05-908353-g0008.jpg

Similar Articles

1. A Unifying Framework for Reinforcement Learning and Planning. Front Artif Intell. 2022 Jul 11;5:908353. doi: 10.3389/frai.2022.908353. eCollection 2022.
2. Delighting Palates with AI: Reinforcement Learning's Triumph in Crafting Personalized Meal Plans with High User Acceptance. Nutrients. 2024 Jan 24;16(3):346. doi: 10.3390/nu16030346.
3. Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer. Phys Med Biol. 2019 May 29;64(11):115013. doi: 10.1088/1361-6560/ab18bf.
4. MOO-MDP: An Object-Oriented Representation for Cooperative Multiagent Reinforcement Learning. IEEE Trans Cybern. 2019 Feb;49(2):567-579. doi: 10.1109/TCYB.2017.2781130. Epub 2017 Dec 28.
5. Parameterized MDPs and Reinforcement Learning Problems-A Maximum Entropy Principle-Based Framework. IEEE Trans Cybern. 2022 Sep;52(9):9339-9351. doi: 10.1109/TCYB.2021.3102510. Epub 2022 Aug 18.
6. A review of reinforcement learning based hyper-heuristics. PeerJ Comput Sci. 2024 Jun 28;10:e2141. doi: 10.7717/peerj-cs.2141. eCollection 2024.
7. Improvement of Reinforcement Learning With Supermodularity. IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5298-5309. doi: 10.1109/TNNLS.2023.3244024. Epub 2023 Sep 1.
8. Optimizing Robotic Task Sequencing and Trajectory Planning on the Basis of Deep Reinforcement Learning. Biomimetics (Basel). 2023 Dec 27;9(1):10. doi: 10.3390/biomimetics9010010.
9. Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization. IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5374-5386. doi: 10.1109/TNNLS.2021.3070584. Epub 2022 Oct 5.
10. Human locomotion with reinforcement learning using bioinspired reward reshaping strategies. Med Biol Eng Comput. 2021 Jan;59(1):243-256. doi: 10.1007/s11517-020-02309-3. Epub 2021 Jan 8.

Cited By

1. Deep Hybrid Models: Infer and Plan in a Dynamic World. Entropy (Basel). 2025 May 27;27(6):570. doi: 10.3390/e27060570.
2. Data-Driven Robotic Manipulation of Cloth-like Deformable Objects: The Present, Challenges and Future Prospects. Sensors (Basel). 2023 Feb 21;23(5):2389. doi: 10.3390/s23052389.
3. Pathfinding in stochastic environments: learning planning. PeerJ Comput Sci. 2022 Aug 18;8:e1056. doi: 10.7717/peerj-cs.1056. eCollection 2022.

References

1. First return, then explore. Nature. 2021 Feb;590(7847):580-586. doi: 10.1038/s41586-020-03157-9. Epub 2021 Feb 24.
2. Teacher-Student Curriculum Learning. IEEE Trans Neural Netw Learn Syst. 2020 Sep;31(9):3732-3740. doi: 10.1109/TNNLS.2019.2934906. Epub 2019 Sep 9.
3. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
4. Mastering the game of Go without human knowledge. Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.
5. Hybrid computing using a neural network with dynamic external memory. Nature. 2016 Oct 27;538(7626):471-476. doi: 10.1038/nature20101. Epub 2016 Oct 12.
6. Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework. Annu Rev Psychol. 2017 Jan 3;68:101-128. doi: 10.1146/annurev-psych-122414-033625. Epub 2016 Sep 2.
7. Human-level control through deep reinforcement learning. Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
8. Planning as inference. Trends Cogn Sci. 2012 Oct;16(10):485-8. doi: 10.1016/j.tics.2012.08.006. Epub 2012 Aug 30.
9. Dynamic programming. Science. 1966 Jul 1;153(3731):34-7. doi: 10.1126/science.153.3731.34.