

A Unifying Framework for Reinforcement Learning and Planning

Authors

Moerland Thomas M, Broekens Joost, Plaat Aske, Jonker Catholijn M

Affiliations

Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, Netherlands.

Interactive Intelligence, Delft University of Technology, Delft, Netherlands.

Publication

Front Artif Intell. 2022 Jul 11;5:908353. doi: 10.3389/frai.2022.908353. eCollection 2022.

DOI: 10.3389/frai.2022.908353
PMID: 35898393
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9309375/
Abstract

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight into the algorithmic design space of planning and reinforcement learning.
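The abstract's central point, that planning and reinforcement learning optimize the same MDP objective, can be illustrated with a minimal sketch (not taken from the paper). Here value iteration, a planning method that sweeps a known model, and tabular Q-learning, a model-free RL method that only samples transitions, converge to the same optimal values on a toy two-state MDP. The MDP, hyperparameters, and variable names are all illustrative assumptions.

```python
import random

# Toy deterministic MDP: P[state][action] = (next_state, reward)
P = {0: {0: (0, 0.0), 1: (1, 1.0)},
     1: {0: (0, 0.0), 1: (1, 0.5)}}
gamma = 0.9  # discount factor

# Planning (value iteration): repeated Bellman sweeps over the known model.
V = {s: 0.0 for s in P}
for _ in range(200):
    V = {s: max(r + gamma * V[s2] for (s2, r) in P[s].values()) for s in P}

# Model-free RL (Q-learning): updates from sampled transitions only;
# the agent never reads P directly, it only observes (s, a, r, s').
Q = {(s, a): 0.0 for s in P for a in P[s]}
random.seed(0)
s = 0
for _ in range(20000):
    a = random.choice([0, 1])   # exploratory behavior policy
    s2, r = P[s][a]             # environment step (model hidden from agent)
    target = r + gamma * max(Q[(s2, b)] for b in P[s2])
    Q[(s, a)] += 0.1 * (target - Q[(s, a)])
    s = s2

# Both methods approximate the same optimal state value.
print(V[0], max(Q[(0, a)] for a in P[0]))
```

Both printed numbers approach the same fixed point of the Bellman optimality equation, which is the shared problem the FRAP framework builds on; the algorithms differ along dimensions such as whether the model is available and how backups are ordered.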


Figures 1-8:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/8f894b1b5bf8/frai-05-908353-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/2dd2bb7428b1/frai-05-908353-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/21ac5344c8b1/frai-05-908353-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/131a42ee027b/frai-05-908353-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/dc35fdc595db/frai-05-908353-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/92765e470ad4/frai-05-908353-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/e8a4ee4c2d11/frai-05-908353-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9a5/9309375/832609a9ea8b/frai-05-908353-g0008.jpg

Similar Articles

1. A Unifying Framework for Reinforcement Learning and Planning. Front Artif Intell. 2022 Jul 11;5:908353. doi: 10.3389/frai.2022.908353. eCollection 2022.
2. Delighting Palates with AI: Reinforcement Learning's Triumph in Crafting Personalized Meal Plans with High User Acceptance. Nutrients. 2024 Jan 24;16(3):346. doi: 10.3390/nu16030346.
3. Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer. Phys Med Biol. 2019 May 29;64(11):115013. doi: 10.1088/1361-6560/ab18bf.
4. MOO-MDP: An Object-Oriented Representation for Cooperative Multiagent Reinforcement Learning. IEEE Trans Cybern. 2019 Feb;49(2):567-579. doi: 10.1109/TCYB.2017.2781130. Epub 2017 Dec 28.
5. Parameterized MDPs and Reinforcement Learning Problems-A Maximum Entropy Principle-Based Framework. IEEE Trans Cybern. 2022 Sep;52(9):9339-9351. doi: 10.1109/TCYB.2021.3102510. Epub 2022 Aug 18.
6. A review of reinforcement learning based hyper-heuristics. PeerJ Comput Sci. 2024 Jun 28;10:e2141. doi: 10.7717/peerj-cs.2141. eCollection 2024.
7. Improvement of Reinforcement Learning With Supermodularity. IEEE Trans Neural Netw Learn Syst. 2023 Sep;34(9):5298-5309. doi: 10.1109/TNNLS.2023.3244024. Epub 2023 Sep 1.
8. Optimizing Robotic Task Sequencing and Trajectory Planning on the Basis of Deep Reinforcement Learning. Biomimetics (Basel). 2023 Dec 27;9(1):10. doi: 10.3390/biomimetics9010010.
9. Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization. IEEE Trans Neural Netw Learn Syst. 2022 Oct;33(10):5374-5386. doi: 10.1109/TNNLS.2021.3070584. Epub 2022 Oct 5.
10. Human locomotion with reinforcement learning using bioinspired reward reshaping strategies. Med Biol Eng Comput. 2021 Jan;59(1):243-256. doi: 10.1007/s11517-020-02309-3. Epub 2021 Jan 8.

Cited By

1. Deep Hybrid Models: Infer and Plan in a Dynamic World. Entropy (Basel). 2025 May 27;27(6):570. doi: 10.3390/e27060570.
2. Data-Driven Robotic Manipulation of Cloth-like Deformable Objects: The Present, Challenges and Future Prospects. Sensors (Basel). 2023 Feb 21;23(5):2389. doi: 10.3390/s23052389.
3. Pathfinding in stochastic environments: learning planning. PeerJ Comput Sci. 2022 Aug 18;8:e1056. doi: 10.7717/peerj-cs.1056. eCollection 2022.

References

1. First return, then explore. Nature. 2021 Feb;590(7847):580-586. doi: 10.1038/s41586-020-03157-9. Epub 2021 Feb 24.
2. Teacher-Student Curriculum Learning. IEEE Trans Neural Netw Learn Syst. 2020 Sep;31(9):3732-3740. doi: 10.1109/TNNLS.2019.2934906. Epub 2019 Sep 9.
3. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
4. Mastering the game of Go without human knowledge. Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.
5. Hybrid computing using a neural network with dynamic external memory. Nature. 2016 Oct 27;538(7626):471-476. doi: 10.1038/nature20101. Epub 2016 Oct 12.
6. Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework. Annu Rev Psychol. 2017 Jan 3;68:101-128. doi: 10.1146/annurev-psych-122414-033625. Epub 2016 Sep 2.
7. Human-level control through deep reinforcement learning. Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
8. Planning as inference. Trends Cogn Sci. 2012 Oct;16(10):485-8. doi: 10.1016/j.tics.2012.08.006. Epub 2012 Aug 30.
9. Dynamic programming. Science. 1966 Jul 1;153(3731):34-7. doi: 10.1126/science.153.3731.34.