A unified analysis of value-function-based reinforcement-learning algorithms.

Authors

Szepesvári C, Littman M L

Affiliation

Mindmaker, Ltd., Budapest 1121, Konkoly Thege M. U. 29-33, Hungary.

Publication

Neural Comput. 1999 Nov 15;11(8):2017-59. doi: 10.1162/089976699300016070.

DOI:10.1162/089976699300016070
PMID:10578043
Abstract

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
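The abstract's central idea, relating a complex asynchronous algorithm to a simpler synchronous one, can be illustrated numerically (not proved) on a toy problem. The sketch below assumes a hypothetical two-state, two-action MDP with discount `GAMMA = 0.5` that is not taken from the paper. It runs synchronous Q-value iteration, which applies the Bellman optimality operator to every state-action pair at once, and asynchronous Q-learning, which updates one sampled pair per step with step size 1/n(s, a), then compares the two. The names `P`, `R`, `bellman_q`, `q_value_iteration` and `q_learning` are illustrative, and sampling (s, a) uniformly instead of following a behavior trajectory is a simplifying assumption.

```python
import random

# Hypothetical toy MDP (not from the paper): P[s][a] lists (probability, next_state)
# pairs and R[s][a] is the immediate reward for taking action a in state s.
P = {
    0: {0: [(0.8, 0), (0.2, 1)], 1: [(0.1, 0), (0.9, 1)]},
    1: {0: [(0.7, 0), (0.3, 1)], 1: [(0.4, 0), (0.6, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}}
GAMMA = 0.5
STATES = [0, 1]
ACTIONS = [0, 1]


def bellman_q(q):
    """Synchronous step: apply the optimality operator T to every (s, a) at once."""
    return {
        (s, a): R[s][a]
        + GAMMA * sum(p * max(q[(s2, b)] for b in ACTIONS) for p, s2 in P[s][a])
        for s in STATES
        for a in ACTIONS
    }


def q_value_iteration(tol=1e-10):
    """Iterate T until successive estimates differ by less than tol in sup norm."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    while True:
        q_next = bellman_q(q)
        if max(abs(q_next[k] - q[k]) for k in q) < tol:
            return q_next
        q = q_next


def sample_next_state(s, a, rng):
    """Draw s' from P(. | s, a)."""
    u, acc = rng.random(), 0.0
    for p, s2 in P[s][a]:
        acc += p
        if u < acc:
            return s2
    return P[s][a][-1][1]  # guard against floating-point round-off


def q_learning(num_updates=200_000, seed=0):
    """Asynchronous step: update only one sampled (s, a) pair per iteration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    visits = {k: 0 for k in q}
    for _ in range(num_updates):
        # Simplification: pick (s, a) uniformly rather than along a trajectory.
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        s2 = sample_next_state(s, a, rng)
        visits[(s, a)] += 1
        # 1/n step sizes: their sum diverges, the sum of their squares converges.
        alpha = 1.0 / visits[(s, a)]
        target = R[s][a] + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


if __name__ == "__main__":
    q_star = q_value_iteration()
    q_hat = q_learning()
    for k in sorted(q_star):
        print(k, round(q_star[k], 3), round(q_hat[k], 3))
```

Running the script prints, for each (state, action) pair, the value-iteration fixed point next to the Q-learning estimate. For this toy MDP the fixed point has Q*(0, 1) = 35/13 ≈ 2.692 and Q*(1, 0) = 45/13 ≈ 3.462, and the Q-learning column should lie close to it. This is only a numerical illustration of the synchronous/asynchronous relationship the abstract describes; it does not reproduce the paper's theorem or its conditions.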

Similar articles

1
A unified analysis of value-function-based reinforcement-learning algorithms.
Neural Comput. 1999 Nov 15;11(8):2017-59. doi: 10.1162/089976699300016070.
2
Optimization of anemia treatment in hemodialysis patients via reinforcement learning.
Artif Intell Med. 2014 Sep;62(1):47-60. doi: 10.1016/j.artmed.2014.07.004. Epub 2014 Jul 19.
3
Autonomous reinforcement learning with experience replay.
Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.
4
Kernel-based least squares policy iteration for reinforcement learning.
IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
5
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
6
Decentralized learning in Markov games.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):976-81. doi: 10.1109/TSMCB.2008.920998.
7
Meta-learning in reinforcement learning.
Neural Netw. 2003 Jan;16(1):5-9. doi: 10.1016/s0893-6080(02)00228-9.
8
Benchmarking for Bayesian Reinforcement Learning.
PLoS One. 2016 Jun 15;11(6):e0157088. doi: 10.1371/journal.pone.0157088. eCollection 2016.
9
Heuristically-accelerated multiagent reinforcement learning.
IEEE Trans Cybern. 2014 Feb;44(2):252-65. doi: 10.1109/TCYB.2013.2253094.
10
Online learning of shaping rewards in reinforcement learning.
Neural Netw. 2010 May;23(4):541-50. doi: 10.1016/j.neunet.2010.01.001. Epub 2010 Jan 11.

Cited by

1
PAC Reinforcement Learning Algorithm for General-Sum Markov Games.
IEEE Trans Automat Contr. 2023 May;68(5):2821-2831. doi: 10.1109/tac.2022.3219340. Epub 2022 Nov 3.
2
Decentralized Policy Coordination in Mobile Sensing with Consensual Communication.
Sensors (Basel). 2022 Dec 7;22(24):9584. doi: 10.3390/s22249584.
3
DRL-RNP: Deep Reinforcement Learning-Based Optimized RNP Flight Procedure Execution.
Sensors (Basel). 2022 Aug 28;22(17):6475. doi: 10.3390/s22176475.
4
A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm.
Sensors (Basel). 2021 Aug 21;21(16):5643. doi: 10.3390/s21165643.
5
A Cross-Layer Routing Protocol Based on Quasi-Cooperative Multi-Agent Learning for Multi-Hop Cognitive Radio Networks.
Sensors (Basel). 2019 Jan 3;19(1):151. doi: 10.3390/s19010151.