

Online learning of shaping rewards in reinforcement learning.

Affiliations

Department of Computer Science, University of York, York YO105DD, UK.

Publication Information

Neural Netw. 2010 May;23(4):541-50. doi: 10.1016/j.neunet.2010.01.001. Epub 2010 Jan 11.

DOI:10.1016/j.neunet.2010.01.001
PMID:20116208
Abstract

Potential-based reward shaping has been shown to be a powerful method to improve the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, based on multi-grid discretisation, is designed for model-free reinforcement learning. In the second case, an approach for the prototypical model-based R-max algorithm is proposed; it learns the potential function using the free-space assumption about the transitions in the environment. Two novel algorithms are presented and evaluated.
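Potential-based shaping adds a term F(s, s') = γΦ(s') − Φ(s) to the environment reward; when Φ is a fixed function of state, this provably preserves the optimal policy (Ng, Harada, and Russell, 1999). The following is a minimal runnable sketch of the general idea the abstract describes for the model-free case — learning the potential online, on a coarser discretisation of the state space, in parallel with tabular Q-learning. The grid world, its size, the coarsening factor, and all names are illustrative assumptions, not the paper's actual algorithms or experimental setup:

```python
import random

random.seed(0)

# Toy deterministic grid world: states are (x, y) on an N x N grid;
# reward 1.0 on reaching the goal, 0 otherwise. Purely illustrative.
N = 8
GOAL = (N - 1, N - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GAMMA = 0.99
ALPHA = 0.1
EPSILON = 0.1

def step(state, action):
    x = min(max(state[0] + action[0], 0), N - 1)
    y = min(max(state[1] + action[1], 0), N - 1)
    s2 = (x, y)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def coarse(state):
    # Map the state onto a 2x coarser grid; the potential lives there.
    return (state[0] // 2, state[1] // 2)

Q = {}  # tabular action values on the full grid
V = {}  # potential = value estimate on the coarse grid, learned online

def potential(state):
    return V.get(coarse(state), 0.0)

def q(state, action):
    return Q.get((state, action), 0.0)

for episode in range(200):
    s = (0, 0)
    done = False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q(s, act))
        s2, r, done = step(s, a)

        # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)
        f = GAMMA * potential(s2) - potential(s)

        # Q-learning update on the shaped reward r + F
        best_next = 0.0 if done else max(q(s2, act) for act in ACTIONS)
        Q[(s, a)] = q(s, a) + ALPHA * (r + f + GAMMA * best_next - q(s, a))

        # In parallel, a TD(0) update of the coarse-grid potential
        cs = coarse(s)
        v_next = 0.0 if done else GAMMA * potential(s2)
        V[cs] = V.get(cs, 0.0) + ALPHA * (r + v_next - V.get(cs, 0.0))

        s = s2
```

Note that because Φ itself changes while the agent learns, the exact policy-invariance guarantee for static potentials no longer applies directly; handling precisely this online setting is the subject of the paper.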


Similar Articles

1. Online learning of shaping rewards in reinforcement learning.
Neural Netw. 2010 May;23(4):541-50. doi: 10.1016/j.neunet.2010.01.001. Epub 2010 Jan 11.
2. Optimal control in microgrid using multi-agent reinforcement learning.
ISA Trans. 2012 Nov;51(6):743-51. doi: 10.1016/j.isatra.2012.06.010. Epub 2012 Jul 21.
3. Reinforcement learning in supply chains.
Int J Neural Syst. 2009 Oct;19(5):331-44. doi: 10.1142/S0129065709002063.
4. Posterior weighted reinforcement learning with state uncertainty.
Neural Comput. 2010 May;22(5):1149-79. doi: 10.1162/neco.2010.01-09-948.
5. Efficient model learning methods for actor-critic control.
IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):591-602. doi: 10.1109/TSMCB.2011.2170565. Epub 2011 Dec 7.
6. Decentralized learning in Markov games.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):976-81. doi: 10.1109/TSMCB.2008.920998.
7. Kernel-based least squares policy iteration for reinforcement learning.
IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
8. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.
Learn Mem. 2009 Apr 29;16(5):315-23. doi: 10.1101/lm.1295509. Print 2009 May.
9. Artificial intelligence framework for simulating clinical decision-making: a Markov decision process approach.
Artif Intell Med. 2013 Jan;57(1):9-19. doi: 10.1016/j.artmed.2012.12.003. Epub 2012 Dec 31.
10. Parameter-exploring policy gradients.
Neural Netw. 2010 May;23(4):551-9. doi: 10.1016/j.neunet.2009.12.004. Epub 2009 Dec 16.

Cited By

1. Reinforcement-Learning-Based Robust Resource Management for Multi-Radio Systems.
Sensors (Basel). 2023 May 17;23(10):4821. doi: 10.3390/s23104821.
2. Route searching based on neural networks and heuristic reinforcement learning.
Cogn Neurodyn. 2017 Jun;11(3):245-258. doi: 10.1007/s11571-017-9423-7. Epub 2017 Feb 9.