

Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.

Publication Info

IEEE Trans Cybern. 2015 Jan;45(1):77-88. doi: 10.1109/TCYB.2014.2319733. Epub 2014 May 13.

DOI: 10.1109/TCYB.2014.2319733
PMID: 24835233
Abstract

Reinforcement learning (RL) enables an agent to learn behavior by acquiring experience through trial-and-error interactions with a dynamic environment. However, knowledge is usually built from scratch and learning to behave may take a long time. Here, we improve the learning performance by leveraging prior knowledge; that is, the learner shows proper behavior from the beginning of a target task, using the knowledge from a set of known, previously solved, source tasks. In this paper, we argue that building stochastic abstract policies that generalize over past experiences is an effective way to provide such improvement and this generalization outperforms the current practice of using a library of policies. We achieve that contributing with a new algorithm, AbsProb-PI-multiple and a framework for transferring knowledge represented as a stochastic abstract policy in new RL tasks. Stochastic abstract policies offer an effective way to encode knowledge because the abstraction they provide not only generalizes solutions but also facilitates extracting the similarities among tasks. We perform experiments in a robotic navigation environment and analyze the agent's behavior throughout the learning process and also assess the transfer ratio for different amounts of source tasks. We compare our method with the transfer of a library of policies, and experiments show that the use of a generalized policy produces better results by more effectively guiding the agent when learning a target task.
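The abstract's core idea is that knowledge from solved source tasks can be encoded as a *stochastic abstract policy*: a mapping from abstract (generalized) states to probability distributions over actions, which then biases exploration in a new target task. The sketch below is illustrative only; the class and method names are hypothetical and do not reproduce the paper's AbsProb-PI-multiple algorithm.

```python
import random
from collections import defaultdict

class StochasticAbstractPolicy:
    """Illustrative sketch: maps abstract states (e.g. relational
    features such as 'near_obstacle', not concrete coordinates) to a
    probability distribution over actions, built from source tasks."""

    def __init__(self):
        # pi[abstract_state][action] = accumulated preference weight
        self.pi = defaultdict(dict)

    def update_from_source_task(self, abstract_state, action, weight=1.0):
        # Accumulate action preferences observed in a solved source task.
        prev = self.pi[abstract_state].get(action, 0.0)
        self.pi[abstract_state][action] = prev + weight

    def action_distribution(self, abstract_state):
        # Normalize accumulated weights into a probability distribution.
        prefs = self.pi.get(abstract_state)
        if not prefs:
            return None  # unseen abstract state: no transferred guidance
        total = sum(prefs.values())
        return {a: w / total for a, w in prefs.items()}

    def sample(self, abstract_state, fallback_actions):
        # In a target task, sample an action from the transferred
        # distribution; fall back to uniform exploration if unknown.
        dist = self.action_distribution(abstract_state)
        if dist is None:
            return random.choice(fallback_actions)
        actions, probs = zip(*dist.items())
        return random.choices(actions, weights=probs, k=1)[0]
```

Because the policy is stochastic rather than a single deterministic library entry, several source-task solutions can blend into one generalized distribution, which is the contrast with the library-of-policies baseline the paper compares against.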

Similar Articles

1. Stochastic abstract policies: generalizing knowledge to improve reinforcement learning.
   IEEE Trans Cybern. 2015 Jan;45(1):77-88. doi: 10.1109/TCYB.2014.2319733. Epub 2014 May 13.
2. Kernel-based least squares policy iteration for reinforcement learning.
   IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
3. Context transfer in reinforcement learning using action-value functions.
   Comput Intell Neurosci. 2014;2014:428567. doi: 10.1155/2014/428567. Epub 2014 Dec 31.
4. Autonomous reinforcement learning with experience replay.
   Neural Netw. 2013 May;41:156-67. doi: 10.1016/j.neunet.2012.11.007. Epub 2012 Nov 29.
5. Reinforcement learning of motor skills with policy gradients.
   Neural Netw. 2008 May;21(4):682-97. doi: 10.1016/j.neunet.2008.02.003. Epub 2008 Apr 26.
6. Reinforcement learning algorithms for robotic navigation in dynamic environments.
   ISA Trans. 2004 Apr;43(2):217-30. doi: 10.1016/s0019-0578(07)60032-9.
7. Modeling of autonomous problem solving process by dynamic construction of task models in multiple tasks environment.
   Neural Netw. 2006 Oct;19(8):1169-80. doi: 10.1016/j.neunet.2006.05.037. Epub 2006 Sep 20.
8. Reinforcement Learning for Improving Agent Design.
   Artif Life. 2019 Fall;25(4):352-365. doi: 10.1162/artl_a_00301. Epub 2019 Nov 7.
9. Impedance learning for robotic contact tasks using natural actor-critic algorithm.
   IEEE Trans Syst Man Cybern B Cybern. 2010 Apr;40(2):433-43. doi: 10.1109/TSMCB.2009.2026289. Epub 2009 Aug 18.
10. Meta-learning in reinforcement learning.
   Neural Netw. 2003 Jan;16(1):5-9. doi: 10.1016/s0893-6080(02)00228-9.

Cited By

1. KnowRU: Knowledge Reuse via Knowledge Distillation in Multi-Agent Reinforcement Learning.
   Entropy (Basel). 2021 Aug 13;23(8):1043. doi: 10.3390/e23081043.