Department of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan.
Neural Netw. 2010 Jun;23(5):639-48. doi: 10.1016/j.neunet.2009.12.010. Epub 2010 Jan 11.
Appropriately designing sampling policies is important for obtaining good control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
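The LSPI framework referenced above evaluates a policy by fitting a linear Q-function with least squares (LSTD-Q) and then improving the policy greedily. A minimal sketch of one least-squares policy-evaluation step is shown below; the feature map, policy, and data are illustrative assumptions, not details from the paper.

```python
import numpy as np

def lstdq(transitions, phi, policy, gamma=0.95, reg=1e-6):
    """Estimate linear Q-function weights w with Q(s, a) ~ phi(s, a) . w
    from sampled (s, a, r, s') transitions under the given policy.
    This is the LSTD-Q evaluation step at the core of LSPI (sketch)."""
    k = phi(*transitions[0][:2]).shape[0]
    A = reg * np.eye(k)  # small ridge term keeps A invertible
    b = np.zeros(k)
    for s, a, r, s_next in transitions:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

# Tiny 2-state, 2-action example with one-hot features over (s, a) pairs
# (all values are hypothetical, for illustration only).
phi = lambda s, a: np.eye(4)[2 * s + a]
policy = lambda s: 0  # fixed policy, for one evaluation step
data = [(0, 0, 1.0, 1), (1, 0, 0.0, 0), (0, 1, 0.5, 0), (1, 1, 0.2, 1)]
w = lstdq(data, phi, policy)
print(w.shape)
```

In full LSPI the policy would be updated greedily with respect to the fitted Q-function and the evaluation step repeated; the paper's contribution is choosing the sampling policy that generates `data` so that this regression step becomes statistically efficient.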