Efficient exploration through active learning for value function approximation in reinforcement learning.

Affiliation

Department of Computer Science, Tokyo Institute of Technology, 2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552, Japan.

Publication information

Neural Netw. 2010 Jun;23(5):639-48. doi: 10.1016/j.neunet.2009.12.010. Epub 2010 Jan 11.

Abstract

Appropriately designing sampling policies is highly important for obtaining good control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods developed for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The effectiveness of the proposed method, which we call active policy iteration (API), is demonstrated through simulations with a batting robot.
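The connection the abstract draws — LSPI reduces policy evaluation to linear regression, so regression-style active learning criteria can score candidate sampling policies — can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the LSTD-style least-squares solver, and the predictive-variance scoring criterion are illustrative assumptions standing in for the paper's actual method.

```python
import numpy as np

def lstd_weights(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """Least-squares temporal-difference fit of a linear value function:
    solves A w = b with A = Phi^T (Phi - gamma * Phi') and b = Phi^T r,
    where rows of phi/phi_next are feature vectors of successive states."""
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # Small ridge term keeps the solve well-conditioned.
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)

def predictive_variance(phi_candidate, phi_observed, reg=1e-6):
    """Active-learning-style score: predictive variance of a candidate
    sample under the current design matrix. Sampling where this variance
    is large is the classic regression heuristic for reducing estimation
    error quickly -- useful when each reward sample is expensive."""
    G = phi_observed.T @ phi_observed + reg * np.eye(phi_observed.shape[1])
    return float(phi_candidate @ np.linalg.inv(G) @ phi_candidate)
```

In this sketch, a sampling policy would be preferred if the transitions it generates have high predictive variance under the current data, i.e. they are expected to be most informative for the least-squares fit.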

