Incremental state aggregation for value function estimation in reinforcement learning.

Authors

Mori Takeshi, Ishii Shin

Affiliations

Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh, Edinburgh, UK.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2011 Oct;41(5):1407-16. doi: 10.1109/TSMCB.2011.2148710. Epub 2011 May 31.

Abstract

In reinforcement learning, large state and action spaces make the estimation of value functions impractical, so a value function is often represented as a linear combination of basis functions whose linear coefficients constitute the parameters to be estimated. However, preparing suitable basis functions requires a certain amount of prior knowledge and is, in general, a difficult task. To overcome this difficulty, Keller recently proposed an adaptive basis-function construction technique, but it incurs excessive computational cost. We propose an efficient alternative in which the problem of approximating the value function is decomposed into a number of subproblems, each of which can be solved at small computational cost. Computer experiments show that our method requires much less CPU time than the existing method.
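
The abstract refers to the standard linear-architecture setting, in which V(s) is approximated as sum_i theta_i * phi_i(s) and only the coefficients theta are learned. The sketch below illustrates that general setting only, not the authors' incremental state-aggregation method and not Keller's basis-construction technique: it estimates theta by TD(0) on a toy random-walk chain with hand-chosen Gaussian basis functions. The chain MDP, the Gaussian bases, and all parameter values are illustrative assumptions.

    # Minimal sketch: linear value-function approximation, V(s) ~ theta . phi(s),
    # with theta estimated by TD(0). Everything below (chain MDP, Gaussian
    # features, step sizes) is an illustrative assumption, not the paper's method.
    import numpy as np

    rng = np.random.default_rng(0)

    N_STATES = 10                               # chain states 0..9; 9 is terminal
    GAMMA = 0.95                                # discount factor
    CENTERS = np.linspace(0, N_STATES - 1, 5)   # assumed Gaussian basis centers
    WIDTH = 1.5                                 # assumed basis width

    def phi(s):
        """Fixed Gaussian basis features phi(s). The paper's point is that
        hand-choosing these requires prior knowledge."""
        return np.exp(-((s - CENTERS) ** 2) / (2 * WIDTH ** 2))

    def step(s):
        """Unbiased random walk; reward 1 on reaching the terminal state."""
        s2 = int(min(max(s + rng.choice([-1, 1]), 0), N_STATES - 1))
        return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

    theta = np.zeros(len(CENTERS))              # linear coefficients to estimate
    alpha = 0.05                                # learning rate

    for _ in range(500):                        # episodes
        s = 0
        while s != N_STATES - 1:
            s2, r = step(s)
            # TD(0) update of the linear parameters; terminal value is 0
            v_next = 0.0 if s2 == N_STATES - 1 else theta @ phi(s2)
            td_error = r + GAMMA * v_next - theta @ phi(s)
            theta += alpha * td_error * phi(s)
            s = s2

    print("estimated V(s):",
          [round(float(theta @ phi(s)), 3) for s in range(N_STATES)])

Note that the basis functions phi are fixed in advance here, which is exactly the prior-knowledge burden the abstract describes; the paper's contribution lies in constructing the representation adaptively at low computational cost rather than hand-designing it as above.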
