Fast and Epsilon-Optimal Discretized Pursuit Learning Automata.

Publication information

IEEE Trans Cybern. 2015 Oct;45(10):2089-99. doi: 10.1109/TCYB.2014.2365463. Epub 2014 Nov 13.

Abstract

Learning automata (LA) are powerful tools for reinforcement learning, and the discretized pursuit LA is the most popular among them. In each iteration, its operation consists of three basic phases: 1) selecting the next action; 2) finding the optimal estimated action; and 3) updating the state probability. However, when the number of actions is large, learning becomes extremely slow because too many updates must be made at each iteration. The additional updates come mostly from phases 1 and 3. A new fast discretized pursuit LA with assured ε-optimality is proposed that performs both phases 1 and 3 with computational complexity independent of the number of actions. Apart from its low computational complexity, it converges faster than the classical discretized pursuit LA when operating in stationary environments. This paper can promote the application of LA to large-scale-action-oriented areas that require efficient reinforcement learning tools with assured ε-optimality, fast convergence, and low computational complexity per iteration.
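To make the three phases concrete, here is a minimal Python sketch of a classical discretized pursuit LA. It is not the fast variant proposed in the paper, whose details the abstract does not give. The reward-inaction update rule, the step size 1/(n * resolution), the warm-up sampling, and all names (`discretized_pursuit`, `reward_probs`, `resolution`, `warmup`) are assumptions made for illustration. Note that phases 1 and 3 each touch every action here, which is the per-iteration cost the paper aims to remove.

```python
import random


def discretized_pursuit(reward_probs, resolution=100, warmup=20,
                        max_steps=50000, seed=0):
    """Illustrative classical discretized pursuit LA (reward-inaction form).

    reward_probs: hypothetical stationary environment, one reward
    probability per action. Returns the final action-probability vector.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    delta = 1.0 / (n * resolution)   # smallest probability step (assumed form)
    probs = [1.0 / n] * n            # action-selection probabilities
    rewards = [0] * n                # rewards received per action
    pulls = [0] * n                  # times each action was tried

    # Warm-up: try every action a few times so the estimates are defined.
    for i in range(n):
        for _ in range(warmup):
            pulls[i] += 1
            rewards[i] += rng.random() < reward_probs[i]

    for _ in range(max_steps):
        # Phase 1: select the next action from the probability vector, O(n).
        action = rng.choices(range(n), weights=probs)[0]
        rewarded = rng.random() < reward_probs[action]
        pulls[action] += 1
        rewards[action] += rewarded

        # Phase 2: find the action with the highest estimated reward rate.
        best = max(range(n), key=lambda i: rewards[i] / pulls[i])

        # Phase 3: on reward, move every other action one step toward zero
        # and give the freed probability mass to the best estimate, O(n).
        if rewarded:
            for i in range(n):
                if i != best:
                    probs[i] = max(probs[i] - delta, 0.0)
            probs[best] = 1.0 - sum(p for i, p in enumerate(probs) if i != best)

        if probs[best] >= 1.0 - 1e-12:   # converged to a single action
            break
    return probs


if __name__ == "__main__":
    final = discretized_pursuit([0.1, 0.9, 0.3])
    print(final)
```

In this sketch a larger `resolution` gives smaller steps, which trades convergence speed for a lower chance of locking onto a suboptimal action.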
