Agache M, Oommen B J
Sch. of Comput. Sci., Carleton Univ., Ottawa, Ont., Canada.
IEEE Trans Syst Man Cybern B Cybern. 2002;32(6):738-49. doi: 10.1109/TSMCB.2002.1049608.
The fastest learning automata (LA) algorithms currently available fall in the family of estimator algorithms introduced by Thathachar and Sastry (1986). The pioneering work of these authors was the pursuit algorithm, which pursues only the current estimated optimal action. If this action is not the one with the minimum penalty probability, this algorithm pursues a wrong action. In this paper, we argue that a pursuit scheme that generalizes the traditional pursuit algorithm by pursuing all the actions with higher reward estimates than the chosen action, minimizes the probability of pursuing a wrong action, and is a faster converging scheme. To attest this, we present two new generalized pursuit algorithms (GPAs) and also present a quantitative comparison of their performance against the existing pursuit algorithms. Empirically, the algorithms proposed here are among the fastest reported LA to date.
目前可用的最快学习自动机(LA)算法属于Thathachar和Sastry(1986)引入的估计器算法家族。这些作者的开创性工作是追踪算法,该算法仅追踪当前估计的最优动作。如果此动作不是具有最小惩罚概率的动作,则该算法追踪的是错误动作。在本文中,我们认为一种追踪方案通过追踪所有奖励估计高于所选动作的动作来推广传统追踪算法,可将追踪错误动作的概率降至最低,并且是一种收敛更快的方案。为了证明这一点,我们提出了两种新的广义追踪算法(GPA),并对它们与现有追踪算法的性能进行了定量比较。从经验上看,这里提出的算法是迄今为止报道的最快的LA算法之一。