Papadimitriou Georgios I, Sklira Maria, Pomportsis Andreas S
Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece.
IEEE Trans Syst Man Cybern B Cybern. 2004 Feb;34(1):246-54. doi: 10.1109/tsmcb.2003.811117.
A new class of P-model absorbing learning automata is introduced. The proposed automata use a stochastic estimator to achieve rapid and accurate convergence when operating in stationary random environments. Under the proposed stochastic estimator scheme, the estimates of the actions' reward probabilities are not strictly dependent on the environmental responses. The dependence between the stochastic estimates and the deterministic ones is looser for actions that have been selected only a few times. As a result, such actions have the opportunity to be estimated as "optimal," to increase their choice probability, and consequently to be selected. Their estimates thus become more reliable, and the automaton converges rapidly and accurately to the optimal action. The asymptotic behavior of the proposed scheme is analyzed, and the scheme is proved to be epsilon-optimal in every stationary random environment. Furthermore, extensive simulation results indicate that the proposed stochastic estimator scheme converges faster than the deterministic-estimator-based DP(RI) and DGPA schemes when operating in stationary P-model random environments.
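The mechanism described in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rules, so everything specific here is an assumption made for illustration: zero-mean Gaussian noise added to the deterministic estimate (rewards divided by selections), a noise spread that shrinks as `noise_scale / count`, a discretized pursuit-style probability update, and the names `run_automaton`, `resolution`, and `noise_scale`. It should not be read as the authors' algorithm.

```python
import random


def run_automaton(reward_probs, resolution=100, noise_scale=1.0,
                  max_steps=100_000, seed=0):
    """Hypothetical stochastic-estimator automaton (illustrative sketch only).

    Returns (chosen_action, steps_used). Probabilities are stored as integer
    units so that absorption (one action holding all units) is exact.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    total = r * resolution
    units = [resolution] * r      # choice probability of action i = units[i] / total
    rewards = [0] * r             # number of rewards received per action
    counts = [0] * r              # number of times each action was selected

    for step in range(1, max_steps + 1):
        # Select an action according to the current choice probabilities.
        action = rng.choices(range(r), weights=units)[0]
        counts[action] += 1
        # P-model environment: binary response, reward with a fixed probability.
        if rng.random() < reward_probs[action]:
            rewards[action] += 1

        # Stochastic estimate = deterministic estimate + noise.
        # ASSUMPTION: Gaussian noise whose spread shrinks with the selection
        # count, so rarely selected actions are tied less tightly to their
        # deterministic estimate.
        estimates = []
        for i in range(r):
            mean = rewards[i] / counts[i] if counts[i] else 0.0
            sigma = noise_scale / counts[i] if counts[i] else noise_scale
            estimates.append(mean + rng.gauss(0.0, sigma))
        best = max(range(r), key=estimates.__getitem__)

        # ASSUMPTION: discretized pursuit-style update. Every other action
        # gives up one unit (if it has any) to the currently best-looking one.
        for i in range(r):
            if i != best and units[i] > 0:
                units[i] -= 1
                units[best] += 1

        if units[best] == total:  # absorbing state reached
            return best, step

    return max(range(r), key=units.__getitem__), max_steps


if __name__ == "__main__":
    action, steps = run_automaton([0.8, 0.6, 0.4])
    print(f"converged to action {action} after {steps} steps")
```

Setting `noise_scale=0.0` turns the sketch into a purely deterministic-estimator automaton, which gives a rough way to compare the two behaviors on the same environment; a larger `resolution` trades convergence speed for accuracy.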