Ren Xudie, Li Shenghong, Ge Hao
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China.
Shanghai Data Miracle Intelligent Technology Co., Ltd., Shanghai, China.
Front Neurorobot. 2022 Sep 23;16:999658. doi: 10.3389/fnbot.2022.999658. eCollection 2022.
For a learning automaton (LA), a proper configuration of the learning parameters is crucial. Existing LA schemes require manual parameter tuning to ensure stable and reliable performance in stochastic environments, but the tuning procedure is time-consuming and consumes many interactions with the environment. This is a severe limitation for LA-based applications, especially in environments where interactions are expensive. In this paper, we propose a parameter-free learning automaton (PFLA) scheme that avoids parameter tuning by means of Bayesian inference. In contrast to existing schemes, whose parameters must be carefully tuned to each environment, PFLA works well with a single, fixed set of parameters across various environments. This property greatly reduces the difficulty of applying a learning automaton to an unknown stochastic environment. A rigorous proof of ϵ-optimality for the proposed scheme is provided, and numerical experiments are carried out on benchmark environments to verify its effectiveness. The results show that, without any parameter tuning cost, the proposed PFLA achieves performance competitive with well-tuned schemes and outperforms untuned schemes in consistency of performance.
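To make the idea of replacing a tuned learning rate with Bayesian inference more concrete, the following is a minimal sketch of a generic Beta-Bernoulli automaton that selects actions by posterior sampling (Thompson-sampling style). It is not the PFLA algorithm from the paper, whose details are not given in this abstract. The class name `BetaBernoulliAutomaton`, the helper `run`, the uniform Beta(1, 1) prior, and the reward probabilities in the usage note are all assumptions chosen for illustration.

```python
import random


class BetaBernoulliAutomaton:
    """Hypothetical illustration only; NOT the paper's PFLA scheme.

    Keeps a Beta posterior over each action's reward probability and
    picks actions by posterior sampling, so no learning rate is tuned.
    """

    def __init__(self, n_actions, seed=None):
        self.rng = random.Random(seed)
        # Assumption: Beta(1, 1), i.e. a uniform prior, for every action.
        self.alpha = [1.0] * n_actions
        self.beta = [1.0] * n_actions

    def select_action(self):
        # Draw one sample per action from its posterior and act greedily
        # with respect to the sampled values.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.alpha, self.beta)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, rewarded):
        # Conjugate update: a reward increments alpha, a penalty beta.
        if rewarded:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0

    def posterior_means(self):
        return [a / (a + b) for a, b in zip(self.alpha, self.beta)]


def run(reward_probs, steps=2000, seed=0):
    """Interact with a stationary Bernoulli environment for `steps` rounds."""
    env_rng = random.Random(seed + 1)
    automaton = BetaBernoulliAutomaton(len(reward_probs), seed=seed)
    counts = [0] * len(reward_probs)
    for _ in range(steps):
        action = automaton.select_action()
        rewarded = env_rng.random() < reward_probs[action]
        automaton.update(action, rewarded)
        counts[action] += 1
    return automaton, counts
```

For example, `run([0.2, 0.8])` simulates a two-action environment with assumed reward probabilities 0.2 and 0.8 and returns the automaton together with the number of times each action was chosen. The same code runs unchanged on other reward probabilities, which is the practical appeal of a scheme without environment-specific parameters.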