Vrancx Peter, Verbeeck Katja, Nowé Ann
Computational Modeling Laboratory (COMO), Vrije Universiteit Brussel, 1050 Brussels, Belgium.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):976-81. doi: 10.1109/TSMCB.2008.920998.
Learning automata (LA) were recently shown to be valuable tools for designing multiagent reinforcement learning algorithms. One of the principal contributions of LA theory is that a set of decentralized, independent LA is able to control a finite Markov chain with unknown transition probabilities and rewards. In this paper, we propose to extend this algorithm to Markov games, a straightforward extension of single-agent Markov decision problems to distributed multiagent decision problems. We show that under the same ergodic assumptions as the original theorem, the extended algorithm converges to a pure equilibrium point between agent policies.
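The paper does not include code, but the kind of learning automaton it builds on can be illustrated with the classical linear reward-inaction (L_{R-I}) update scheme, in which each automaton maintains a probability vector over its actions and shifts probability mass toward actions that receive reward. The sketch below is a minimal illustration of that scheme, not the paper's exact multiagent algorithm; the class name, learning rate, and reward scaling to [0, 1] are assumptions for the example.

```python
import random


class LearningAutomaton:
    """Minimal L_{R-I} learning automaton over a finite action set (illustrative sketch)."""

    def __init__(self, n_actions, lr=0.1):
        # Start with a uniform action-probability vector.
        self.p = [1.0 / n_actions] * n_actions
        self.lr = lr  # assumed step size; the convergence theory requires it to be small

    def choose(self):
        # Sample an action index according to the current probability vector.
        r, acc = random.random(), 0.0
        for a, pa in enumerate(self.p):
            acc += pa
            if r <= acc:
                return a
        return len(self.p) - 1

    def update(self, action, reward):
        # Linear reward-inaction: with reward in [0, 1], move probability
        # mass toward the chosen action in proportion to the reward;
        # a zero reward leaves the vector unchanged ("inaction").
        for a in range(len(self.p)):
            if a == action:
                self.p[a] += self.lr * reward * (1.0 - self.p[a])
            else:
                self.p[a] -= self.lr * reward * self.p[a]
```

In the decentralized setting described in the abstract, one such automaton would be placed in each state (and, in the Markov game extension, one per agent per state), each updating only from its own reward feedback without observing the other automata.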