Vrancx Peter, Verbeeck Katja, Nowé Ann
Computational Modeling Laboratory (COMO), Vrije Universiteit Brussel, 1050 Brussels, Belgium.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):976-81. doi: 10.1109/TSMCB.2008.920998.
Learning automata (LA) were recently shown to be valuable tools for designing multiagent reinforcement learning algorithms. One of the principal contributions of LA theory is that a set of decentralized, independent LA is able to control a finite Markov chain with unknown transition probabilities and rewards. In this paper, we propose to extend this algorithm to Markov games, a straightforward extension of single-agent Markov decision problems to distributed multiagent decision problems. We show that under the same ergodic assumptions as the original theorem, the extended algorithm converges to a pure equilibrium point between agent policies.
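The paper does not include code, but the kind of learning automaton it builds on can be illustrated with the classical linear reward-inaction (L_{R-I}) update scheme, in which each automaton maintains a probability vector over its actions and shifts probability mass toward actions that receive reward. The sketch below is a minimal illustration of that scheme, not the paper's exact multiagent algorithm; the class name, learning rate, and reward scaling to [0, 1] are assumptions for the example.

```python
import random


class LearningAutomaton:
    """Minimal L_{R-I} learning automaton over a finite action set (illustrative sketch)."""

    def __init__(self, n_actions, lr=0.1):
        # Start with a uniform action-probability vector.
        self.p = [1.0 / n_actions] * n_actions
        self.lr = lr  # assumed step size; the convergence theory requires it to be small

    def choose(self):
        # Sample an action index according to the current probability vector.
        r, acc = random.random(), 0.0
        for a, pa in enumerate(self.p):
            acc += pa
            if r <= acc:
                return a
        return len(self.p) - 1

    def update(self, action, reward):
        # Linear reward-inaction: with reward in [0, 1], move probability
        # mass toward the chosen action in proportion to the reward;
        # a zero reward leaves the vector unchanged ("inaction").
        for a in range(len(self.p)):
            if a == action:
                self.p[a] += self.lr * reward * (1.0 - self.p[a])
            else:
                self.p[a] -= self.lr * reward * self.p[a]
```

In the decentralized setting described in the abstract, one such automaton would be placed in each state (and, in the Markov game extension, one per agent per state), each updating only from its own reward feedback without observing the other automata.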