
Accelerating Multiagent Reinforcement Learning by Equilibrium Transfer.

Publication Information

IEEE Trans Cybern. 2015 Jul;45(7):1289-302. doi: 10.1109/TCYB.2014.2349152. Epub 2014 Aug 29.

Abstract

An important approach in multiagent reinforcement learning (MARL) is equilibrium-based MARL, which adopts equilibrium solution concepts in game theory and requires agents to play equilibrium strategies at each state. However, most existing equilibrium-based MARL algorithms cannot scale due to a large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) during learning. For the first time, this paper finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to each state's successive visits often have the same or similar equilibria (for some states more than 90% of games corresponding to successive visits have similar equilibria). Inspired by this observation, this paper proposes to use equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria when each agent has a small incentive to deviate. By introducing transfer loss and transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer brings transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions. Experimental results in widely used benchmarks (e.g., grid world game, soccer game, and wall game) show that the proposed framework: 1) not only significantly accelerates equilibrium-based MARL (up to 96.7% reduction in learning time), but also achieves higher average rewards than algorithms without equilibrium transfer and 2) scales significantly better than algorithms without equilibrium transfer when the state/action space grows and the number of agents increases.
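The sketch below illustrates the core mechanism described in the abstract: cache the equilibrium computed at a state's previous visit and, on the next visit, reuse it whenever no agent can gain more than a small threshold by deviating (the transfer condition); only fall back to an expensive equilibrium solver when that check fails. This is an illustrative reading of the abstract, not the paper's implementation; the function names, the two-agent matrix-game setting, and the `epsilon` threshold are assumptions.

```python
# Minimal sketch of equilibrium transfer for a two-agent stage game with
# payoff matrices A (row agent) and B (column agent). Names such as
# transfer_or_recompute, solve_equilibrium, and epsilon are illustrative
# assumptions, not the paper's API.
import numpy as np

def deviation_gain(A, B, x, y):
    """Largest gain any single agent obtains by unilaterally deviating
    from the cached joint mixed strategy (x, y)."""
    row_payoff = x @ A @ y          # row agent's expected payoff under (x, y)
    col_payoff = x @ B @ y          # column agent's expected payoff under (x, y)
    row_best = np.max(A @ y)        # row agent's best-response value against y
    col_best = np.max(x @ B)        # column agent's best-response value against x
    return max(row_best - row_payoff, col_best - col_payoff)

def transfer_or_recompute(A, B, cached, epsilon, solve_equilibrium):
    """Reuse the cached equilibrium if the transfer condition holds
    (no agent can gain more than epsilon by deviating); otherwise run
    the expensive equilibrium computation."""
    if cached is not None:
        x, y = cached
        if deviation_gain(A, B, x, y) <= epsilon:
            return cached, True     # equilibrium transferred, solver skipped
    return solve_equilibrium(A, B), False
```

In the full learning loop, a recomputed equilibrium would replace the cache for that state, and the deviation allowed by the transfer condition is what the paper calls transfer loss; bounding it is what permits the convergence result stated in the abstract. Those details are in the paper and are not reproduced here.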

