Learning to play against any mixture of opponents.

Authors

Max Olan Smith, Thomas Anthony, Michael P. Wellman

Affiliations

Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, United States.

DeepMind, London, United Kingdom.

Publication

Front Artif Intell. 2023 Jul 20;6:804682. doi: 10.3389/frai.2023.804682. eCollection 2023.

DOI: 10.3389/frai.2023.804682
PMID: 37547229
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10400709/
Abstract

Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. If the mixture changes, ideally we would not have to train from scratch, but rather could transfer what we have learned to construct a policy to play against the new mixture. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for a distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a social dilemma game. Our experiments find that Q-Mixing can successfully transfer knowledge across any mixture of opponents. Next, we consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent policy classifier, trained by reusing Q-learning data, and use the classifier results to refine the mixing of Q-values. Q-Mixing augmented with the opponent policy classifier performs better, with higher variance, than training directly against a mixed-strategy opponent.
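
The abstract compactly describes the whole algorithm: learn a value function Q_i separately against each pure-strategy opponent i, then answer any mixture psi over opponents with the averaged value Q_psi(s, a) = sum_i psi(i) * Q_i(s, a), acting greedily, with an optional classifier step that sharpens the belief over opponents during play. The sketch below illustrates this in Python under assumptions not taken from the paper: a tabular setting indexed by integer states (the paper's experiments use learned function approximators), and illustrative names such as q_mixing_policy and refine_belief. The belief update shown is a plain Bayes-style reweighting; the paper's classifier-based refinement may differ in detail.

```python
import numpy as np

def q_mixing_policy(q_values, opponent_dist):
    """Greedy policy against a mixture of opponents via Q-Mixing.

    q_values:      shape (n_opponents, n_states, n_actions); Q_i(s, a)
                   learned separately against each pure-strategy opponent i.
    opponent_dist: shape (n_opponents,); the believed distribution psi
                   over the opponent's pure strategies.
    """
    # Q_psi(s, a) = sum_i psi(i) * Q_i(s, a): average the per-opponent
    # Q-values, weighted by the probability of facing each opponent.
    q_mix = np.tensordot(opponent_dist, q_values, axes=1)

    def act(state):
        # Act greedily with respect to the mixed Q-values; no further
        # training is needed when the mixture changes.
        return int(np.argmax(q_mix[state]))

    return act

def refine_belief(prior, likelihoods):
    """Bayes-style belief update from in-play observations.

    likelihoods: P(observation | opponent i), e.g. as estimated by an
                 opponent-policy classifier trained on reused Q-learning data.
    """
    posterior = prior * likelihoods
    return posterior / posterior.sum()
```

For example, with per-opponent Q-tables `q` of shape (3, n_states, n_actions), `q_mixing_policy(q, np.array([0.5, 0.3, 0.2]))` yields a policy for the 50/30/20 mixture without retraining; after each observation, `refine_belief` can replace the prior, and the policy can be rebuilt from the same stored Q-values.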


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/8320e92cf18d/frai-06-804682-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/ebc2bf06c154/frai-06-804682-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/7af32d09b2aa/frai-06-804682-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/54457f3fd89f/frai-06-804682-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/415e3dc742f5/frai-06-804682-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/23e10bea5679/frai-06-804682-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1fb0/10400709/bc3bb8e099d9/frai-06-804682-g0007.jpg

Similar articles

1. Learning to play against any mixture of opponents.
   Front Artif Intell. 2023 Jul 20;6:804682. doi: 10.3389/frai.2023.804682. eCollection 2023.
2. Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.
   Biosystems. 1996;37(1-2):147-66. doi: 10.1016/0303-2647(95)01551-5.
3. Opponent Identity Influences Value Learning in Simple Games.
   J Neurosci. 2015 Aug 5;35(31):11133-43. doi: 10.1523/JNEUROSCI.3530-14.2015.
4. Learning Macromanagement in Starcraft by Deep Reinforcement Learning.
   Sensors (Basel). 2021 May 11;21(10):3332. doi: 10.3390/s21103332.
5. Learning agile soccer skills for a bipedal robot with deep reinforcement learning.
   Sci Robot. 2024 Apr 10;9(89):eadi8022. doi: 10.1126/scirobotics.adi8022.
6. You Were Always on My Mind: Introducing Chef's Hat and COPPER for Personalized Reinforcement Learning.
   Front Robot AI. 2021 Jul 16;8:669990. doi: 10.3389/frobt.2021.669990. eCollection 2021.
7. Multi-agent reinforcement learning with approximate model learning for competitive games.
   PLoS One. 2019 Sep 11;14(9):e0222215. doi: 10.1371/journal.pone.0222215. eCollection 2019.
8. All by Myself: Learning individualized competitive behavior with a contrastive reinforcement learning optimization.
   Neural Netw. 2022 Jun;150:364-376. doi: 10.1016/j.neunet.2022.03.013. Epub 2022 Mar 18.
9. Breaking the bonds of reinforcement: Effects of trial outcome, rule consistency and rule complexity against exploitable and unexploitable opponents.
   PLoS One. 2022 Feb 2;17(2):e0262249. doi: 10.1371/journal.pone.0262249. eCollection 2022.
10. Adaptive pessimism via target Q-value for offline reinforcement learning.
    Neural Netw. 2024 Dec;180:106588. doi: 10.1016/j.neunet.2024.106588. Epub 2024 Aug 5.
