Human-level performance in 3D multiplayer games with population-based reinforcement learning.

Affiliation

DeepMind, London, UK.

Publication information

Science. 2019 May 31;364(6443):859-865. doi: 10.1126/science.aau6249.

Abstract

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.

Similar articles

Human-level control through deep reinforcement learning.

Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.

Mastering the game of Stratego with model-free multiagent reinforcement learning.

Science. 2022 Dec 2;378(6623):990-996. doi: 10.1126/science.add4679. Epub 2022 Dec 1.

Grandmaster level in StarCraft II using multi-agent reinforcement learning.

Nature. 2019 Nov;575(7782):350-354. doi: 10.1038/s41586-019-1724-z. Epub 2019 Oct 30.

Inductive biases in theory-based reinforcement learning.

Cogn Psychol. 2022 Nov;138:101509. doi: 10.1016/j.cogpsych.2022.101509. Epub 2022 Sep 21.

MOSAIC for multiple-reward environments.

Neural Comput. 2012 Mar;24(3):577-606. doi: 10.1162/NECO_a_00246. Epub 2011 Dec 14.

Human locomotion with reinforcement learning using bioinspired reward reshaping strategies.

Med Biol Eng Comput. 2021 Jan;59(1):243-256. doi: 10.1007/s11517-020-02309-3. Epub 2021 Jan 8.

Emergent Solutions to High-Dimensional Multitask Reinforcement Learning.

Evol Comput. 2018 Fall;26(3):347-380. doi: 10.1162/evco_a_00232. Epub 2018 Jun 22.

Application of Deep Reinforcement Learning to NS-SHAFT Game Signal Control.

Sensors (Basel). 2022 Jul 14;22(14):5265. doi: 10.3390/s22145265.

Outracing champion Gran Turismo drivers with deep reinforcement learning.

Nature. 2022 Feb;602(7896):223-228. doi: 10.1038/s41586-021-04357-7. Epub 2022 Feb 9.

Cited by

Dynamic Network Plasticity and Sample Efficiency in Biological Neural Cultures: A Comparative Study with Deep Reinforcement Learning.

Cyborg Bionic Syst. 2025 Aug 4;6:0336. doi: 10.34133/cbsystems.0336. eCollection 2025.

Influence-aware memory architectures for deep reinforcement learning in POMDPs.

Neural Comput Appl. 2025;37(19):13145-13161. doi: 10.1007/s00521-022-07691-7. Epub 2022 Sep 4.

Deep mechanism design: Learning social and economic policies for human benefit.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2319949121. doi: 10.1073/pnas.2319949121. Epub 2025 Jun 16.

MetaSeeker: sketching an open invisible space with self-play reinforcement learning.

Light Sci Appl. 2025 Jun 4;14(1):211. doi: 10.1038/s41377-025-01876-0.

Future scientific paradigms in the integration of materials, aerospace and information.

Natl Sci Rev. 2025 Apr 10;12(6):nwaf122. doi: 10.1093/nsr/nwaf122. eCollection 2025 Jun.

Global progress in competitive co-evolution: a systematic comparison of alternative methods.

Front Robot AI. 2025 Jan 21;11:1470886. doi: 10.3389/frobt.2024.1470886. eCollection 2024.

Collaborative hunting in artificial agents with deep reinforcement learning.

Elife. 2024 May 7;13:e85694. doi: 10.7554/eLife.85694.

Transformer Decoder-Based Enhanced Exploration Method to Alleviate Initial Exploration Problems in Reinforcement Learning.

Sensors (Basel). 2023 Aug 25;23(17):7411. doi: 10.3390/s23177411.

SC2EGSet: StarCraft II Esport Replay and Game-state Dataset.

Sci Data. 2023 Sep 8;10(1):600. doi: 10.1038/s41597-023-02510-7.

Learning to play against any mixture of opponents.

Front Artif Intell. 2023 Jul 20;6:804682. doi: 10.3389/frai.2023.804682. eCollection 2023.
