

Two-stage training algorithm for AI robot soccer.

Authors

Kim Taeyoung, Vecchietti Luiz Felipe, Choi Kyujin, Sariel Sanem, Har Dongsoo

Affiliations

Cho Chun Shik Graduate School of Green Transportation, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.

Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey.

Publication

PeerJ Comput Sci. 2021 Sep 17;7:e718. doi: 10.7717/peerj-cs.718. eCollection 2021.

DOI: 10.7717/peerj-cs.718
PMID: 34616894
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8459783/
Abstract

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior; however, this method yields limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, a two-stage heterogeneous centralized training scheme, which allows the training of multiple roles of heterogeneous agents, is proposed. During training, two training processes are conducted in series. One of the two stages trains each agent according to its role, aiming at the maximization of individual role rewards. The other trains the agents as a whole so that they learn cooperative behaviors while attempting to maximize shared collective rewards, i.e., team rewards. Because these two training processes are conducted in series at every time step, agents can learn how to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. The experiments are performed in a robot soccer environment using the Webots robot simulation software. Simulation results show that the proposed method can train the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards than three other approaches that can be used to solve cooperative multi-agent training problems. Quantitatively, a team trained by the proposed method improves the score-concede rate by 5% to 30% in matches against evaluation teams, compared to teams trained with the other approaches.
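The core idea of the abstract — at every time step, first update each agent toward its own role reward, then update all agents together toward the shared team reward — can be sketched as a toy gradient-ascent loop. This is a minimal illustration, not the paper's actual algorithm: the quadratic `role_reward` and `team_reward` objectives, the agent parameter vectors, and the learning rate are all hypothetical stand-ins for the paper's role- and team-reward signals.

```python
def role_reward(theta, target):
    # Hypothetical quadratic role objective: higher when the agent's
    # parameters are close to its role-specific target.
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def team_reward(thetas):
    # Hypothetical shared team objective: penalizes spread of the
    # agents' parameters around the team mean.
    n, d = len(thetas), len(thetas[0])
    mean = [sum(th[k] for th in thetas) / n for k in range(d)]
    return -sum((th[k] - mean[k]) ** 2 for th in thetas for k in range(d))

def two_stage_step(thetas, role_targets, lr=0.1):
    # Stage 1: each agent ascends its own role reward (role training).
    for th, tgt in zip(thetas, role_targets):
        for k in range(len(th)):
            th[k] += lr * (-2.0) * (th[k] - tgt[k])   # gradient of role_reward
    # Stage 2: all agents ascend the shared team reward (centralized training).
    n, d = len(thetas), len(thetas[0])
    mean = [sum(th[k] for th in thetas) / n for k in range(d)]
    for th in thetas:
        for k in range(d):
            th[k] += lr * (-2.0) * (th[k] - mean[k])  # approx. gradient of team_reward
    return thetas
```

Running both stages in series at each step lets both objectives improve simultaneously, which is the qualitative behavior the paper reports for its heterogeneous robot-soccer agents.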


Figures (g001–g011):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/3ab26fe49c17/peerj-cs-07-718-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/fb8301155592/peerj-cs-07-718-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/1c1dd91fc2ec/peerj-cs-07-718-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/f2d3b8e3b72e/peerj-cs-07-718-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/fc38f6af4c1b/peerj-cs-07-718-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/5e96dd77262c/peerj-cs-07-718-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/a01115b0a244/peerj-cs-07-718-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/600f04ee9888/peerj-cs-07-718-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/0c08a5142c0c/peerj-cs-07-718-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/165629549708/peerj-cs-07-718-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4e4c/8459783/898d03f8bb51/peerj-cs-07-718-g011.jpg

Similar Articles

1
Two-stage training algorithm for AI robot soccer.
PeerJ Comput Sci. 2021 Sep 17;7:e718. doi: 10.7717/peerj-cs.718. eCollection 2021.
2
Decentralized multi-agent reinforcement learning based on best-response policies.
Front Robot AI. 2024 Apr 16;11:1229026. doi: 10.3389/frobt.2024.1229026. eCollection 2024.
3
Learning agile soccer skills for a bipedal robot with deep reinforcement learning.
Sci Robot. 2024 Apr 10;9(89):eadi8022. doi: 10.1126/scirobotics.adi8022.
4
Table-Balancing Cooperative Robot Based on Deep Reinforcement Learning.
Sensors (Basel). 2023 May 31;23(11):5235. doi: 10.3390/s23115235.
5
Multi-robot task allocation in e-commerce RMFS based on deep reinforcement learning.
Math Biosci Eng. 2023 Jan;20(2):1903-1918. doi: 10.3934/mbe.2023087. Epub 2022 Nov 8.
6
MuDE: Multi-agent decomposed reward-based exploration.
Neural Netw. 2024 Nov;179:106565. doi: 10.1016/j.neunet.2024.106565. Epub 2024 Jul 22.
7
Deep Q-network for social robotics using emotional social signals.
Front Robot AI. 2022 Sep 26;9:880547. doi: 10.3389/frobt.2022.880547. eCollection 2022.
8
Intrinsically motivated reinforcement learning for human-robot interaction in the real-world.
Neural Netw. 2018 Nov;107:23-33. doi: 10.1016/j.neunet.2018.03.014. Epub 2018 Mar 26.
9
Research on deep reinforcement learning basketball robot shooting skills improvement based on end to end architecture and multi-modal perception.
Front Neurorobot. 2023 Oct 13;17:1274543. doi: 10.3389/fnbot.2023.1274543. eCollection 2023.
10
Multi-Agent Deep Reinforcement Learning for Multi-Robot Applications: A Survey.
Sensors (Basel). 2023 Mar 30;23(7):3625. doi: 10.3390/s23073625.

Cited By

1
Clustering-based Failed goal Aware Hindsight Experience Replay.
PeerJ Comput Sci. 2024 Dec 12;10:e2588. doi: 10.7717/peerj-cs.2588. eCollection 2024.

References

1
Learning agile and dynamic motor skills for legged robots.
Sci Robot. 2019 Jan 16;4(26). doi: 10.1126/scirobotics.aau5872.
2
Sampling Rate Decay in Hindsight Experience Replay for Robot Control.
IEEE Trans Cybern. 2022 Mar;52(3):1515-1526. doi: 10.1109/TCYB.2020.2990722. Epub 2022 Mar 11.
3
Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications.
IEEE Trans Cybern. 2020 Sep;50(9):3826-3839. doi: 10.1109/TCYB.2020.2977374. Epub 2020 Mar 20.
4
Grandmaster level in StarCraft II using multi-agent reinforcement learning.
Nature. 2019 Nov;575(7782):350-354. doi: 10.1038/s41586-019-1724-z. Epub 2019 Oct 30.
5
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
6
Mastering the game of Go without human knowledge.
Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.
7
Mastering the game of Go with deep neural networks and tree search.
Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
8
Human-level control through deep reinforcement learning.
Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
9
Long short-term memory.
Neural Comput. 1997 Nov 15;9(8):1735-80. doi: 10.1162/neco.1997.9.8.1735.