
AlphaDDA: strategies for adjusting the playing strength of a fully trained AlphaZero system to a suitable human training partner.

Author Information

Fujita Kazuhisa

Affiliations

Komatsu University, Komatsu, Ishikawa, Japan.

University of Electro-Communications, Chofu, Tokyo, Japan.

Publication Information

PeerJ Comput Sci. 2022 Oct 4;8:e1123. doi: 10.7717/peerj-cs.1123. eCollection 2022.

DOI: 10.7717/peerj-cs.1123
PMID: 36262155
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9575865/
Abstract

Artificial intelligence (AI) has achieved superhuman performance in board games such as Go, chess, and Othello (Reversi). In other words, the AI system surpasses the level of a strong human expert player in such games. In this context, it is difficult for a human player to enjoy playing the games with the AI. To keep human players entertained and immersed in a game, the AI is required to dynamically balance its skill with that of the human player. To address this issue, we propose AlphaDDA, an AlphaZero-based AI with dynamic difficulty adjustment (DDA). AlphaDDA consists of a deep neural network (DNN) and a Monte Carlo tree search, as in AlphaZero. AlphaDDA learns and plays a game the same way as AlphaZero, but can change its skills. AlphaDDA estimates the value of the game state from only the board state using the DNN. AlphaDDA changes a parameter dominantly controlling its skills according to the estimated value. Consequently, AlphaDDA adjusts its skills according to a game state. AlphaDDA can adjust its skill using only the state of a game without any prior knowledge regarding an opponent. In this study, AlphaDDA plays Connect4, Othello, and 6x6 Othello with other AI agents. Other AI agents are AlphaZero, Monte Carlo tree search, the minimax algorithm, and a random player. This study shows that AlphaDDA can balance its skill with that of the other AI agents, except for a random player. AlphaDDA can weaken itself according to the estimated value. However, AlphaDDA beats the random player because AlphaDDA is stronger than a random player even if AlphaDDA weakens itself to the limit. The DDA ability of AlphaDDA is based on an accurate estimation of the value from the state of a game. We believe that the AlphaDDA approach for DDA can be used for any game AI system if the DNN can accurately estimate the value of the game state and we know a parameter controlling the skills of the AI system.
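
To make the mechanism concrete, here is a minimal sketch in Python of the adjustment loop the abstract describes. It assumes the skill-controlling parameter is the MCTS simulation budget and that the mapping from estimated value to budget is linear; both are illustrative assumptions, since the abstract names neither the exact parameter nor the mapping, and value_net / mcts_search are hypothetical placeholders for AlphaZero-style components.

# Hypothetical sketch of the dynamic-difficulty-adjustment loop described in the
# abstract: estimate the value of the current board state with the DNN, then set
# a skill-controlling parameter from that value alone. The choice of parameter
# (MCTS simulation budget) and the linear mapping are illustrative assumptions.

def skill_parameter(value, min_sims=10, max_sims=800):
    """Map the value estimate (in [-1, 1], for the side to move) to an MCTS
    simulation budget: the better the position, the smaller the budget, so the
    agent weakens itself when winning and strengthens itself when losing."""
    value = max(-1.0, min(1.0, value))     # clamp to the valid range
    frac = (1.0 - value) / 2.0             # value +1 -> 0.0, value -1 -> 1.0
    return int(round(min_sims + frac * (max_sims - min_sims)))

def select_move(board_state, value_net, mcts_search):
    """One move of the DDA agent. value_net and mcts_search are placeholder
    callables standing in for the AlphaZero-style value head and tree search;
    only the board state is used, never a model of the opponent."""
    value = value_net(board_state)          # estimated game outcome for the side to move
    budget = skill_parameter(value)         # adjust skill from the state alone
    return mcts_search(board_state, num_simulations=budget)

if __name__ == "__main__":
    # Dummy check of the mapping: a near-won state gets a weak search budget.
    print(skill_parameter(0.9))    # -> 50 simulations
    print(skill_parameter(-0.9))   # -> 760 simulations

Because the adjustment uses only the value estimated from the current board state, the agent needs no prior knowledge of its opponent, which is the property the abstract emphasizes.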


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/1a84ba34bc83/peerj-cs-08-1123-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/6ade5fea16fd/peerj-cs-08-1123-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/63400b923c90/peerj-cs-08-1123-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/b58b50fa36b9/peerj-cs-08-1123-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/9c3cd6df78cc/peerj-cs-08-1123-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/755109f20b62/peerj-cs-08-1123-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/6ed0d031db42/peerj-cs-08-1123-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/83a701428c21/peerj-cs-08-1123-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/b8b56a362e7b/peerj-cs-08-1123-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/26446555d00c/peerj-cs-08-1123-g010.jpg

Similar Articles

1. AlphaDDA: strategies for adjusting the playing strength of a fully trained AlphaZero system to a suitable human training partner.
   PeerJ Comput Sci. 2022 Oct 4;8:e1123. doi: 10.7717/peerj-cs.1123. eCollection 2022.
2. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
   Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
3. Mastering the game of Go with deep neural networks and tree search.
   Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
4. Mastering Atari, Go, chess and shogi by planning with a learned model.
   Nature. 2020 Dec;588(7839):604-609. doi: 10.1038/s41586-020-03051-4. Epub 2020 Dec 23.
5. Difficulty-skill balance does not affect engagement and enjoyment: a pre-registered study using artificial intelligence-controlled difficulty.
   R Soc Open Sci. 2023 Feb 1;10(2):220274. doi: 10.1098/rsos.220274. eCollection 2023 Feb.
6. Rminimax: An Optimally Randomized MINIMAX Algorithm.
   IEEE Trans Cybern. 2013 Feb;43(1):385-93. doi: 10.1109/TSMCB.2012.2207951. Epub 2012 Aug 6.
7. Perceptual skill in the game of Othello.
   J Psychol. 1984 Sep;118(1st Half):7-16. doi: 10.1080/00223980.1984.9712586.
8. Acquisition of chess knowledge in AlphaZero.
   Proc Natl Acad Sci U S A. 2022 Nov 22;119(47):e2206625119. doi: 10.1073/pnas.2206625119. Epub 2022 Nov 14.
9. AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong.
   Front Artif Intell. 2023 May 12;6:1014561. doi: 10.3389/frai.2023.1014561. eCollection 2023.
10. Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data.
   Front Artif Intell. 2020 Apr 28;3:24. doi: 10.3389/frai.2020.00024. eCollection 2020.

References Cited in This Article

1. Mastering Atari, Go, chess and shogi by planning with a learned model.
   Nature. 2020 Dec;588(7839):604-609. doi: 10.1038/s41586-020-03051-4. Epub 2020 Dec 23.
2. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
   Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
3. Mastering the game of Go without human knowledge.
   Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.
4. Mastering the game of Go with deep neural networks and tree search.
   Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.