
AlphaDDA: strategies for adjusting the playing strength of a fully trained AlphaZero system to a suitable human training partner.

Author

Fujita Kazuhisa

Affiliations

Komatsu University, Komatsu, Ishikawa, Japan.

University of Electro-Communications, Chofu, Tokyo, Japan.

Publication information

PeerJ Comput Sci. 2022 Oct 4;8:e1123. doi: 10.7717/peerj-cs.1123. eCollection 2022.

Abstract

Artificial intelligence (AI) has achieved superhuman performance in board games such as Go, chess, and Othello (Reversi). In other words, the AI system surpasses the level of a strong human expert player in such games. In this context, it is difficult for a human player to enjoy playing the games with the AI. To keep human players entertained and immersed in a game, the AI is required to dynamically balance its skill with that of the human player. To address this issue, we propose AlphaDDA, an AlphaZero-based AI with dynamic difficulty adjustment (DDA). AlphaDDA consists of a deep neural network (DNN) and a Monte Carlo tree search, as in AlphaZero. AlphaDDA learns and plays a game the same way as AlphaZero, but can change its skills. AlphaDDA estimates the value of the game state from only the board state using the DNN. AlphaDDA changes a parameter dominantly controlling its skills according to the estimated value. Consequently, AlphaDDA adjusts its skills according to a game state. AlphaDDA can adjust its skill using only the state of a game without any prior knowledge regarding an opponent. In this study, AlphaDDA plays Connect4, Othello, and 6x6 Othello with other AI agents. Other AI agents are AlphaZero, Monte Carlo tree search, the minimax algorithm, and a random player. This study shows that AlphaDDA can balance its skill with that of the other AI agents, except for a random player. AlphaDDA can weaken itself according to the estimated value. However, AlphaDDA beats the random player because AlphaDDA is stronger than a random player even if AlphaDDA weakens itself to the limit. The DDA ability of AlphaDDA is based on an accurate estimation of the value from the state of a game. We believe that the AlphaDDA approach for DDA can be used for any game AI system if the DNN can accurately estimate the value of the game state and we know a parameter controlling the skills of the AI system.
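The abstract states that AlphaDDA maps the DNN's value estimate for the current board state onto a single parameter that dominantly controls playing strength. A minimal sketch of that control loop, under the assumption that the controlled parameter is the MCTS simulation budget (the function name, parameter names, and the linear mapping are illustrative, not taken from the paper):

```python
def adjust_skill(value_estimate: float,
                 min_sims: int = 10,
                 max_sims: int = 800) -> int:
    """Map an estimated game-state value to an MCTS simulation count.

    value_estimate is assumed to lie in [-1, 1] from AlphaDDA's own
    perspective: +1 means AlphaDDA judges it is winning, -1 losing.
    When winning, the agent shrinks its search budget to weaken itself;
    when losing, it grows the budget to strengthen itself.
    """
    # Linear interpolation: value +1 -> min_sims, value -1 -> max_sims.
    t = (1.0 - value_estimate) / 2.0  # 0 when winning, 1 when losing
    return round(min_sims + t * (max_sims - min_sims))
```

This sketch also makes the abstract's limitation concrete: against a random player, `adjust_skill` bottoms out at `min_sims`, and even that weakest setting can remain stronger than random play, which is why AlphaDDA still wins those games.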


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a7ec/9575865/1a84ba34bc83/peerj-cs-08-1123-g001.jpg
