Mastering the game of Go without human knowledge.

Affiliation

DeepMind, 5 New Street Square, London EC4A 3TW, UK.

Publication

Nature. 2017 Oct 18;550(7676):354-359. doi: 10.1038/nature24270.

Abstract

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
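The abstract says the network learns two things from self-play: the search's own move selections and the eventual winner. The paper's full Methods section (not this abstract) makes that concrete. At each self-play position, the root visit counts N(a) of the tree search become a target distribution π(a) ∝ N(a)^(1/τ), where τ is a temperature. The network outputs (p, v) are then fitted to (π, z) with the loss l = (z − v)² − πᵀ log p + c‖θ‖², where z is +1 or −1 for the game's winner. Below is a minimal pure-Python sketch of those two formulas. It assumes flat lists in place of board tensors. The names `search_policy` and `alphago_zero_loss` and the default value of `c` are hypothetical. The deep residual network that produces p and v, and the tree search itself, are out of scope.

```python
import math
from typing import List, Sequence


def search_policy(visit_counts: Sequence[int], temperature: float = 1.0) -> List[float]:
    """Convert MCTS root visit counts N(a) into the target pi(a) ∝ N(a)^(1/temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(scaled)
    if total == 0:
        raise ValueError("at least one move must have been visited")
    return [s / total for s in scaled]


def alphago_zero_loss(
    p: Sequence[float],       # network move probabilities (softmax output, entries > 0)
    v: float,                 # network value estimate in [-1, 1]
    pi: Sequence[float],      # search-derived target distribution
    z: float,                 # game winner from the current player's view: +1 or -1
    params: Sequence[float],  # flattened network weights theta
    c: float = 1e-4,          # L2 coefficient (illustrative value, an assumption)
) -> float:
    """l = (z - v)^2 - pi^T log p + c * ||theta||^2."""
    value_loss = (z - v) ** 2
    # Cross-entropy between the search policy and the network policy.
    # Terms with pi(a) == 0 contribute nothing, so they are skipped.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    l2_penalty = c * sum(w * w for w in params)
    return value_loss + policy_loss + l2_penalty


# Usage: targets for one self-play position (numbers are made up for illustration).
pi = search_policy([10, 30, 60])  # [0.1, 0.3, 0.6]
loss = alphago_zero_loss(p=[0.2, 0.3, 0.5], v=0.4, pi=pi, z=1.0, params=[0.5, -0.5])
```

Because one network predicts both the move distribution and the winner, the two error terms are summed into a single loss. A lower τ sharpens π toward the most-visited move.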
