Mastering the game of Go with deep neural networks and tree search.

Affiliations

Google DeepMind, 5 New Street Square, London EC4A 3TW, UK.

Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, USA.

Publication information

Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.

Abstract

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
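The abstract does not give formulas, so the following is only a minimal sketch of the general idea it describes: a Monte Carlo tree search in which a policy function supplies move priors and leaf positions are scored by mixing a value function with a random rollout. Everything here is an assumption made for illustration: the toy take-away game (take 1 to 3 stones, the player who takes the last stone wins) stands in for Go, `policy_net` and `value_net` are untrained placeholders rather than neural networks, and the exploration bonus, `c_puct`, and the mixing weight `lam` are illustrative choices, not the authors' settings.

```python
import math
import random

def legal_moves(stones):
    """Toy game (assumption): take 1 to 3 stones; taking the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]

def policy_net(stones):
    """Placeholder policy: uniform prior over legal moves (not a trained network)."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(stones):
    """Placeholder value: 0.0 means 'no opinion' (not a trained network)."""
    return 0.0

def rollout(stones, rng):
    """Random playout; +1 if the player to move at the start wins, else -1."""
    sign = 1
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        sign = -sign
    return -sign  # the player left facing an empty pile has lost

def evaluate(stones, rng, lam=0.5):
    """Leaf score for the player to move: mix of value estimate and rollout."""
    if stones == 0:
        return -1.0  # terminal: the previous player took the last stone
    return (1 - lam) * value_net(stones) + lam * rollout(stones, rng)

class Node:
    def __init__(self, prior):
        self.prior = prior      # prior probability from the policy
        self.visits = 0         # number of simulations through this node
        self.value_sum = 0.0    # summed evaluations, player-to-move view
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def search(root_stones, simulations=2000, c_puct=1.5, seed=0):
    """Return the most visited move from a non-terminal position."""
    rng = random.Random(seed)
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, stones, path = root, root_stones, [root]
        # Selection: descend by value plus a prior-weighted exploration bonus.
        while node.children:
            parent_visits = sum(ch.visits for ch in node.children.values())

            def score(item):
                child = item[1]
                bonus = (c_puct * child.prior
                         * math.sqrt(parent_visits + 1) / (1 + child.visits))
                # child.q() is from the opponent's view, so negate it.
                return -child.q() + bonus

            move, node = max(node.children.items(), key=score)
            stones -= move
            path.append(node)
        # Expansion: attach children with policy priors at non-terminal leaves.
        if stones > 0:
            for move, prior in policy_net(stones).items():
                node.children[move] = Node(prior)
        # Evaluation and backup: flip the sign at each step up the path.
        value = evaluate(stones, rng)
        for visited in reversed(path):
            visited.visits += 1
            visited.value_sum += value
            value = -value
    return max(root.children.items(), key=lambda item: item[1].visits)[0]

if __name__ == "__main__":
    print(search(5))
```

In this toy game the winning strategy is to leave the opponent a multiple of four stones, so from five stones the search should settle on taking one. With trained networks in place of the two placeholders, the same loop structure would let the priors narrow the set of moves explored and the value estimate shorten or replace rollouts, which is the division of labor the abstract attributes to the policy and value networks.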
