IEEE Trans Cybern. 2013 Feb;43(1):385-93. doi: 10.1109/TSMCB.2012.2207951. Epub 2012 Aug 6.
This paper proposes a simple extension of the celebrated MINIMAX algorithm used in zero-sum two-player games, called Rminimax. The Rminimax algorithm allows controlling the strength of an artificial rival by randomizing its strategy in an optimal way. In particular, the randomized shortest-path framework is applied for biasing the artificial intelligence (AI) adversary toward worse or better solutions, therefore controlling its strength. In other words, our model aims at introducing/implementing bounded rationality to the MINIMAX algorithm. This framework takes into account all possible strategies by computing an optimal tradeoff between exploration (quantified by the entropy spread in the tree) and exploitation (quantified by the expected cost to an end game) of the game tree. As opposed to other tree-exploration techniques, this new algorithm considers complete paths of a tree (strategies) where a given entropy is spread. The optimal randomized strategy is efficiently computed by means of a simple recurrence relation while keeping the same complexity as the original MINIMAX. As a result, the Rminimax implements a nondeterministic strength-adapted AI opponent for board games in a principled way, thus avoiding the assumption of complete rationality. Simulations on two common games show that Rminimax behaves as expected.
本文提出了一种对著名的零和二人博弈 MINIMAX 算法的简单扩展,称为 Rminimax。Rminimax 算法通过以最优的方式随机化其策略,来控制人工对手的强度。具体来说,随机最短路径框架被用于通过向人工智能(AI)对手偏向更好或更差的解决方案来控制其强度,从而偏向于更差或更好的解决方案。换句话说,我们的模型旨在为 MINIMAX 算法引入/实现有限理性。该框架通过在博弈树的探索(由树中的熵扩散量化)和利用(由最终游戏的预期成本量化)之间进行最优权衡,考虑了所有可能的策略。与其他树探索技术不同,该新算法考虑了树中给定熵分布的完整路径(策略)。通过一个简单的递归关系来有效地计算最优随机策略,同时保持与原始 MINIMAX 相同的复杂性。结果,Rminimax 以一种有原则的方式为棋盘游戏实现了一个不确定强度自适应的 AI 对手,从而避免了完全理性的假设。对两个常见游戏的模拟表明,Rminimax 的表现符合预期。