Lee Daeyeol, McGreevy Benjamin P, Barraclough Dominic J
Department of Brain and Cognitive Sciences, Center for Visual Science, University of Rochester, Rochester, NY 14627, USA.
Brain Res Cogn Brain Res. 2005 Oct;25(2):416-30. doi: 10.1016/j.cogbrainres.2005.07.003. Epub 2005 Aug 10.
Game theory provides a solution to the problem of finding a set of optimal decision-making strategies in a group. However, people seldom play such optimal strategies and instead adjust their strategies based on their experience. Accordingly, many theories postulate a set of variables, known as value functions, that are related to the probabilities of choosing various strategies, and describe how such variables are dynamically updated. In reinforcement learning, these value functions are updated based on the outcome of the player's own choice, whereas belief learning allows the value functions of all available choices to be updated according to the choices of other players. We investigated the nature of the learning process in monkeys playing a competitive game with ternary choices, using a rock-paper-scissors game. During the baseline condition, in which the computer selected its targets randomly, each animal displayed biases towards some targets. When the computer exploited patterns in the animal's choice sequence but not its reward history, the animal's choice was still systematically biased by the previous choice of the computer. This bias was reduced when the computer exploited both the choice and reward histories of the animal. Compared to simple models of reinforcement learning or belief learning, these adaptive processes were better described by a model that incorporated the features of both models. These results suggest that stochastic decision-making strategies in primates during social interactions might be adjusted according to both actual and hypothetical payoffs.
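The distinction between the two learning rules can be illustrated with a minimal Python sketch. It is not the model fitted in the study: the payoff scheme (1 for a win, 0.5 for a tie, 0 for a loss), the softmax choice rule, and the parameters `alpha`, `w_actual`, `w_hypo`, and `beta` are all assumptions chosen for illustration. The chosen target is updated from its actual payoff, while the unchosen targets are updated from the payoffs they would have produced against the opponent's choice.

```python
import math

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(own, other):
    """Assumed payoff scheme (hypothetical): win = 1, tie = 0.5, loss = 0."""
    if own == other:
        return 0.5
    return 1.0 if BEATS[own] == other else 0.0


def update_values(values, own_choice, other_choice,
                  alpha=0.2, w_actual=1.0, w_hypo=0.5):
    """Return updated value functions after one trial.

    The chosen action moves toward its actual payoff (weight w_actual).
    Unchosen actions move toward the payoff they would have earned
    against the opponent's choice (weight w_hypo).
    w_hypo = 0 reduces to simple reinforcement learning;
    w_hypo = w_actual treats actual and hypothetical payoffs alike,
    as a belief-learning-style rule would.
    """
    updated = {}
    for action in ACTIONS:
        weight = w_actual if action == own_choice else w_hypo
        target = payoff(action, other_choice)
        updated[action] = values[action] + alpha * weight * (target - values[action])
    return updated


def choice_probabilities(values, beta=3.0):
    """Softmax choice rule (an assumption); beta sets how deterministic choices are."""
    top = max(values.values())  # subtract the maximum for numerical stability
    exps = {a: math.exp(beta * (values[a] - top)) for a in ACTIONS}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# One trial: the player chose rock, the opponent chose paper (a loss).
start = {a: 0.5 for a in ACTIONS}
hybrid = update_values(start, "rock", "paper")
rl_only = update_values(start, "rock", "paper", w_hypo=0.0)
probs = choice_probabilities(hybrid)
```

In this sketch, the hybrid rule raises the value of scissors after a loss to paper even though scissors was not chosen, whereas the reinforcement-only setting leaves the unchosen values unchanged. That difference is the kind of dependence on the opponent's previous choice that a belief-learning component can capture.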