O'Doherty John P, Cockburn Jeffrey, Pauli Wolfgang M
Division of Humanities and Social Sciences and Computation and Neural Systems Program, California Institute of Technology, Pasadena, California 91125; email:
Annu Rev Psychol. 2017 Jan 3;68:73-100. doi: 10.1146/annurev-psych-010416-044216. Epub 2016 Sep 28.
In this review, we summarize findings supporting the existence of multiple behavioral strategies for controlling reward-related behavior, including a dichotomy between the goal-directed or model-based system and the habitual or model-free system in the domain of instrumental conditioning and a similar dichotomy in the realm of Pavlovian conditioning. We evaluate evidence from neuroscience supporting the existence of at least partly distinct neuronal substrates contributing to the key computations necessary for the function of these different control systems. We consider the nature of the interactions between these systems and show how these interactions can lead to either adaptive or maladaptive behavioral outcomes. We then review evidence that an additional system guides inference concerning the hidden states of other agents, such as their beliefs, preferences, and intentions, in a social context. We also describe emerging evidence for an arbitration mechanism between model-based and model-free reinforcement learning, placing such a mechanism within the broader context of the hierarchical control of behavior.
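The model-free/model-based dichotomy and the idea of arbitrating between the two can be illustrated with a small, hedged sketch. It is not the authors' model. The toy task, the class and function names (`ModelFree`, `ModelBased`, `hybrid_q`), the learning rates, and the fixed mixing weight `w` are all illustrative assumptions.

The code works as follows:

- The model-free learner caches action values and updates them from reward prediction errors (a Q-learning step).
- The model-based learner estimates transitions and rewards, then plans by value iteration.
- Outcome A is then devalued by direct experience only, without the agent ever choosing it again from the start state. Only the model-based values at the start state change. Sensitivity to outcome devaluation is the standard behavioral signature used to separate goal-directed from habitual control.

A fixed weighted sum of the two systems' values stands in for arbitration here. The arbitration mechanism discussed in the review is not modeled.

```python
# Hedged sketch: toy model-free vs model-based learners and a fixed-weight hybrid.
# Task, class names, parameters, and the mixing weight w are illustrative assumptions.


class ModelFree:
    """Caches action values and nudges them by reward prediction errors."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        # Q-learning step: move the cached value toward the target.
        self.q[s][a] += self.alpha * (target - self.q[s][a])


class ModelBased:
    """Learns a transition/reward model and derives values by planning."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9):
        self.n_states, self.n_actions = n_states, n_actions
        self.alpha, self.gamma = alpha, gamma
        self.counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
        self.reward = [[0.0] * n_actions for _ in range(n_states)]

    def update(self, s, a, r, s_next, done):
        # `done` is unused: the terminal state has no tried actions, so its value is 0.
        self.counts[s][a][s_next] += 1
        self.reward[s][a] += self.alpha * (r - self.reward[s][a])

    def _backup(self, v):
        q = [[0.0] * self.n_actions for _ in range(self.n_states)]
        for s in range(self.n_states):
            for a in range(self.n_actions):
                total = sum(self.counts[s][a])
                if total == 0:
                    continue  # untried action: no model, value left at 0
                expected_v = sum(c / total * v[s2] for s2, c in enumerate(self.counts[s][a]))
                q[s][a] = self.reward[s][a] + self.gamma * expected_v
        return q

    def q_values(self, n_iter=50):
        # Value iteration over the learned model.
        v = [0.0] * self.n_states
        for _ in range(n_iter):
            v = [max(row) for row in self._backup(v)]
        return self._backup(v)


def hybrid_q(mb, mf, w):
    """Fixed mixture: w = 1 is purely model-based, w = 0 purely model-free."""
    return [[w * b + (1 - w) * f for b, f in zip(row_mb, row_mf)]
            for row_mb, row_mf in zip(mb.q_values(), mf.q)]


# Hypothetical task: from START, action 0 leads to OUT_A and action 1 to OUT_B;
# action 0 in an outcome state delivers its reward and ends the episode at END.
START, OUT_A, OUT_B, END = 0, 1, 2, 3
outcome_value = {OUT_A: 1.0, OUT_B: 0.5}

mf, mb = ModelFree(4, 2), ModelBased(4, 2)
for episode in range(200):
    first = episode % 2  # alternate choices so both outcomes are experienced
    outcome = OUT_A if first == 0 else OUT_B
    for agent in (mf, mb):
        agent.update(START, first, 0.0, outcome, False)
        agent.update(outcome, 0, outcome_value[outcome], END, True)

# Devaluation: OUT_A now pays nothing, experienced directly in OUT_A,
# never by choosing action 0 from START again.
outcome_value[OUT_A] = 0.0
for _ in range(100):
    for agent in (mf, mb):
        agent.update(OUT_A, 0, outcome_value[OUT_A], END, True)

print("model-free START values: ", [round(x, 3) for x in mf.q[START]])  # → [0.9, 0.45]
print("model-based START values:", [round(x, 3) for x in mb.q_values()[START]])  # → [0.0, 0.45]
for w in (0.2, 0.7):
    q = hybrid_q(mb, mf, w)[START]
    print(f"w={w}: prefers action {q.index(max(q))}")
```

After devaluation, the model-free learner still prefers the devalued option (habit-like), while the model-based learner switches immediately (goal-directed). In the hybrid, the weight `w` determines which pattern dominates: `w=0.2` keeps action 0 and `w=0.7` switches to action 1. An arbitration mechanism of the kind the review discusses would set this balance adaptively rather than fixing it in advance.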