Friston Karl, FitzGerald Thomas, Rigoli Francesco, Schwartenbeck Philipp, O'Doherty John, Pezzulo Giovanni
The Wellcome Trust Centre for Neuroimaging, UCL, 12 Queen Square, London, United Kingdom.
The Wellcome Trust Centre for Neuroimaging, UCL, 12 Queen Square, London, United Kingdom; Max-Planck UCL Centre for Computational Psychiatry and Ageing Research, London, United Kingdom.
Neurosci Biobehav Rev. 2016 Sep;68:862-879. doi: 10.1016/j.neubiorev.2016.06.022. Epub 2016 Jun 29.
This paper offers an active inference account of choice behaviour and learning. It focuses on the distinction between goal-directed and habitual behaviour and how they contextualise each other. We show that habits emerge naturally (and autodidactically) from sequential policy optimisation when agents are equipped with state-action policies. In active inference, behaviour has explorative (epistemic) and exploitative (pragmatic) aspects that are sensitive to ambiguity and risk respectively, where epistemic (ambiguity-resolving) behaviour enables pragmatic (reward-seeking) behaviour and the subsequent emergence of habits. Although goal-directed and habitual policies are usually associated with model-based and model-free schemes, we find the more important distinction is between belief-free and belief-based schemes. The underlying (variational) belief updating provides a comprehensive (if metaphorical) process theory for several phenomena, including the transfer of dopamine responses, reversal learning, habit formation and devaluation. Finally, we show that active inference reduces to a classical (Bellman) scheme, in the absence of ambiguity.
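The abstract's statement that behaviour is sensitive to ambiguity and risk can be illustrated with the decomposition of expected free energy commonly used in the active inference literature: risk is the divergence between predicted and preferred outcomes, and ambiguity is the expected entropy of outcomes given states. The following is a minimal sketch, assuming discrete states and outcomes and a single future time step; the function name, argument names and toy numbers are illustrative assumptions, not taken from the paper.

```python
import math


def expected_free_energy(state_probs, likelihood, preferred_outcomes):
    """Return (risk, ambiguity) for one policy at one future time step.

    Illustrative names, not from the paper:
      state_probs        -- Q(s | policy), beliefs about future hidden states
      likelihood         -- P(o | s), indexed as likelihood[o][s]
      preferred_outcomes -- P(o), prior preferences over outcomes
                            (must be > 0 wherever an outcome is predicted)
    """
    n_outcomes = len(likelihood)
    n_states = len(state_probs)

    # Predicted outcomes under the policy: Q(o) = sum_s P(o|s) Q(s)
    predicted = [
        sum(likelihood[o][s] * state_probs[s] for s in range(n_states))
        for o in range(n_outcomes)
    ]

    # Risk: KL divergence between predicted and preferred outcomes
    risk = sum(
        q * math.log(q / p)
        for q, p in zip(predicted, preferred_outcomes)
        if q > 0
    )

    # Ambiguity: expected entropy of P(o|s) under Q(s)
    ambiguity = 0.0
    for s in range(n_states):
        entropy = -sum(
            likelihood[o][s] * math.log(likelihood[o][s])
            for o in range(n_outcomes)
            if likelihood[o][s] > 0
        )
        ambiguity += state_probs[s] * entropy

    return risk, ambiguity


# Unambiguous mapping (identity likelihood): only risk remains.
risk_a, ambiguity_a = expected_free_energy(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]
)

# Fully ambiguous mapping (uniform likelihood): only ambiguity remains.
risk_b, ambiguity_b = expected_free_energy(
    [1.0, 0.0], [[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5]
)
```

In the first case the ambiguity term vanishes, so policy selection in this sketch depends on risk alone. This is the regime the abstract refers to when it says active inference reduces to a classical (Bellman) scheme in the absence of ambiguity; the sketch illustrates the vanishing term only, not the full reduction.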