Department of Experimental Psychology, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford, UK.
Trends Cogn Sci. 2019 Oct;23(10):836-850. doi: 10.1016/j.tics.2019.07.012. Epub 2019 Sep 4.
The computational framework of reinforcement learning (RL) has allowed us to both understand biological brains and build successful artificial agents. However, in this opinion article, we highlight open challenges for RL as a model of animal behaviour in natural environments. We ask how the external reward function is designed for biological systems, and how we can account for the context sensitivity of valuation. We summarise both old and new theories proposing that animals track current and desired internal states and seek to minimise the distance to a goal across multiple value dimensions. We suggest that this framework readily accounts for canonical phenomena observed in the fields of psychology, behavioural ecology, and economics, and for recent findings from brain-imaging studies of value-guided decision-making.
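The abstract's proposal that animals track current and desired internal states, and seek to minimise a multi-dimensional distance to a setpoint, is close in spirit to homeostatic reinforcement learning. The sketch below is only an illustration of that idea, not the authors' model: the `drive` and `reward` functions, the exponents `m` and `n`, and the example state values are all assumptions, chosen to show how reward can be defined as drive reduction and why the same outcome can be valued differently depending on the animal's internal state.

```python
import numpy as np

# Hedged illustration (not the authors' implementation): reward defined as
# drive reduction, where "drive" is the distance between the current internal
# state and a desired setpoint across several value dimensions
# (e.g. energy, hydration).

def drive(internal_state, setpoint, weights=None, m=3.0, n=4.0):
    """Distance-to-setpoint drive, in the general form used by homeostatic RL
    models: D(h) = (sum_i w_i * |h*_i - h_i|**n) ** (1/m)."""
    deviation = np.abs(setpoint - internal_state)
    if weights is None:
        weights = np.ones_like(deviation)
    return np.power(np.sum(weights * deviation ** n), 1.0 / m)

def reward(state_before, state_after, setpoint):
    """Reward is the reduction in drive produced by an outcome."""
    return drive(state_before, setpoint) - drive(state_after, setpoint)

# Example (hypothetical numbers): the same meal is rewarding when hungry but
# not when sated, because it moves the internal state toward or past the
# setpoint, respectively.
setpoint = np.array([1.0, 1.0])               # desired energy, hydration
hungry   = np.array([0.4, 1.0])
sated    = np.array([0.95, 1.0])
meal     = np.array([0.3, 0.0])                # energy gained from the meal

print(reward(hungry, hungry + meal, setpoint))  # clearly positive
print(reward(sated, sated + meal, setpoint))    # near zero or negative
```

Read this way, the context sensitivity of valuation discussed in the abstract falls out naturally: value is not a fixed property of the outcome but of the outcome's effect on the distance to the current goal state.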