
Navigating complex decision spaces: Problems and paradigms in sequential choice.

Affiliations

Air Force Research Laboratory, Wright-Patterson Air Force Base.

Department of Psychology, Carnegie Mellon University.

Publication information

Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.

Abstract

To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides 2 general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior but they also provide a useful framework for understanding neural reward valuation and action selection.
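The two solution classes the abstract contrasts can be made concrete with a toy example (our construction, not from the paper): a short corridor in which reward arrives only at the far end, so credit for the delayed reward must somehow reach the earlier moves. A model-free learner (tabular Q-learning) propagates credit backward one step per visit through its bootstrapped update; a model-based learner, given a model of the environment, assigns the same credit by planning (value iteration). All names and parameters below are illustrative assumptions.

```python
import random

# Toy corridor (illustrative, not from the paper): states 0..3 plus a
# terminal state 4. Reward arrives only on reaching the end, creating a
# temporal credit assignment problem for the earlier actions.
N = 5
ALPHA, GAMMA = 0.5, 0.9
EPSILON = 0.2  # exploration rate for the model-free learner

def step(state, action):
    """Action 1 moves right, action 0 stays put; reward 1.0 at the end."""
    nxt = state + 1 if action == 1 else state
    reward = 1.0 if nxt == N - 1 else 0.0
    return nxt, reward, nxt == N - 1

# --- Model-free solution: tabular Q-learning. Credit for the delayed
# reward flows backward one transition per visit via the TD target.
random.seed(0)
q = {(s, a): 0.0 for s in range(N - 1) for a in (0, 1)}
for _ in range(200):
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: q[(s, act)])
        s2, r, done = step(s, a)
        bootstrap = 0.0 if done else GAMMA * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += ALPHA * (r + bootstrap - q[(s, a)])
        s = s2

# --- Model-based solution: with access to a model of the environment
# (here, the step function itself), value iteration assigns credit by
# planning rather than by trial-and-error updates.
V = [0.0] * N
for _ in range(20):
    for s in range(N - 1):
        backups = []
        for a in (0, 1):
            s2, r, done = step(s, a)
            backups.append(r + (0.0 if done else GAMMA * V[s2]))
        V[s] = max(backups)

# Both approaches converge on the same discounted values: the state
# nearest the reward is worth ~1.0, and each step earlier is worth a
# factor of GAMMA less (V*(0) = GAMMA**3).
```

The sketch also illustrates why the distinction matters behaviorally: the model-free learner needs many repeated experiences for credit to crawl backward, while the planner recovers the full value gradient in a handful of sweeps once it has a model.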


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1912/4309984/d1c8093dbd16/nihms525346f1.jpg
