
Navigating complex decision spaces: Problems and paradigms in sequential choice.

Affiliations

Air Force Research Laboratory, Wright-Patterson Air Force Base.

Department of Psychology, Carnegie Mellon University.

Publication information

Psychol Bull. 2014 Mar;140(2):466-86. doi: 10.1037/a0033455. Epub 2013 Jul 8.

Abstract

To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult when the consequences of an action follow a delay. This introduces the problem of temporal credit assignment. When feedback follows a sequence of decisions, how should the individual assign credit to the intermediate actions that comprise the sequence? Research in reinforcement learning provides 2 general solutions to this problem: model-free reinforcement learning and model-based reinforcement learning. In this review, we examine connections between stimulus-response and cognitive learning theories, habitual and goal-directed control, and model-free and model-based reinforcement learning. We then consider a range of problems related to temporal credit assignment. These include second-order conditioning and secondary reinforcers, latent learning and detour behavior, partially observable Markov decision processes, actions with distributed outcomes, and hierarchical learning. We ask whether humans and animals, when faced with these problems, behave in a manner consistent with reinforcement learning techniques. Throughout, we seek to identify neural substrates of model-free and model-based reinforcement learning. The former class of techniques is understood in terms of the neurotransmitter dopamine and its effects in the basal ganglia. The latter is understood in terms of a distributed network of regions including the prefrontal cortex, medial temporal lobes, cerebellum, and basal ganglia. Not only do reinforcement learning techniques have a natural interpretation in terms of human and animal behavior but they also provide a useful framework for understanding neural reward valuation and action selection.
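The two solution classes the abstract contrasts can be made concrete with a toy example (our construction, not from the paper): a short corridor in which reward arrives only at the far end, so credit for the delayed reward must somehow reach the earlier moves. A model-free learner (tabular Q-learning) propagates credit backward one step per visit through its bootstrapped update; a model-based learner, given a model of the environment, assigns the same credit by planning (value iteration). All names and parameters below are illustrative assumptions.

```python
import random

# Toy corridor (illustrative, not from the paper): states 0..3 plus a
# terminal state 4. Reward arrives only on reaching the end, creating a
# temporal credit assignment problem for the earlier actions.
N = 5
ALPHA, GAMMA = 0.5, 0.9
EPSILON = 0.2  # exploration rate for the model-free learner

def step(state, action):
    """Action 1 moves right, action 0 stays put; reward 1.0 at the end."""
    nxt = state + 1 if action == 1 else state
    reward = 1.0 if nxt == N - 1 else 0.0
    return nxt, reward, nxt == N - 1

# --- Model-free solution: tabular Q-learning. Credit for the delayed
# reward flows backward one transition per visit via the TD target.
random.seed(0)
q = {(s, a): 0.0 for s in range(N - 1) for a in (0, 1)}
for _ in range(200):
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:
            a = random.choice((0, 1))
        else:
            a = max((0, 1), key=lambda act: q[(s, act)])
        s2, r, done = step(s, a)
        bootstrap = 0.0 if done else GAMMA * max(q[(s2, 0)], q[(s2, 1)])
        q[(s, a)] += ALPHA * (r + bootstrap - q[(s, a)])
        s = s2

# --- Model-based solution: with access to a model of the environment
# (here, the step function itself), value iteration assigns credit by
# planning rather than by trial-and-error updates.
V = [0.0] * N
for _ in range(20):
    for s in range(N - 1):
        backups = []
        for a in (0, 1):
            s2, r, done = step(s, a)
            backups.append(r + (0.0 if done else GAMMA * V[s2]))
        V[s] = max(backups)

# Both approaches converge on the same discounted values: the state
# nearest the reward is worth ~1.0, and each step earlier is worth a
# factor of GAMMA less (V*(0) = GAMMA**3).
```

The sketch also illustrates why the distinction matters behaviorally: the model-free learner needs many repeated experiences for credit to crawl backward, while the planner recovers the full value gradient in a handful of sweeps once it has a model.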


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1912/4309984/d1c8093dbd16/nihms525346f1.jpg
