Suppr 超能文献



Evidence integration in model-based tree search.

Authors

Solway Alec, Botvinick Matthew M

Affiliations

Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544;

Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544; Department of Psychology, Princeton University, Princeton, NJ 08544; Google DeepMind, London EC4A 3TW, United Kingdom.

Publication Information

Proc Natl Acad Sci U S A. 2015 Sep 15;112(37):11708-13. doi: 10.1073/pnas.1505483112. Epub 2015 Aug 31.

DOI: 10.1073/pnas.1505483112
PMID: 26324932
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4577209/
Abstract

Research on the dynamics of reward-based, goal-directed decision making has largely focused on simple choice, where participants decide among a set of unitary, mutually exclusive options. Recent work suggests that the deliberation process underlying simple choice can be understood in terms of evidence integration: Noisy evidence in favor of each option accrues over time, until the evidence in favor of one option is significantly greater than the rest. However, real-life decisions often involve not one, but several steps of action, requiring a consideration of cumulative rewards and a sensitivity to recursive decision structure. We present results from two experiments that leveraged techniques previously applied to simple choice to shed light on the deliberation process underlying multistep choice. We interpret the results from these experiments in terms of a new computational model, which extends the evidence accumulation perspective to multiple steps of action.
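The accumulation process the abstract describes can be sketched as a simple "race" simulation: noisy evidence for each option accrues over time until one option's total exceeds the runner-up by a fixed margin. This is only an illustrative sketch of the general evidence-integration idea, not the paper's model; all names and parameters (drift rates, noise level, stopping margin) are assumptions chosen for the example.

```python
import numpy as np

def race_to_threshold(drifts, noise=1.0, margin=3.0, dt=0.01,
                      max_steps=100_000, seed=0):
    """Simulate a relative-evidence race among options.

    Each option accumulates noisy evidence at its drift rate; the
    process stops when the leading option's evidence exceeds the
    runner-up's by `margin`. Returns (winner index, decision time).
    """
    rng = np.random.default_rng(seed)
    drifts = np.asarray(drifts, dtype=float)
    evidence = np.zeros(len(drifts))
    for step in range(1, max_steps + 1):
        # Euler step of a diffusion: drift plus scaled Gaussian noise.
        evidence += drifts * dt + noise * np.sqrt(dt) * rng.standard_normal(len(drifts))
        ordered = np.sort(evidence)
        # Stop when the leader is ahead of the second-best by `margin`.
        if ordered[-1] - ordered[-2] >= margin:
            break
    return int(np.argmax(evidence)), step * dt

# One option with a higher drift rate usually wins, and decision time
# reflects how long the relative evidence takes to reach the margin.
winner, t = race_to_threshold([1.0, 0.2, 0.2])
print(winner, round(t, 2))
```

A multistep extension of this idea, as the paper pursues, would let the accumulated quantities be cumulative values over sequences of actions in a decision tree rather than over unitary options.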


Similar Articles

1. Evidence integration in model-based tree search.
Proc Natl Acad Sci U S A. 2015 Sep 15;112(37):11708-13. doi: 10.1073/pnas.1505483112. Epub 2015 Aug 31.

2. Normative decision rules in changing environments.
Elife. 2022 Oct 25;11:e79824. doi: 10.7554/eLife.79824.

3. Statistical mechanics of reward-modulated learning in decision-making networks.
Neural Comput. 2012 May;24(5):1230-70. doi: 10.1162/NECO_a_00264. Epub 2012 Feb 1.

4. How pupil responses track value-based decision-making during and after reinforcement learning.
PLoS Comput Biol. 2018 Nov 30;14(11):e1006632. doi: 10.1371/journal.pcbi.1006632. eCollection 2018 Nov.

5. How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.
J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.

6. The drift diffusion model as the choice rule in reinforcement learning.
Psychon Bull Rev. 2017 Aug;24(4):1234-1251. doi: 10.3758/s13423-016-1199-y.

7. Benchmarking for Bayesian Reinforcement Learning.
PLoS One. 2016 Jun 15;11(6):e0157088. doi: 10.1371/journal.pone.0157088. eCollection 2016.

8. Sensorimotor learning biases choice behavior: a learning neural field model for decision making.
PLoS Comput Biol. 2012;8(11):e1002774. doi: 10.1371/journal.pcbi.1002774. Epub 2012 Nov 15.

9. Model-based reinforcement learning under concurrent schedules of reinforcement in rodents.
Learn Mem. 2009 Apr 29;16(5):315-23. doi: 10.1101/lm.1295509. Print 2009 May.

10. Learning the opportunity cost of time in a patch-foraging task.
Cogn Affect Behav Neurosci. 2015 Dec;15(4):837-53. doi: 10.3758/s13415-015-0350-y.

Cited By

1. Disentangling the Component Processes in Complex Planning Impairments Following Ventromedial Prefrontal Lesions.
J Neurosci. 2025 Mar 19;45(12):e1814242025. doi: 10.1523/JNEUROSCI.1814-24.2025.

2. A low-dimensional approximation of optimal confidence.
PLoS Comput Biol. 2024 Jul 24;20(7):e1012273. doi: 10.1371/journal.pcbi.1012273. eCollection 2024 Jul.

3. Transitions in cognitive evolution.
Proc Biol Sci. 2023 Jul 12;290(2002):20230671. doi: 10.1098/rspb.2023.0671. Epub 2023 Jul 5.

4. Informational Entropy Threshold as a Physical Mechanism for Explaining Tree-like Decision Making in Humans.
Entropy (Basel). 2022 Dec 13;24(12):1819. doi: 10.3390/e24121819.

5. Neural Mechanisms That Make Perceptual Decisions Flexible.
Annu Rev Physiol. 2023 Feb 10;85:191-215. doi: 10.1146/annurev-physiol-031722-024731. Epub 2022 Nov 7.

6. A weighted constraint satisfaction approach to human goal-directed decision making.
PLoS Comput Biol. 2022 Jun 16;18(6):e1009553. doi: 10.1371/journal.pcbi.1009553. eCollection 2022 Jun.

7. Conflict and competition between model-based and model-free control.
PLoS Comput Biol. 2022 May 5;18(5):e1010047. doi: 10.1371/journal.pcbi.1010047. eCollection 2022 May.

8. Rational use of cognitive resources in human planning.
Nat Hum Behav. 2022 Aug;6(8):1112-1125. doi: 10.1038/s41562-022-01332-8. Epub 2022 Apr 28.

9. Decision prioritization and causal reasoning in decision hierarchies.
PLoS Comput Biol. 2021 Dec 31;17(12):e1009688. doi: 10.1371/journal.pcbi.1009688. eCollection 2021 Dec.

10. Advances in modeling learning and decision-making in neuroscience.
Neuropsychopharmacology. 2022 Jan;47(1):104-118. doi: 10.1038/s41386-021-01126-y. Epub 2021 Aug 27.

References

1. Interplay of approximate planning strategies.
Proc Natl Acad Sci U S A. 2015 Mar 10;112(10):3098-103. doi: 10.1073/pnas.1414219112. Epub 2015 Feb 9.

2. Optimal behavioral hierarchy.
PLoS Comput Biol. 2014 Aug 14;10(8):e1003779. doi: 10.1371/journal.pcbi.1003779. eCollection 2014 Aug.

3. Transcranial direct current stimulation of right dorsolateral prefrontal cortex does not affect model-based or model-free reinforcement learning in humans.
PLoS One. 2014 Jan 24;9(1):e86850. doi: 10.1371/journal.pone.0086850. eCollection 2014.

4. Cortical and hippocampal correlates of deliberation during model-based decisions for rewards in humans.
PLoS Comput Biol. 2013;9(12):e1003387. doi: 10.1371/journal.pcbi.1003387. Epub 2013 Dec 5.

5. Working-memory capacity protects model-based learning from stress.
Proc Natl Acad Sci U S A. 2013 Dec 24;110(52):20941-6. doi: 10.1073/pnas.1312011110. Epub 2013 Dec 9.

6. Disruption of dorsolateral prefrontal cortex decreases model-based in favor of model-free control in humans.
Neuron. 2013 Nov 20;80(4):914-9. doi: 10.1016/j.neuron.2013.08.009. Epub 2013 Oct 24.

7. Goals and habits in the brain.
Neuron. 2013 Oct 16;80(2):312-25. doi: 10.1016/j.neuron.2013.09.007.

8. Simultaneous modeling of visual saliency and value computation improves predictions of economic choice.
Proc Natl Acad Sci U S A. 2013 Oct 1;110(40):E3858-67. doi: 10.1073/pnas.1304429110. Epub 2013 Sep 9.

9. The curse of planning: dissecting multiple reinforcement-learning systems by taxing the central executive.
Psychol Sci. 2013 May;24(5):751-61. doi: 10.1177/0956797612463080. Epub 2013 Apr 4.

10. Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia.
J Neurosci. 2013 Mar 27;33(13):5797-805. doi: 10.1523/JNEUROSCI.5445-12.2013.