Averbeck, Bruno B.
Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America.
PLoS Comput Biol. 2015 Mar 27;11(3):e1004164. doi: 10.1371/journal.pcbi.1004164. eCollection 2015 Mar.
Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information-sampling, and foraging tasks. These tasks go beyond paradigms in which the choice on the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which current decisions affect the information on which future choices will be made. Under the assumption that agents maximize expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions that trade off immediate and future expected rewards. Each task, however, drives this trade-off in a distinct way. For bandit and information-sampling tasks, increasing uncertainty or lengthening the time horizon shifts value toward actions that pay off in the future; correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks, the time horizon plays the dominant role, because choices do not affect future uncertainty in these tasks.
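The horizon effect the abstract describes can be illustrated concretely. Below is a minimal sketch (not code from the paper) of a finite-horizon two-armed Bernoulli bandit solved as an MDP by dynamic programming over Beta-posterior states: with one pull remaining, a known arm and an uncertain arm with equal posterior means are equally valuable, but with a longer horizon the uncertain arm gains value because its outcome changes the information available for future choices. All function and variable names are illustrative.

```python
from functools import lru_cache

# State: ((a0, b0), (a1, b1)) -- Beta-posterior counts for each arm,
# where a = prior + observed successes, b = prior + observed failures.

@lru_cache(maxsize=None)
def value(state, horizon):
    """Optimal expected total reward from `state` with `horizon` pulls left."""
    if horizon == 0:
        return 0.0
    return max(q_value(state, arm, horizon) for arm in (0, 1))

def q_value(state, arm, horizon):
    """Expected reward of pulling `arm` now, then acting optimally afterward."""
    (a0, b0), (a1, b1) = state
    a, b = (a0, b0) if arm == 0 else (a1, b1)
    p = a / (a + b)  # posterior mean reward probability of this arm
    # Each outcome updates the pulled arm's posterior, so the choice
    # affects the information on which future choices are made:
    if arm == 0:
        win, lose = ((a0 + 1, b0), (a1, b1)), ((a0, b0 + 1), (a1, b1))
    else:
        win, lose = ((a0, b0), (a1 + 1, b1)), ((a0, b0), (a1, b1 + 1))
    return p * (1.0 + value(win, horizon - 1)) + (1.0 - p) * value(lose, horizon - 1)

# Arm 0 is nearly known (Beta(100, 100), mean 0.5); arm 1 is uncertain
# (Beta(1, 1), also mean 0.5). With horizon 1 the arms are equally valued;
# with a longer horizon the uncertain arm is strictly preferred.
state = ((100, 100), (1, 1))
print(q_value(state, 0, 1), q_value(state, 1, 1))  # equal at horizon 1
print(q_value(state, 0, 6), q_value(state, 1, 6))  # uncertain arm wins at horizon 6
```

Shrinking the horizon (or tightening the uncertain arm's posterior) moves value back toward the immediately better-known option, which is the uncertainty/horizon trade-off the abstract identifies for bandit and information-sampling tasks.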