Averbeck, Bruno B.
Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Bethesda, Maryland, United States of America.
PLoS Comput Biol. 2015 Mar 27;11(3):e1004164. doi: 10.1371/journal.pcbi.1004164. eCollection 2015 Mar.
Decision making has been studied with a wide array of tasks. Here we examine the theoretical structure of bandit, information-sampling, and foraging tasks. These tasks go beyond paradigms in which the choice on the current trial does not affect future expected rewards. We have modeled these tasks using Markov decision processes (MDPs). MDPs provide a general framework for modeling tasks in which current decisions affect the information on which future choices will be made. Under the assumption that agents maximize expected rewards, MDPs provide normative solutions. We find that all three classes of tasks pose choices among actions that trade off immediate and future expected rewards. Each task, however, drives this trade-off in a distinct way. For bandit and information-sampling tasks, increasing uncertainty or lengthening the time horizon shifts value toward actions that pay off in the future; correspondingly, decreasing uncertainty increases the relative value of actions that pay off immediately. For foraging tasks, the time horizon plays the dominant role, because choices do not affect future uncertainty in these tasks.
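The horizon effect the abstract describes can be illustrated concretely. Below is a minimal sketch (not code from the paper) of a finite-horizon two-armed Bernoulli bandit solved as an MDP by dynamic programming over Beta-posterior states: with one pull remaining, a known arm and an uncertain arm with equal posterior means are equally valuable, but with a longer horizon the uncertain arm gains value because its outcome changes the information available for future choices. All function and variable names are illustrative.

```python
from functools import lru_cache

# State: ((a0, b0), (a1, b1)) -- Beta-posterior counts for each arm,
# where a = prior + observed successes, b = prior + observed failures.

@lru_cache(maxsize=None)
def value(state, horizon):
    """Optimal expected total reward from `state` with `horizon` pulls left."""
    if horizon == 0:
        return 0.0
    return max(q_value(state, arm, horizon) for arm in (0, 1))

def q_value(state, arm, horizon):
    """Expected reward of pulling `arm` now, then acting optimally afterward."""
    (a0, b0), (a1, b1) = state
    a, b = (a0, b0) if arm == 0 else (a1, b1)
    p = a / (a + b)  # posterior mean reward probability of this arm
    # Each outcome updates the pulled arm's posterior, so the choice
    # affects the information on which future choices are made:
    if arm == 0:
        win, lose = ((a0 + 1, b0), (a1, b1)), ((a0, b0 + 1), (a1, b1))
    else:
        win, lose = ((a0, b0), (a1 + 1, b1)), ((a0, b0), (a1, b1 + 1))
    return p * (1.0 + value(win, horizon - 1)) + (1.0 - p) * value(lose, horizon - 1)

# Arm 0 is nearly known (Beta(100, 100), mean 0.5); arm 1 is uncertain
# (Beta(1, 1), also mean 0.5). With horizon 1 the arms are equally valued;
# with a longer horizon the uncertain arm is strictly preferred.
state = ((100, 100), (1, 1))
print(q_value(state, 0, 1), q_value(state, 1, 1))  # equal at horizon 1
print(q_value(state, 0, 6), q_value(state, 1, 6))  # uncertain arm wins at horizon 6
```

Shrinking the horizon (or tightening the uncertain arm's posterior) moves value back toward the immediately better-known option, which is the uncertainty/horizon trade-off the abstract identifies for bandit and information-sampling tasks.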