Department of Psychological and Brain Sciences, Indiana University Bloomington, IN, USA.
Front Neurosci. 2014 May 23;8:101. doi: 10.3389/fnins.2014.00101. eCollection 2014.
When animals have to make a number of decisions during a limited time interval, they face a fundamental problem: how much time should they spend on each decision in order to achieve the maximum possible total outcome? Deliberating longer on one decision usually improves its outcome, but leaves less time for the other decisions. In the framework of sequential sampling models, the question is how animals learn to set their decision threshold such that the total expected outcome achieved during a limited time is maximized. The aim of this paper is to provide a theoretical framework for answering this question. To this end, we consider an experimental design in which each trial can come from one of several possible "conditions." A condition specifies the difficulty of the trial, the reward, the penalty, and so on. We show that to maximize the expected reward during a limited time, the subject should set a separate decision threshold for each condition. We propose a model of learning the optimal decision thresholds based on the theory of semi-Markov decision processes (SMDP). In our model, the experimental environment is modeled as an SMDP in which each "condition" is a "state" and the decision threshold values are the "actions" taken in those states. The problem of finding the optimal decision thresholds is then cast as the stochastic optimal control problem of taking actions in each state of the corresponding SMDP such that the average reward rate is maximized. Our model utilizes a biologically plausible learning algorithm to solve this problem. The simulation results show that at the beginning of learning the model chooses high decision thresholds, which lead to sub-optimal performance. With experience, however, the model learns to lower the decision thresholds until it finally finds the optimal values.
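The claim that each condition should get its own threshold can be illustrated with a small numerical sketch. It is not the learning algorithm of the paper: it assumes the sequential sampling model is a drift-diffusion process with unit noise, an unbiased starting point and symmetric bounds, for which accuracy and mean decision time have standard closed forms, and it finds the rate-maximizing thresholds by a Dinkelbach-style fixed-point iteration with the model fully known. The two conditions, their drift rates, rewards, penalties, the fixed inter-trial time and the threshold grid are all hypothetical values chosen for illustration.

```python
import math

def expected_reward_and_time(cond, threshold, inter_trial_time):
    """Closed-form expectations for an unbiased drift-diffusion trial
    with bounds at +/- threshold and unit noise (an assumed model)."""
    v = cond["drift"]
    accuracy = 1.0 / (1.0 + math.exp(-2.0 * v * threshold))
    decision_time = (threshold / v) * math.tanh(v * threshold)
    reward = accuracy * cond["reward"] - (1.0 - accuracy) * cond["penalty"]
    return reward, decision_time + inter_trial_time

def reward_rate(conditions, chosen, inter_trial_time=1.0):
    """Average reward per unit time when condition i uses threshold chosen[i]."""
    total_reward = total_time = 0.0
    for cond, a in zip(conditions, chosen):
        r, t = expected_reward_and_time(cond, a, inter_trial_time)
        total_reward += cond["prob"] * r
        total_time += cond["prob"] * t
    return total_reward / total_time

def optimal_thresholds(conditions, grid, inter_trial_time=1.0, max_iter=1000):
    """Dinkelbach-style iteration: given the current rate estimate rho, each
    condition independently picks the threshold maximizing
    E[reward] - rho * E[time]; rho is then re-estimated from those choices."""
    rho = 0.0
    chosen = [grid[-1]] * len(conditions)
    for _ in range(max_iter):
        chosen = []
        for cond in conditions:
            def score(a):
                r, t = expected_reward_and_time(cond, a, inter_trial_time)
                return r - rho * t
            chosen.append(max(grid, key=score))
        new_rho = reward_rate(conditions, chosen, inter_trial_time)
        if abs(new_rho - rho) < 1e-12:
            break
        rho = new_rho
    return chosen, rho

# Hypothetical two-condition experiment: an easy and a hard trial type.
conditions = [
    {"drift": 2.0, "reward": 1.0, "penalty": 0.0, "prob": 0.5},  # easy
    {"drift": 0.5, "reward": 1.0, "penalty": 0.0, "prob": 0.5},  # hard
]
grid = [0.05 * k for k in range(1, 61)]  # candidate thresholds (the "actions")

per_condition, rate = optimal_thresholds(conditions, grid)
best_shared = max(reward_rate(conditions, [a, a]) for a in grid)
print("per-condition thresholds:", per_condition)
print("reward rate:", rate, "best single shared threshold:", best_shared)
```

Because the rate estimate couples the conditions only through a single number, the maximization splits into one small problem per condition, which is the same structure the SMDP formulation exploits when conditions are treated as states and thresholds as actions. Under these assumed parameters, the per-condition solution is never worse than the best single threshold shared across conditions.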