Sakai Yutaka, Fukai Tomoki
Department of Intelligent Information Systems, Tamagawa University, Machida, Tokyo 194-8610, Japan.
Neural Comput. 2008 Jan;20(1):227-51. doi: 10.1162/neco.2008.20.1.227.
The ability to choose the correct behavior from among various options is crucial for an animal's survival. The neural basis of behavioral choice has attracted growing attention in research on both biological and artificial neural systems. Alternative choice tasks with variable ratio (VR) and variable interval (VI) schedules of reinforcement have often been used to study decision making in animals and humans. In the VR schedule task, the alternative choices are reinforced with different probabilities, and subjects learn to select the behavioral response that is rewarded more frequently. In the VI schedule task, the alternative choices are reinforced at different average intervals, independent of how often each is chosen, and choice behavior follows the so-called matching law (the fraction of choices allocated to each option matches the fraction of rewards obtained from it). Both policies appear robustly in subjects' choice behavior, but the underlying neural mechanisms remain unknown. Here, we show that these seemingly different policies can emerge from a common computational algorithm known as actor-critic learning. We present experimentally testable variations of the VI schedule in which matching behavior gives only a suboptimal solution to decision making, and we show that the actor-critic system exhibits matching behavior in the steady state of learning even when matching is suboptimal. However, we find that matching behavior can earn approximately the same reward as optimal behavior in many practical situations.
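The following is a minimal Python sketch of how a single learning rule could yield both policies. It rests on our own assumptions, not the authors' model: a discrete-trial task in which, under VR, each option pays with a fixed probability whenever it is chosen, and under VI, each option becomes armed with a fixed probability per trial and stays armed until collected; and a generic actor-critic agent with one logistic preference `theta` (actor) and a scalar reward estimate `v` (critic). The names `simulate` and `vi_matching_point`, the parameters, and the learning rates are hypothetical. Under these assumptions the expected actor update is proportional to the difference in return per choice between the two options. On VI that difference shrinks as an option is chosen more often, so learning settles where returns are equal, which is the matching condition. On VR the returns are fixed, so the preference drifts toward the richer option.

```python
import math
import random


def vi_matching_point(lam_a: float, lam_b: float) -> float:
    """Hypothetical helper: probability of choosing the first option at which
    both options give equal return per choice, for the baited discrete-trial
    VI model assumed in `simulate` (arming happens before each choice)."""
    x, y = lam_a * (1 - lam_b), lam_b * (1 - lam_a)
    return x / (x + y)


def simulate(schedule: str, rate_a: float, rate_b: float, trials: int = 100_000,
             alpha: float = 0.02, beta: float = 0.01, seed: int = 0):
    """Two-option actor-critic agent on a 'VR' or 'VI' schedule (a sketch).

    VR: option i pays 1 with fixed probability rates[i] whenever chosen.
    VI: option i becomes armed with probability rates[i] per trial and stays
        armed (baited) until chosen.
    Returns (choice fraction, income fraction) for the first option, measured
    over the second half of the run, after learning has settled.
    """
    rng = random.Random(seed)
    rates = (rate_a, rate_b)
    theta = 0.0              # actor: logit of the probability of choosing option 0
    v = 0.0                  # critic: running estimate of reward per trial
    armed = [False, False]   # VI only: whether a reward is waiting at each option
    chosen_a = 0
    income_a = income = 0.0
    for t in range(trials):
        if schedule == "VI":
            for i in (0, 1):
                if not armed[i] and rng.random() < rates[i]:
                    armed[i] = True
        p = 1.0 / (1.0 + math.exp(-theta))    # probability of choosing option 0
        choice = 0 if rng.random() < p else 1
        if schedule == "VI":
            reward = 1.0 if armed[choice] else 0.0
            armed[choice] = False               # collecting the bait disarms it
        else:
            reward = 1.0 if rng.random() < rates[choice] else 0.0
        delta = reward - v                      # TD error relative to the critic
        v += beta * delta
        # Actor: policy-gradient-style step along d log pi(choice) / d theta.
        theta += alpha * delta * ((1.0 - p) if choice == 0 else -p)
        if t >= trials // 2:
            if choice == 0:
                chosen_a += 1
                income_a += reward
            income += reward
    half = trials - trials // 2
    return chosen_a / half, income_a / income
```

For example, `simulate("VI", 0.2, 0.05)` should return a choice fraction close to both the income fraction and `vi_matching_point(0.2, 0.05)` (about 0.83), whereas `simulate("VR", 0.4, 0.1)` should return a choice fraction near 1, i.e. near-exclusive choice of the richer option. Because the agent never observes whether an option is armed, its policy is memoryless, which is what makes the return per choice on VI a simple function of the choice probability in this sketch.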