Lau Brian, Glimcher Paul W
Center for Neural Science, New York University, New York, New York 10003, USA.
J Exp Anal Behav. 2005 Nov;84(3):555-79. doi: 10.1901/jeab.2005.110-04.
We studied the choice behavior of 2 monkeys in a discrete-trial task with reinforcement contingencies similar to those Herrnstein (1961) used when he described the matching law. In each session, the monkeys experienced blocks of discrete trials at different relative-reinforcer frequencies or magnitudes with unsignalled transitions between the blocks. Steady-state data following adjustment to each transition were well characterized by the generalized matching law; response ratios undermatched reinforcer frequency ratios but matched reinforcer magnitude ratios. We modelled response-by-response behavior with linear models that used past reinforcers as well as past choices to predict the monkeys' choices on each trial. We found that more recently obtained reinforcers more strongly influenced choice behavior. Perhaps surprisingly, we also found that the monkeys' actions were influenced by the pattern of their own past choices. It was necessary to incorporate both past reinforcers and past choices in order to accurately capture steady-state behavior as well as the fluctuations during block transitions and the response-by-response patterns of behavior. Our results suggest that simple reinforcement learning models must account for the effects of past choices to accurately characterize behavior in this task, and that models with these properties provide a conceptual tool for studying how both past reinforcers and past choices are integrated by the neural systems that generate behavior.
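The two analyses named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions. The generalized matching law is fit in its usual log-ratio form, log(B_a/B_b) = s * log(R_a/R_b) + log(b), by ordinary least squares. The response-by-response model is assumed to be a logistic regression on lagged reinforcers and lagged choices, with choices coded +1/-1 and reinforcers coded 0/1. The function names, the coding scheme, and the gradient-ascent fit are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_generalized_matching(resp_a, resp_b, reinf_a, reinf_b):
    """Least-squares fit of log(B_a/B_b) = s*log(R_a/R_b) + log(b).

    Returns (s, b): sensitivity and bias. s < 1 indicates undermatching.
    """
    x = np.log(np.asarray(reinf_a, float) / np.asarray(reinf_b, float))
    y = np.log(np.asarray(resp_a, float) / np.asarray(resp_b, float))
    s, log_b = np.polyfit(x, y, 1)
    return s, np.exp(log_b)

def lagged_design(choices, rewards, n_lags):
    """Build a design matrix from past reinforcers and past choices.

    choices: +1 (option A) or -1 (option B) per trial (assumed coding).
    rewards: 1 if the trial was reinforced, else 0 (assumed coding).
    Columns: intercept, signed reinforcers at lags 1..n_lags,
    choices at lags 1..n_lags. Target y is 1 when option A was chosen.
    """
    choices = np.asarray(choices, float)
    rewards = np.asarray(rewards, float)
    signed_r = choices * rewards  # +1: reinforced A, -1: reinforced B, 0: none
    rows = []
    for t in range(n_lags, len(choices)):
        past_r = signed_r[t - n_lags:t][::-1]  # lag 1 first
        past_c = choices[t - n_lags:t][::-1]
        rows.append(np.concatenate(([1.0], past_r, past_c)))
    X = np.array(rows)
    y = (choices[n_lags:] > 0).astype(float)
    return X, y

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient ascent on the logistic log-likelihood (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w
```

In this sketch, the fitted weights on the lagged reinforcer columns would show how strongly recent versus older reinforcers influence choice, and the weights on the lagged choice columns would capture any dependence on the animal's own choice history. Dropping the choice columns yields a reinforcer-only model for comparison.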