Fischer Benjamin, Wegener Detlef
Brain Research Institute, Center for Cognitive Sciences, University of Bremen, Bremen, Germany.
J Neurophysiol. 2018 Jul 1;120(1):115-128. doi: 10.1152/jn.00572.2017. Epub 2018 Apr 4.
Nonhuman primates constitute an indispensable model system for studying higher brain functions at the neurophysiological level. Studies involving these animals elucidated the neuronal mechanisms of various cognitive and executive functions, such as visual attention, working memory, and decision-making. Positive reinforcement training (PRT) constitutes the gold standard for training animals on the cognitive tasks employed in these studies. In the laboratory, PRT is usually based on application of a liquid reward as the reinforcer to strengthen the desired behavior and absence of the reward if the animal's response is wrong. By trial and error, the monkey may adapt its behavior and successfully reduce the number of error trials, and eventually learn even very sophisticated tasks. However, progress and success of the training strongly depend on reasonable error rates. If errors get too frequent, they may cause a drop in the animal's motivation to cooperate or its adaptation to high error rates and poor overall performance. We introduce in this report an alternative training regime to minimize errors and base the critical information for learning on graded rewarding. For every new task rule, the feedback to the animal is provided by different amounts of reward to distinguish the desired, optimal behavior from less optimal behavior. We applied this regime in different situations during training of visual attention tasks and analyzed behavioral performance and reaction times to evaluate its effectiveness. For both simple and complex behaviors, graded rewarding was found to constitute a powerful technique allowing for effective training without trade-off in accessible task difficulty or task performance.
NEW & NOTEWORTHY Laboratory training of monkeys usually builds on providing a fixed amount of reward for the desired behavior, and no reward otherwise. We present a nonbinary, graded reward schedule to emphasize the positive, desired behavior and to keep errors on a moderate level. Using data from typical training situations, we demonstrate that graded rewards help to effectively guide the animal by success rather than errors and provide a powerful new tool for positive reinforcement training.