A model of reward choice based on reinforcement learning theory

[The model of the reward choice basing on the theory of reinforcement learning].

Author information

Smirnitskaia I A, Frolov A A, Merzhanova G Kh

Publication information

Zh Vyssh Nerv Deiat Im I P Pavlova. 2007 Mar-Apr;57(2):133-43.

Abstract

We developed a model of the alimentary instrumental conditioned bar-pressing reflex in cats that choose between an immediate small reinforcement ("impulsive behavior") and a delayed, more valuable reinforcement ("self-control behavior"). The model is based on reinforcement learning theory. We emulated the contribution of dopamine with the theory's discount coefficient, which sets the subjective decrease in the value of a delayed reinforcement. Computer simulation showed that "cats" with a large discount coefficient demonstrated "self-control behavior", whereas a small discount coefficient was associated with "impulsive behavior". These results agree with experimental data indicating that impulsive behavior is due to a decreased amount of dopamine in the striatum.
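
The relationship between the discount coefficient and the choice can be illustrated with a minimal sketch. It assumes the standard exponential discounting of reinforcement learning, in which a reward delivered after a delay of d time steps is worth gamma**d times its nominal value. The abstract does not give the authors' equations or parameters, so the function names, reward sizes, and delays below are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: exponential discounting is assumed, and all
# reward sizes and delays are hypothetical (none are given in the abstract).

def discounted_value(reward, delay, gamma):
    """Subjective value of a reward delivered after `delay` time steps."""
    return (gamma ** delay) * reward


def preferred_option(gamma, small_reward=1.0, large_reward=3.0,
                     small_delay=0, large_delay=5):
    """Label the choice made by an agent with discount coefficient `gamma`."""
    v_small = discounted_value(small_reward, small_delay, gamma)
    v_large = discounted_value(large_reward, large_delay, gamma)
    return "self-control" if v_large > v_small else "impulsive"


def indifference_gamma(small_reward=1.0, large_reward=3.0,
                       small_delay=0, large_delay=5):
    """Discount coefficient at which both options have equal value."""
    return (small_reward / large_reward) ** (1.0 / (large_delay - small_delay))


if __name__ == "__main__":
    for gamma in (0.5, 0.7, 0.9, 0.99):
        print(f"gamma={gamma}: {preferred_option(gamma)}")
    print(f"indifference point: gamma={indifference_gamma():.3f}")
```

With these hypothetical values, a coefficient close to 1 leaves the delayed reward almost undiminished and the larger reward is preferred, while a small coefficient shrinks it below the immediate reward. This matches the qualitative pattern reported in the abstract, though it is not a reproduction of the authors' simulation.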
