Cardinal Rudolf N, Cheung Timothy H C
Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK.
BMC Neurosci. 2005 Feb 3;6:9. doi: 10.1186/1471-2202-6-9.
Delays between actions and their outcomes severely hinder reinforcement learning systems, but little is known of the neural mechanism by which animals overcome this problem and bridge such delays. The nucleus accumbens core (AcbC), part of the ventral striatum, is required for normal preference for a large, delayed reward over a small, immediate reward (self-controlled choice) in rats, but the reason for this is unclear. We investigated the role of the AcbC in learning a free-operant instrumental response using delayed reinforcement, performance of a previously learned response for delayed reinforcement, and assessment of the relative magnitudes of two different rewards.
Groups of rats with excitotoxic or sham lesions of the AcbC acquired an instrumental response with different delays (0, 10, or 20 s) between the lever-press response and reinforcer delivery. A second (inactive) lever was also present, but responding on it was never reinforced. As expected, the delays retarded learning in normal rats. AcbC lesions did not hinder learning in the absence of delays, but AcbC-lesioned rats were impaired in learning when there was a delay, relative to sham-operated controls. All groups eventually acquired the response and discriminated the active lever from the inactive lever to some degree. Rats were subsequently trained to discriminate reinforcers of different magnitudes. AcbC-lesioned rats were more sensitive to differences in reinforcer magnitude than sham-operated controls, suggesting that the deficit in self-controlled choice previously observed in such rats was a consequence of reduced preference for delayed rewards relative to immediate rewards, not of reduced preference for large rewards relative to small rewards. AcbC lesions also impaired the performance of a previously learned instrumental response in a delay-dependent fashion.
These results demonstrate that the AcbC contributes to instrumental learning and performance by bridging delays between subjects' actions and the ensuing outcomes that reinforce behaviour.