Reward-Mediated, Model-Free Reinforcement-Learning Mechanisms in Pavlovian and Instrumental Tasks Are Related.

Affiliations

Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511.

Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom.

Publication information

J Neurosci. 2023 Jan 18;43(3):458-471. doi: 10.1523/JNEUROSCI.1113-22.2022. Epub 2022 Oct 10.

DOI:10.1523/JNEUROSCI.1113-22.2022

PMID:36216504

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC9864557/

Abstract

Model-free and model-based computations are argued to distinctly update action values that guide decision-making processes. It is not known, however, if these model-free and model-based reinforcement learning mechanisms recruited in operationally based instrumental tasks parallel those engaged by pavlovian-based behavioral procedures. Recently, computational work has suggested that individual differences in the attribution of incentive salience to reward predictive cues, that is, sign- and goal-tracking behaviors, are also governed by variations in model-free and model-based value representations that guide behavior. Moreover, it is not appreciated if these systems that are characterized computationally using model-free and model-based algorithms are conserved across tasks for individual animals. In the current study, we used a within-subject design to assess sign-tracking and goal-tracking behaviors using a pavlovian conditioned approach task and then characterized behavior using an instrumental multistage decision-making (MSDM) task in male rats. We hypothesized that both pavlovian and instrumental learning processes may be driven by common reinforcement-learning mechanisms. Our data confirm that sign-tracking behavior was associated with greater reward-mediated, model-free reinforcement learning and that it was also linked to model-free reinforcement learning in the MSDM task. Computational analyses revealed that pavlovian model-free updating was correlated with model-free reinforcement learning in the MSDM task. These data provide key insights into the computational mechanisms mediating associative learning that could have important implications for normal and abnormal states.

Model-free and model-based computations that guide instrumental decision-making processes may also be recruited in pavlovian-based behavioral procedures. Here, we used a within-subject design to test the hypothesis that both pavlovian and instrumental learning processes were driven by common reinforcement-learning mechanisms. Sign-tracking and goal-tracking behaviors were assessed in rats using a pavlovian conditioned approach task, and then instrumental behavior was characterized using an MSDM task. We report that sign-tracking behavior was associated with greater model-free, but not model-based, learning in the MSDM task. These data suggest that pavlovian and instrumental behaviors may be driven by conserved reinforcement-learning mechanisms.
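
The distinction the abstract draws between model-free and model-based updating can be illustrated with a minimal sketch. This is not the authors' computational model. The two-stage structure, the common-transition probability `COMMON_P`, the learning rate `alpha`, and the class and function names are all hypothetical choices made for illustration. The sketch assumes a task in which each first-stage action usually leads to one of two second-stage states, and it contrasts a model-free update (the chosen action is credited directly by the reward) with a model-based evaluation (second-stage values are weighted by the assumed transition probabilities).

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed probability that first-stage action a leads to second-stage state a.
COMMON_P = 0.7


@dataclass
class TwoStageLearner:
    """Hypothetical learner for a two-stage task (illustration only)."""
    alpha: float = 0.5  # assumed learning rate
    q_mf: List[float] = field(default_factory=lambda: [0.0, 0.0])
    v_stage2: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def update(self, action: int, state: int, reward: float) -> None:
        # The value of the visited second-stage state tracks the reward.
        self.v_stage2[state] += self.alpha * (reward - self.v_stage2[state])
        # Model-free: the chosen first-stage action is reinforced directly
        # by the reward, regardless of which transition occurred.
        self.q_mf[action] += self.alpha * (reward - self.q_mf[action])

    def q_mb(self) -> List[float]:
        # Model-based: expected second-stage value under the transition model.
        return [
            COMMON_P * self.v_stage2[a] + (1.0 - COMMON_P) * self.v_stage2[1 - a]
            for a in (0, 1)
        ]


def rare_transition_demo() -> Tuple[List[float], List[float]]:
    """One rewarded trial in which action 0 leads to the rare state 1."""
    learner = TwoStageLearner()
    learner.update(action=0, state=1, reward=1.0)
    return learner.q_mf, learner.q_mb()


if __name__ == "__main__":
    q_mf, q_mb = rare_transition_demo()
    print("model-free values: ", q_mf)
    print("model-based values:", [round(q, 2) for q in q_mb])
```

After a rewarded rare transition, the model-free values favor repeating the chosen action (0.5 versus 0.0), whereas the model-based values favor the other action (about 0.15 versus 0.35), because that action more often reaches the rewarded state. Under these assumptions, this divergence is the kind of behavioral signature that lets multistage tasks separate the two forms of learning.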

Similar articles

Effects of predictive and incentive value manipulation on sign- and goal-tracking behavior.

Neurobiol Learn Mem. 2023 Sep;203:107796. doi: 10.1016/j.nlm.2023.107796. Epub 2023 Jun 28.

The lateral hypothalamus and orexinergic transmission in the paraventricular thalamus promote the attribution of incentive salience to reward-associated cues.

Psychopharmacology (Berl). 2020 Dec;237(12):3741-3758. doi: 10.1007/s00213-020-05651-4. Epub 2020 Aug 27.

Environmental manipulations alter age differences in attribution of incentive salience to reward-paired cues.

Behav Brain Res. 2013 Nov 15;257:83-9. doi: 10.1016/j.bbr.2013.09.021. Epub 2013 Sep 16.

Neurochemical and Behavioral Dissections of Decision-Making in a Rodent Multistage Task.

J Neurosci. 2019 Jan 9;39(2):295-306. doi: 10.1523/JNEUROSCI.2219-18.2018. Epub 2018 Nov 9.

Rats that sign-track are resistant to Pavlovian but not instrumental extinction.

Behav Brain Res. 2016 Jan 1;296:418-430. doi: 10.1016/j.bbr.2015.07.055. Epub 2015 Jul 30.

Subanesthetic ketamine decreases the incentive-motivational value of reward-related cues.

J Psychopharmacol. 2017 Jan;31(1):67-74. doi: 10.1177/0269881116667709. Epub 2016 Sep 26.

A mechanical task for measuring sign- and goal-tracking in humans: A proof-of-concept study.

Behav Brain Res. 2023 Jan 5;436:114112. doi: 10.1016/j.bbr.2022.114112. Epub 2022 Sep 14.

Medial orbitofrontal cortical regulation of different aspects of Pavlovian and instrumental reward seeking.

Psychopharmacology (Berl). 2023 Mar;240(3):441-459. doi: 10.1007/s00213-022-06265-8. Epub 2022 Nov 2.

Suboptimal choice in rats: Incentive salience attribution promotes maladaptive decision-making.

Behav Brain Res. 2017 Mar 1;320:244-254. doi: 10.1016/j.bbr.2016.12.013. Epub 2016 Dec 16.

Cited by

Disruptions in Reward-Guided Decision-Making Functions Are Predictive of Greater Oral Oxycodone Self-Administration in Male and Female Rats.

Biol Psychiatry Glob Open Sci. 2025 Jan 21;5(3):100450. doi: 10.1016/j.bpsgos.2025.100450. eCollection 2025 May.

Implementations of sign- and goal-tracking behavior in humans: A scoping review.

Cogn Affect Behav Neurosci. 2025 Apr;25(2):263-290. doi: 10.3758/s13415-024-01230-8. Epub 2024 Nov 5.

Leveraging individual differences in cue-reward learning to investigate the psychological and neural basis of shared psychiatric symptomatology: The sign-tracker/goal-tracker model.

Behav Neurosci. 2024 Aug;138(4):260-271. doi: 10.1037/bne0000590. Epub 2024 May 16.
