Reconciling reinforcement learning models with behavioral extinction and renewal: implications for addiction, relapse, and problem gambling.

Author information

Redish A David, Jensen Steve, Johnson Adam, Kurth-Nelson Zeb

Affiliations

Department of Neuroscience, University of Minnesota.

Graduate Program in Computer Science, University of Minnesota.

Publication information

Psychol Rev. 2007 Jul;114(3):784-805. doi: 10.1037/0033-295X.114.3.784.

DOI: 10.1037/0033-295X.114.3.784

PMID: 17638506

Abstract

Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine carries a reward prediction error signal; these models predict reward by driving that reward error to zero. The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: (a) a TDRL process that learns the value of situation-action pairs and (b) a situation recognition process that categorizes the observed cues into situations. This model has implications for dysfunctional states, including relapse after addiction and problem gambling.
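
The two-process idea in the abstract can be illustrated with a toy sketch. This is not the authors' published model: the split rule (open a new situation once accumulated negative prediction error passes a threshold), the names `Agent`, `ALPHA`, `SPLIT_AT`, and `aba_renewal`, and all parameter values are assumptions chosen for illustration. Actions are omitted, so values attach to situations rather than situation-action pairs, and each trial is a single step, which reduces the TD update to V ← V + α(r − V).

```python
from dataclasses import dataclass, field

ALPHA = 0.2      # learning rate (assumed value)
SPLIT_AT = -1.5  # accumulated negative error that opens a new situation (assumed)


@dataclass
class Agent:
    use_situations: bool
    values: dict = field(default_factory=dict)    # situation -> learned value
    assigned: dict = field(default_factory=dict)  # (cue, context) -> situation
    deficit: float = 0.0                          # running sum of negative errors

    def situation(self, cue, context):
        """Categorize the observed cues into a situation."""
        if not self.use_situations:
            return cue  # plain TD learner: the cue alone is the state
        return self.assigned.get((cue, context), cue)

    def predict(self, cue, context):
        """Read out the predicted reward without learning."""
        return self.values.get(self.situation(cue, context), 0.0)

    def trial(self, cue, context, reward):
        s = self.situation(cue, context)
        v = self.values.get(s, 0.0)
        delta = reward - v  # prediction error for a one-step trial
        if self.use_situations:
            # Track consecutive negative errors; any non-negative error resets.
            self.deficit = self.deficit + delta if delta < 0 else 0.0
            if self.deficit <= SPLIT_AT and s == cue:
                # Hypothetical split rule: treat this context as a new
                # situation, seeded with the current value, so that further
                # extinction no longer overwrites the original association.
                s = (cue, context)
                self.assigned[(cue, context)] = s
                self.values[s] = v
                self.deficit = 0.0
        self.values[s] = self.values.get(s, 0.0) + ALPHA * delta


def aba_renewal(use_situations):
    """Acquire in context A, extinguish in B, then read out both contexts."""
    agent = Agent(use_situations)
    for _ in range(50):
        agent.trial("tone", "A", reward=1.0)  # acquisition
    for _ in range(50):
        agent.trial("tone", "B", reward=0.0)  # extinction
    return agent.predict("tone", "B"), agent.predict("tone", "A")


if __name__ == "__main__":
    print("plain TD       (B, A):", aba_renewal(False))
    print("with situations (B, A):", aba_renewal(True))
```

Under these assumptions the plain learner predicts almost no reward in either context after extinction, because extinction simply unlearned the value. The situation-based learner predicts almost nothing in context B but still predicts most of the original reward on returning to context A, which is the renewal pattern the abstract refers to.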

Similar articles

Addiction as a computational process gone awry.

Science. 2004 Dec 10;306(5703):1944-7. doi: 10.1126/science.1102384.

Skewed by Cues? The Motivational Role of Audiovisual Stimuli in Modelling Substance Use and Gambling Disorders.

Curr Top Behav Neurosci. 2016;27:507-29. doi: 10.1007/7854_2015_393.

A model of resurgence based on behavioral momentum theory.

J Exp Anal Behav. 2011 Jan;95(1):91-108. doi: 10.1901/jeab.2011.95-91.

Extinction with multiple excitors.

Learn Behav. 2013 Jun;41(2):119-37. doi: 10.3758/s13420-012-0090-6.

Contingency learning in alcohol dependence and pathological gambling: learning and unlearning reward contingencies.

Alcohol Clin Exp Res. 2014 Jun;38(6):1602-10. doi: 10.1111/acer.12393. Epub 2014 May 12.

Risk-prone individuals prefer the wrong options on a rat version of the Iowa Gambling Task.

Biol Psychiatry. 2009 Oct 15;66(8):743-9. doi: 10.1016/j.biopsych.2009.04.008. Epub 2009 May 31.

High-frequency gamblers show increased resistance to extinction following partial reinforcement.

Behav Brain Res. 2012 Apr 15;229(2):438-42. doi: 10.1016/j.bbr.2012.01.024. Epub 2012 Jan 20.

Hippocampal activation during extinction learning predicts occurrence of the renewal effect in extinction recall.

Neuroimage. 2013 Nov 1;81:131-143. doi: 10.1016/j.neuroimage.2013.05.025. Epub 2013 May 16.

How we learn to make decisions: rapid propagation of reinforcement learning prediction errors in humans.

J Cogn Neurosci. 2014 Mar;26(3):635-44. doi: 10.1162/jocn_a_00509. Epub 2013 Oct 29.

Cited by

Error-driven changes in hippocampal representations accompany flexible re-learning.

bioRxiv. 2025 May 21:2025.05.20.655046. doi: 10.1101/2025.05.20.655046.

Coping with failures: how emotions, individual traits, expectation-importance and prior experience affect reactions to violated achievement expectations.

Front Psychol. 2025 Mar 6;16:1506051. doi: 10.3389/fpsyg.2025.1506051. eCollection 2025.

Global remapping emerges as the mechanism for renewal of context-dependent behavior in a reinforcement learning model.

Front Comput Neurosci. 2025 Jan 15;18:1462110. doi: 10.3389/fncom.2024.1462110. eCollection 2024.

Natural forgetting reversibly modulates engram expression.

Elife. 2024 Nov 5;12:RP92860. doi: 10.7554/eLife.92860.

Dopamine and Norepinephrine Differentially Mediate the Exploration-Exploitation Tradeoff.

J Neurosci. 2024 Oct 30;44(44):e1194232024. doi: 10.1523/JNEUROSCI.1194-23.2024.

Aberrant neural computation of social controllability in nicotine-dependent humans.

Commun Biol. 2024 Aug 14;7(1):988. doi: 10.1038/s42003-024-06638-z.

Prediction error determines how memories are organized in the brain.

Elife. 2024 Jul 19;13:RP95849. doi: 10.7554/eLife.95849.

A Competition of Critics in Human Decision-Making.

Comput Psychiatr. 2021 Aug 12;5(1):81-101. doi: 10.5334/cpsy.64. eCollection 2021.

The utility of a latent-cause framework for understanding addiction phenomena.

Addict Neurosci. 2024 Mar;10. doi: 10.1016/j.addicn.2024.100143. Epub 2024 Jan 15.

Aberrant neural computation of social controllability in nicotine-dependent humans.

Res Sq. 2024 Jan 24:rs.3.rs-3854519. doi: 10.21203/rs.3.rs-3854519/v1.
