Redish A David, Jensen Steve, Johnson Adam, Kurth-Nelson Zeb
Department of Neuroscience, University of Minnesota.
Graduate Program in Computer Science, University of Minnesota.
Psychol Rev. 2007 Jul;114(3):784-805. doi: 10.1037/0033-295X.114.3.784.
Because learned associations are quickly renewed following extinction, the extinction process must include processes other than unlearning. However, reinforcement learning models, such as the temporal difference reinforcement learning (TDRL) model, treat extinction as an unlearning of associated value and are thus unable to capture renewal. TDRL models are based on the hypothesis that dopamine carries a reward prediction error signal; these models predict reward by driving that reward error to zero. The authors construct a TDRL model that can accommodate extinction and renewal through two simple processes: (a) a TDRL process that learns the value of situation-action pairs and (b) a situation recognition process that categorizes the observed cues into situations. This model has implications for dysfunctional states, including relapse after addiction and problem gambling.
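The contrast drawn in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' model: the learning rate, the trial counts, the cue and context names, and the `classify` rule (every distinct cue-context pair is its own situation) are all hypothetical choices made for illustration. It uses a one-step temporal difference update in which the prediction error is the reward minus the current value estimate.

```python
from collections import defaultdict

ALPHA = 0.2  # learning rate (assumed value, not taken from the paper)


def td_update(values, situation, reward, alpha=ALPHA):
    """One-step TD update for a trial that ends right after the outcome."""
    delta = reward - values[situation]  # reward prediction error
    values[situation] += alpha * delta  # move the estimate toward the reward
    return delta


def run_phase(values, situation, reward, n_trials):
    """Repeat the same trial n_trials times and return the final value."""
    for _ in range(n_trials):
        td_update(values, situation, reward)
    return values[situation]


def classify(cue, context):
    """Toy situation recognition (hypothetical rule): each distinct
    (cue, context) pair is treated as a separate situation."""
    return (cue, context)


def simulate(use_situations, n_trials=50):
    """Acquire in context A, extinguish in context B, then test in A."""
    values = defaultdict(float)
    if use_situations:
        key = classify                  # value is stored per situation
    else:
        key = lambda cue, context: cue  # plain TD: the cue alone is the state
    run_phase(values, key("tone", "A"), 1.0, n_trials)  # acquisition
    run_phase(values, key("tone", "B"), 0.0, n_trials)  # extinction
    return values[key("tone", "A")]     # value at test, back in context A


plain_value = simulate(use_situations=False)
situation_value = simulate(use_situations=True)
print(f"plain TD value at test:        {plain_value:.4f}")
print(f"with situation classification: {situation_value:.4f}")
```

In this sketch the plain learner overwrites the cue's value during extinction, so nothing is left to renew at test. The situation-based learner stores the extinction experience under a different situation, so the value learned during acquisition is still available when the original context returns. The paper's situation recognition process is more elaborate than this lookup; the sketch only shows why separating the two processes leaves room for renewal.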