Harlé Katia M, Zhang Shunan, Schiff Max, Mackey Scott, Paulus Martin P, Yu Angela J
Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA.
Front Psychol. 2015 Dec 18;6:1910. doi: 10.3389/fpsyg.2015.01910. eCollection 2015.
Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high against the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n = 16) vs. healthy control subjects (HCS, n = 16) in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice not only yields potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component, the updating of estimated reward rates based on ongoing observations, and a decision-making component, the choice among options based on current beliefs and uncertainties about reward rates. We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-Stay/Lose-Shift (WSLS), ε-Greedy, τ-Switch, Softmax, and Knowledge Gradient. HCS and MDI differ significantly both in how they learn about reward rates and in how they use them to make decisions. HCS learn from past observations but weigh recent data more heavily, and their decision policy is best fit by Softmax. MDI are more likely to follow the simple, learning-independent WSLS policy, and among MDI best fit by Softmax, they hold more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. Neurally, MDI's tendency to avoid the most rewarding option is associated with lower gray matter volume in the thalamic dorsal lateral nucleus. More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.
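The combination of Dynamic Belief Model learning and Softmax choice described above can be sketched in a few lines. This is a minimal illustration, not the authors' fitted model: the persistence parameter γ, the inverse temperature β, the uniform prior, and the simulated reward rates are all placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized grid over candidate reward rates theta in (0, 1).
grid = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(grid) / grid.size   # uniform prior p0(theta) -- an assumption

GAMMA = 0.8  # assumed probability the reward rate persists between trials
BETA = 5.0   # assumed softmax inverse temperature

def dbm_update(posterior, rewarded):
    """One Dynamic Belief Model step: mix the previous posterior with the
    prior (allowing for unsignaled change), then weight by the Bernoulli
    likelihood of the observed outcome and renormalize."""
    predictive = GAMMA * posterior + (1 - GAMMA) * prior
    likelihood = grid if rewarded else 1 - grid
    post = predictive * likelihood
    return post / post.sum()

def softmax_choice(estimates):
    """Softmax policy: pick arm k with probability proportional to exp(BETA * estimate_k)."""
    weights = np.exp(BETA * np.asarray(estimates))
    return rng.choice(len(estimates), p=weights / weights.sum())

# Simulate a two-armed bandit with fixed but (to the agent) unknown reward rates.
true_rates = [0.3, 0.7]  # illustrative values
posteriors = [prior.copy(), prior.copy()]
for _ in range(200):
    means = [(p * grid).sum() for p in posteriors]  # posterior-mean reward rates
    arm = softmax_choice(means)
    rewarded = rng.random() < true_rates[arm]
    posteriors[arm] = dbm_update(posteriors[arm], rewarded)

means = [(p * grid).sum() for p in posteriors]
print("estimated reward rates:", [round(m, 2) for m in means])
```

Because the predictive distribution leaks mass back toward the prior on every trial, recent outcomes dominate the estimate, which is the "weigh recent data more heavily" behavior the abstract attributes to healthy controls.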