Frank Michael J, Gagne Chris, Nyhus Erika, Masters Sean, Wiecki Thomas V, Cavanagh James F, Badre David
Department of Cognitive, Linguistic and Psychological Sciences, Brown University, Providence, Rhode Island 02912; Brown Institute for Brain Science, Providence, Rhode Island 02912; and Department of Psychiatry and Human Behavior, Brown University, Providence, Rhode Island 02912.
J Neurosci. 2015 Jan 14;35(2):485-94. doi: 10.1523/JNEUROSCI.2036-14.2015.
What are the neural dynamics of choice processes during reinforcement learning? Two largely separate literatures have examined dynamics of reinforcement learning (RL) as a function of experience but assuming a static choice process, or conversely, the dynamics of choice processes in decision making but based on static decision values. Here we show that human choice processes during RL are well described by a drift diffusion model (DDM) of decision making in which the learned trial-by-trial reward values are sequentially sampled, with a choice made when the value signal crosses a decision threshold. Moreover, simultaneous fMRI and EEG recordings revealed that this decision threshold is not fixed across trials but varies as a function of activity in the subthalamic nucleus (STN) and is further modulated by trial-by-trial measures of decision conflict and activity in the dorsomedial frontal cortex (pre-SMA BOLD and mediofrontal theta in EEG). These findings provide converging multimodal evidence for a model in which decision threshold in reward-based tasks is adjusted as a function of communication from pre-SMA to STN when choices differ subtly in reward values, allowing more time to choose the statistically more rewarding option.
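The mechanism described in the abstract can be illustrated with a minimal simulation sketch: trial-by-trial Q-learning supplies reward values, the Q-value difference sets the drift rate of a diffusion process, and the decision threshold is raised on high-conflict trials (when the two values differ only subtly), allowing more evidence accumulation before commitment. All parameter values and function names below are illustrative assumptions, not the authors' fitted model.

```python
import random
import math

def simulate_rl_ddm(n_trials=200, p_reward=(0.8, 0.2), alpha=0.3,
                    drift_scale=2.0, base_threshold=1.0,
                    conflict_gain=1.0, dt=0.01, noise_sd=1.0, seed=0):
    """Toy RL-DDM: Q-learning values drive a drift diffusion choice process.

    Conflict (values close together) raises the threshold, mimicking the
    proposed pre-SMA -> STN threshold adjustment. Illustrative only.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]                # learned reward values for the two options
    choices, rts = [], []
    for _ in range(n_trials):
        conflict = 1.0 - abs(q[0] - q[1])            # high when values are similar
        a = base_threshold + conflict_gain * conflict  # conflict raises the threshold
        v = drift_scale * (q[0] - q[1])              # drift toward the higher-valued option
        x, t = 0.0, 0.0
        while abs(x) < a:                            # sequential sampling to threshold
            x += v * dt + noise_sd * math.sqrt(dt) * rng.gauss(0, 1)
            t += dt
        choice = 0 if x > 0 else 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])    # delta-rule value update
        choices.append(choice)
        rts.append(t)
    return choices, rts, q

choices, rts, q = simulate_rl_ddm()
```

Because the threshold scales with conflict, simulated response times are longest when the two learned values are nearly equal, the qualitative signature the paper attributes to STN-mediated threshold adjustment.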