Department of Psychology and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213.
J Neurosci. 2019 Mar 20;39(12):2251-2264. doi: 10.1523/JNEUROSCI.1924-18.2019. Epub 2019 Jan 17.
Goal-directed behavior requires integrating action selection processes with learning systems that adapt control using environmental feedback. These functions intersect in a common neural substrate with multiple known targets of plasticity (the cortico-basal ganglia-thalamic network), suggesting that feedback signals have a multifaceted impact on future decisions. Using a hybrid of accumulation-to-bound decision models and reinforcement learning, we modeled the performance of humans in a stop signal task in which participants (N = 75: 37 males, 38 females) learned the prior distribution of the timing of a stop signal through trial-and-error feedback. Changes in the drift rate of the action execution process were driven by errors in action timing, whereas adaptation in the boundary height served to increase caution following failed stops. These findings highlight two interactive learning mechanisms for adapting the control of goal-directed actions based on dissociable dimensions of feedback error.
Many complex behavioral goals rely on the ability to regulate the timing of action execution while also maintaining enough control to cancel actions in response to "Stop" cues in the environment. Here we examined how these fundamental components of behavior become tuned to the control demands of the environment by combining principles of reinforcement learning with accumulation-to-bound models. Model fits to behavioral data in an adaptive stop signal task revealed two adaptive mechanisms: (1) timing error-related changes in the rate of the execution signal; and (2) an increase in the execution boundary after failed stops. These findings demonstrate unique effects of timing and control errors on two underlying mechanisms of control: the rate and the threshold of accumulating action signals.
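The two learning rules summarized above (timing errors adjust the drift rate; failed stops raise the boundary) can be illustrated with a minimal sketch. This is not the authors' fitted model: the class name `AdaptiveAccumulator`, the noise-free linear accumulator, the additive update rules, the learning rates `alpha` and `beta`, the target response time, and the fixed stop-process latency `ssrt` are all hypothetical assumptions chosen only to show the direction of each adaptation.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveAccumulator:
    """Hypothetical, noise-free accumulation-to-bound model with two learning rules.

    Illustrative only; parameter names and update rules are assumptions,
    not the fitted model from the study.
    """
    drift: float = 2.0     # v: rate of the execution signal (boundary units per second)
    boundary: float = 1.0  # a: execution threshold (boundary height)
    onset: float = 0.1     # tr: onset delay before accumulation starts (seconds)
    alpha: float = 0.5     # hypothetical learning rate for timing-error updates of v
    beta: float = 0.05     # hypothetical boundary increment after a failed stop

    def response_time(self) -> float:
        # A linear signal rising at rate `drift` reaches `boundary` after boundary / drift seconds.
        return self.onset + self.boundary / self.drift

    def go_trial(self, target: float) -> float:
        """Respond on a go trial and adapt the drift rate from the timing error."""
        timing_error = self.response_time() - target  # > 0 means the response was late
        # Late responses speed accumulation, early ones slow it; the floor keeps RT finite.
        self.drift = max(self.drift + self.alpha * timing_error, 1e-3)
        return timing_error

    def stop_trial(self, ssd: float, ssrt: float) -> bool:
        """Return True if stopping succeeds; after a failed stop, raise the boundary."""
        # Assumption: the stop process finishes at a fixed latency after the stop signal.
        stopped = ssd + ssrt < self.response_time()
        if not stopped:
            self.boundary += self.beta  # more caution: later responses on future trials
        return stopped
```

In this sketch a late response (positive timing error) increases the drift rate and shortens the next response time, so repeated go trials pull responses toward the target. Each failed stop adds `beta` to the boundary, which lengthens response times and leaves the stop process more time to win on later trials; a full model would also need accumulation noise and some way for the boundary to relax, both omitted here.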