Department of Psychology and Center for Neural Science, New York University, New York, New York 10003, USA.
J Neurosci. 2012 Sep 12;32(37):12702-11. doi: 10.1523/JNEUROSCI.6160-11.2012.
Humans take into account their own movement variability, as well as the potential consequences of different movement outcomes, in planning movement trajectories. When variability increases, planned movements are altered so as to optimize the expected consequences of the movement. Past research has focused on steady-state responses to changing conditions of movement under risk. Here, we study the dynamics of such strategy adjustment in a visuomotor decision task in which subjects reach toward a display with regions that lead to rewards and penalties, under conditions of changing uncertainty. In typical reinforcement learning tasks, subjects should base their subsequent strategy on an estimate of the mean outcome (e.g., reward) over recent trials. In contrast, in our task, strategy should be based on a dynamic estimate of recent outcome uncertainty (i.e., squared error). We find that subjects respond to increased movement uncertainty by aiming movements more conservatively with respect to penalty regions, and that the estimate of uncertainty they use is well characterized by a weighted average of recent squared errors, with higher weights given to more recent trials.
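The abstract's uncertainty estimate, a weighted average of recent squared errors with higher weights on more recent trials, can be sketched as a recursive exponentially weighted update. This is an illustrative assumption about the form of the weighting, not the paper's fitted model; the learning rate `alpha` and the function name are hypothetical.

```python
def update_variance_estimate(prev_est, error, alpha=0.3):
    """Recursively update an estimate of outcome uncertainty.

    Each new squared error is blended into the running estimate,
    so weights on past trials decay geometrically: the most recent
    trial receives weight alpha, the one before alpha*(1 - alpha),
    and so on. alpha is a hypothetical learning rate.
    """
    return (1 - alpha) * prev_est + alpha * error ** 2


# Usage: track uncertainty across a sequence of reach errors.
estimate = 0.0
for err in [0.5, -0.4, 1.2, -1.1, 0.9]:  # hypothetical reach errors
    estimate = update_variance_estimate(estimate, err)
```

Under this form, a sudden increase in movement variability raises the estimate within a few trials, matching the qualitative behavior described in the abstract: strategy tracks recent squared error rather than recent mean outcome.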