Katinka van der Kooij, Jeroen B. J. Smeets, Nina M. van Mastrigt, Bernadette C. M. van Wijk
Department of Human Movement Sciences, Vrije Universiteit Amsterdam, van der Boechorststraat 9, 1081BT, Amsterdam, The Netherlands.
Department of Psychology, Justus-Liebig-Universität Gießen, Gießen, Germany.
Exp Brain Res. 2025 Apr 15;243(5):117. doi: 10.1007/s00221-025-07074-z.
Humans can learn various motor tasks from binary reward feedback indicating whether a movement attempt was successful or not. Such 'reward-based motor learning' relies on exploiting successful motor commands and exploring different motor commands following failure. Most computational models of reward-based motor learning formalize exploration as a random process in which, on each trial, a random draw is taken from a normal distribution centred on zero. Whether human motor exploration is indeed random from trial to trial has not yet been tested. Here we tested this in a force production task. To this end, we compared the proportion of trial-to-trial force changes in the behavioural data that have the same sign with the proportion expected under random exploration. One group of participants practiced with an adaptive reward criterion, which keeps rewarded performance close to current performance; the other group practiced with a fixed reward criterion, under which current performance can be far from rewarded performance. In both groups, the proportion of same-sign changes was larger than predicted. In the Adaptive group, both the learning and the proportion of same-sign changes were consistent with model simulations for low values of random exploration, whereas in the Fixed group, both the learning and the proportion of same-sign changes were inconsistent with model simulations based on random exploration. This suggests that some form of non-random motor exploration contributes to reward-based motor learning.
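The comparison between observed and expected proportions of same-sign force changes can be illustrated with a minimal Python sketch. It assumes a simple hypothetical learner: on every trial the produced force is the current aim plus a zero-mean Gaussian exploration draw, and after a rewarded trial the aim is set to the rewarded force (exploitation). The function names `simulate` and `proportion_same_sign`, the adaptive rule (reward if the absolute error beats the median of the previous five errors), and all parameter values are illustrative assumptions, not the authors' model or task settings.

```python
import numpy as np


def proportion_same_sign(forces):
    """Fraction of consecutive trial-to-trial changes that share the same sign.

    Zero changes count as non-matching (a simplifying assumption).
    """
    d = np.diff(np.asarray(forces, dtype=float))
    if d.size < 2:
        raise ValueError("need at least three trials")
    s = np.sign(d)
    same = (s[1:] == s[:-1]) & (s[1:] != 0)
    return float(same.mean())


def simulate(n_trials, target, sigma_explore, criterion="fixed", tol=1.0,
             window=5, start=0.0, seed=0):
    """Hypothetical exploit/explore learner with random Gaussian exploration.

    criterion="fixed":    reward if |error| < tol.
    criterion="adaptive": reward if |error| is below the median of the
                          previous `window` absolute errors (illustrative).
    """
    if criterion not in ("fixed", "adaptive"):
        raise ValueError("criterion must be 'fixed' or 'adaptive'")
    rng = np.random.default_rng(seed)
    aim = start
    forces = np.empty(n_trials)
    errors = []
    for t in range(n_trials):
        # Random exploration: zero-mean normal draw added to the current aim.
        force = aim + rng.normal(0.0, sigma_explore)
        err = abs(force - target)
        if criterion == "fixed":
            rewarded = err < tol
        else:
            recent = errors[-window:]
            rewarded = len(recent) == window and err < np.median(recent)
        if rewarded:
            aim = force  # exploitation: adopt the successful command
        errors.append(err)
        forces[t] = force
    return forces


if __name__ == "__main__":
    f_fixed = simulate(200, target=10.0, sigma_explore=1.0, criterion="fixed", tol=0.5)
    f_adapt = simulate(200, target=10.0, sigma_explore=1.0, criterion="adaptive")
    print(proportion_same_sign(f_fixed), proportion_same_sign(f_adapt))
```

In this toy setting, a learner that is never rewarded produces independent forces around a fixed aim, so consecutive changes are negatively correlated (correlation of -1/2) and the expected proportion of same-sign changes is 1/3 rather than 1/2. This is why the chance level for such a comparison is best obtained from simulations of the learning model itself.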