Department of Human Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
Biol Cybern. 2021 Aug;115(4):365-382. doi: 10.1007/s00422-021-00884-8. Epub 2021 Aug 2.
When learning a movement based on binary success information, one is more variable following failure than following success. Theoretically, the additional variability post-failure might reflect exploration of possibilities to obtain success. When average behavior is changing (as in learning), variability can be estimated from differences between subsequent movements. Can one estimate exploration reliably from such trial-to-trial changes when studying reward-based motor learning? To answer this question, we tried to reconstruct the exploration underlying learning as described by four existing reward-based motor learning models. We simulated learning for various learner and task characteristics. If we simply determined the additional change post-failure, estimates of exploration were sensitive to learner and task characteristics. We identified two pitfalls in quantifying exploration based on trial-to-trial changes. Firstly, performance-dependent feedback can cause correlated samples of motor noise and exploration on successful trials, which biases exploration estimates. Secondly, the trial relative to which trial-to-trial change is calculated may also contain exploration, which causes underestimation. As a solution, we developed the additional trial-to-trial change (ATTC) method. By moving the reference trial one trial back and subtracting trial-to-trial changes following specific sequences of trial outcomes, exploration can be estimated reliably for the three models that explore based on the outcome of only the previous trial. Since ATTC estimates are based on a selection of trial sequences, this method requires many trials. In conclusion, if exploration is a binary function of previous trial outcome, the ATTC method allows for a model-free quantification of exploration.
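The estimation problem described above can be illustrated with a small simulation. The sketch below uses a toy learner, not any of the paper's four models: the aim is fixed (no learning), motor noise is drawn on every trial, and extra exploration noise is drawn only after a failed trial, so exploration is a binary function of the previous outcome. The noise magnitudes, tolerance, and the exact outcome sequences selected are illustrative assumptions; the "ATTC-style" estimator is one plausible reading of the abstract's description (reference trial moved one trial back, variances subtracted between specific three-outcome sequences), not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy learner (assumed parameters, fixed aim at 0, success = landing
# within a tolerance band; exploration depends only on previous outcome).
sigma_m, sigma_e, tol = 1.0, 2.0, 1.5
n = 200_000

x = np.empty(n)
success = np.empty(n, dtype=bool)
prev_success = True
for t in range(n):
    explore = 0.0 if prev_success else rng.normal(0.0, sigma_e)
    x[t] = rng.normal(0.0, sigma_m) + explore
    success[t] = abs(x[t]) < tol
    prev_success = success[t]

# Naive approach: compare trial-to-trial change after failure vs after
# success. Biased, because the reference trial x[t] can itself contain
# exploration and its outcome is performance-dependent.
d1 = x[1:] - x[:-1]
naive = 0.5 * (d1[~success[:-1]].var() - d1[success[:-1]].var())

# ATTC-style estimate (one plausible reading): move the reference trial
# one back (x[t+1] - x[t-1]) and select outcome sequences (t-2, t-1, t)
# so that neither x[t-1] nor x[t] contains exploration:
#   S,S,F -> x[t+1] contains exploration;  S,S,S -> it does not.
# Subtracting the two variances isolates the exploration variance.
d2 = x[3:] - x[1:-2]            # x[t+1] - x[t-1] for t = 2 .. n-2
s = success
ss = s[:-3] & s[1:-2]           # success at t-2 and at t-1
ssf = ss & ~s[2:-1]             # ... followed by failure at t
sss = ss & s[2:-1]              # ... followed by success at t
attc = d2[ssf].var() - d2[sss].var()

print(f"true exploration variance: {sigma_e**2:.2f}")
print(f"naive estimate: {naive:.2f}, ATTC-style estimate: {attc:.2f}")
```

In this toy setting the sequence selection works because, given a success at t-1, trial t contains no exploration, so conditioning on the outcome of trial t does not couple with the two trials entering the difference; the truncation of the reference trial is identical in both selected sequences and cancels in the subtraction. Note also the abstract's caveat: because only a subset of trial sequences qualifies, many trials are needed for a stable estimate.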