Bacelar Mariane F B, Lohse Keith R, Parma Juliana O, Miller Matthew W
Department of Kinesiology, Boise State University, Boise, ID, United States.
Program in Physical Therapy, Washington University School of Medicine, St. Louis, MO, United States.
Front Behav Neurosci. 2024 Oct 30;18:1466970. doi: 10.3389/fnbeh.2024.1466970. eCollection 2024.
According to reinforcement learning theory, humans adjust their behavior based on the difference between actual and anticipated outcomes (i.e., prediction error), with the main goal of maximizing rewards through their actions. Although this theory offers a strong framework for understanding how we acquire motor skills, very few studies have investigated reinforcement learning predictions and their underlying mechanisms in motor skill acquisition.
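The prediction-error idea described above can be illustrated with a minimal sketch. It assumes a simple delta-rule update, which is a common textbook formulation and not necessarily the model used in this study; the function name `update_expectation` and the `learning_rate` parameter are hypothetical and chosen only for illustration.

```python
def update_expectation(expected, outcome, learning_rate=0.1):
    """Return the prediction error and the updated expectation.

    Assumption: a simple delta rule, used here only as an illustration;
    the abstract does not specify the computational model.
    """
    # Prediction error: actual outcome minus anticipated outcome.
    prediction_error = outcome - expected
    # Shift the expectation toward the outcome by a fraction of the error.
    new_expected = expected + learning_rate * prediction_error
    return prediction_error, new_expected


# Hypothetical usage: a learner expecting a reward of 0.5 receives 1.0.
error, expectation = update_expectation(expected=0.5, outcome=1.0, learning_rate=0.2)
```

In this sketch a positive prediction error (outcome better than expected) raises the expectation for the next trial, and a negative one lowers it.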
In the present study, we explored a 134-person dataset consisting of learners' feedback-evoked brain activity (reward positivity; RewP) and motor accuracy during the practice phase and delayed retention test to investigate whether these variables interacted according to reinforcement learning predictions.
Results showed a non-linear relationship between RewP and trial accuracy, which was moderated by the learners' performance level. Specifically, high-performing learners were more sensitive to violations of reward expectations than low-performing learners, likely because they had developed a stronger representation of the skill and could rely on more stable outcome predictions. Furthermore, contrary to our prediction, the average RewP during acquisition did not predict performance on the delayed retention test.
Together, these findings support the use of reinforcement learning models to understand short-term behavior adaptation. They also highlight the complexity of the motor skill consolidation process, the study of which would benefit from a multi-mechanistic approach.