Department of Behavioral Sciences, University of Rio Grande, Rio Grande, Ohio, USA.
PLoS One. 2013;8(2):e55352. doi: 10.1371/journal.pone.0055352. Epub 2013 Feb 8.
We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds, and the timing of their end-points; the brain therefore needs to decide which of these properties should be improved, i.e., it must solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of each attempted reach trajectory determined the monetary reward through a payoff function that we manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.
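To make the abstract's "Bayesian credit-assignment model with built-in forgetting" concrete, the Python sketch below shows one way such a learner could work. It is an illustration under stated assumptions, not the authors' published model: it assumes a quadratic payoff basis over two movement properties (curvature and direction), Gaussian reward noise, and forgetting implemented as covariance inflation between trials; the toy payoff function and all parameter values are hypothetical.

```python
# Illustrative sketch (not the published model): a Bayesian learner that
# estimates an unknown payoff over two movement properties (curvature,
# direction) from scalar rewards, with "forgetting" implemented as
# covariance inflation on each trial. All settings below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def features(curvature, direction):
    # Quadratic basis so a peaked payoff surface is linear in the weights.
    return np.array([1.0, curvature, direction,
                     curvature**2, direction**2, curvature * direction])

def true_reward(curvature, direction):
    # Hypothetical hidden payoff: reward depends on direction only, so the
    # learner must assign credit to the relevant property and discount
    # curvature. (The real experiment manipulated this payoff function.)
    return 1.0 - (direction - 0.5)**2 + 0.05 * rng.standard_normal()

d = 6
mu = np.zeros(d)             # posterior mean over payoff weights
Sigma = np.eye(d) * 10.0     # posterior covariance (broad prior)
obs_var = 0.05**2            # assumed reward-noise variance
forget = 1.02                # >1 inflates uncertainty each trial (forgetting)

for trial in range(200):
    # Forgetting: evidence from older trials is gradually discounted.
    Sigma = Sigma * forget
    # Sample a reach at random in the workspace (pure exploration, for
    # simplicity; real participants presumably explore more selectively).
    c, th = rng.uniform(-1, 1), rng.uniform(-1, 1)
    x = features(c, th)
    r = true_reward(c, th)
    # Standard sequential Bayesian linear-regression (Kalman-style) update.
    k = Sigma @ x / (x @ Sigma @ x + obs_var)
    mu = mu + k * (r - x @ mu)
    Sigma = Sigma - np.outer(k, x @ Sigma)

# Weights on curvature terms should shrink toward zero, while the direction
# terms capture the payoff peak near direction = 0.5.
print("posterior mean weights:", np.round(mu, 2))
```

In this sketch, credit assignment emerges because the posterior concentrates on the basis dimensions that actually covary with reward, and the forgetting factor keeps the model responsive to experimental changes in the payoff function.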