
Credit assignment during movement reinforcement learning.

Author information

Department of Behavioral Sciences, University of Rio Grande, Rio Grande, Ohio, USA.

Publication information

PLoS One. 2013;8(2):e55352. doi: 10.1371/journal.pone.0055352. Epub 2013 Feb 8.

Abstract

We often need to learn how to move based on a single performance measure that reflects the overall success of our movements. However, movements have many properties, such as their trajectories, speeds and timing of end-points; the brain therefore needs to decide which properties of movements should be improved, i.e., it needs to solve the credit assignment problem. Currently, little is known about how humans solve credit assignment problems in the context of reinforcement learning. Here we tested how human participants solve such problems during a trajectory-learning task. Without an explicitly defined target movement, participants made hand reaches and received monetary rewards as feedback on a trial-by-trial basis. The curvature and direction of the attempted reach trajectories determined the monetary rewards received, in a manner that could be manipulated experimentally. Based on the history of action-reward pairs, participants quickly solved the credit assignment problem and learned the implicit payoff function. A Bayesian credit-assignment model with built-in forgetting accurately predicts their trial-by-trial learning.
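The abstract describes a Bayesian learner with forgetting that infers an implicit payoff function from action-reward pairs. A minimal sketch of that idea, not the authors' actual model: rewards are assumed to depend linearly on two movement properties (curvature and direction), and a Gaussian posterior over the payoff weights is updated recursively, with the covariance inflated each trial to implement forgetting. All parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payoff function: reward depends on curvature and direction
# (linear weights and noise level are illustrative, not from the paper)
w_true = np.array([2.0, -1.0])      # unknown to the learner
noise_sd = 0.5

# Gaussian posterior over the payoff weights, updated trial by trial
mu = np.zeros(2)                    # posterior mean
Sigma = np.eye(2) * 10.0            # posterior covariance (broad prior)
forget = 1.05                       # >1 inflates uncertainty each trial (forgetting)

for trial in range(200):
    x = rng.normal(size=2)          # curvature and direction of this reach
    r = w_true @ x + rng.normal(0.0, noise_sd)   # scalar monetary reward

    Sigma = Sigma * forget          # forgetting: old evidence decays
    # Standard Bayesian linear-regression update for one observation
    S_x = Sigma @ x
    k = S_x / (x @ S_x + noise_sd**2)            # Kalman-style gain
    mu = mu + k * (r - mu @ x)
    Sigma = Sigma - np.outer(k, S_x)

print(np.round(mu, 2))              # posterior mean approaches w_true
```

The forgetting factor keeps the posterior from becoming infinitely confident, so the learner would track a payoff function that drifts across trials, which is one way to capture the "built-in forgetting" the abstract refers to.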


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fed4/3568147/40bbb2c60ab5/pone.0055352.g001.jpg
