Learning movement sequences with a delayed reward signal in a hierarchical model of motor function.

Author information

Stringer S M, Rolls E T, Taylor P

Affiliation

Oxford University, Centre for Computational Neuroscience, Department of Experimental Psychology, South Parks Road, Oxford OX1 3UD, United Kingdom.

Publication information

Neural Netw. 2007 Mar;20(2):172-81. doi: 10.1016/j.neunet.2006.01.016. Epub 2006 May 15.

DOI: 10.1016/j.neunet.2006.01.016
PMID: 16698235

Abstract

A key problem in reinforcement learning is how an animal is able to learn a sequence of movements when the reward signal only occurs at the end of the sequence. We describe how a hierarchical dynamical model of motor function is able to solve the problem of delayed reward in learning movement sequences using associative (Hebbian) learning. At the lowest level, the motor system encodes simple movements or primitives, while at higher levels the system encodes sequences of primitives. During training, the network is able to learn a high level motor program composed of a specific temporal sequence of motor primitives. The network is able to achieve this despite the fact that the reward signal, which indicates whether or not the desired motor program has been performed correctly, is received only at the end of each trial during learning. Use of a continuous attractor network in the architecture enables the network to generate the motor outputs required to produce the continuous movements necessary to implement the motor sequence.
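The delayed-reward idea in the abstract can be illustrated with a small toy sketch. This is not the authors' model: it uses no continuous attractor network and no hierarchical dynamics, and it swaps in a plain reward-gated Hebbian rule on a discrete transition table. The function names `train` and `recall`, the learning rate, the trial count, and the four-primitive target sequence are all assumptions made for illustration. Each trial produces a random sequence of primitives, records which pairs of primitives were active in succession, and strengthens those associations only if the single reward at the end of the trial signals a correct sequence.

```python
import random


def train(target, n_primitives, n_trials=2000, lr=0.1, seed=0):
    """Toy reward-gated Hebbian learning of transitions between primitives.

    Hypothetical illustration only; not the model from the paper.
    """
    rng = random.Random(seed)
    # w[i][j]: strength of the association "primitive i is followed by j".
    w = [[0.0] * n_primitives for _ in range(n_primitives)]
    for _ in range(n_trials):
        # Explore: fix the first primitive, choose the remaining ones at random.
        seq = [target[0]] + [rng.randrange(n_primitives) for _ in target[1:]]
        # Hebbian trace: the (pre, post) pairs that were active in succession.
        trace = list(zip(seq, seq[1:]))
        # The reward arrives once, after the whole sequence has been produced.
        reward = 1.0 if seq == list(target) else 0.0
        for pre, post in trace:
            w[pre][post] += lr * reward
    return w


def recall(w, start, length):
    """Replay a sequence by following the strongest learned transition."""
    seq = [start]
    for _ in range(length - 1):
        row = w[seq[-1]]
        seq.append(max(range(len(row)), key=row.__getitem__))
    return seq


target = [0, 2, 1, 3]
weights = train(target, n_primitives=4)
print(recall(weights, start=target[0], length=len(target)))
```

Because the weight change is multiplied by the end-of-trial reward, unrewarded trials leave the table untouched, so only the transitions that belong to the rewarded sequence are ever strengthened. The sketch relies on blind exploration, which would scale poorly to long sequences.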

Similar articles

1
Learning movement sequences with a delayed reward signal in a hierarchical model of motor function.
Neural Netw. 2007 Mar;20(2):172-81. doi: 10.1016/j.neunet.2006.01.016. Epub 2006 May 15.
2
Self-organizing continuous attractor networks and motor function.
Neural Netw. 2003 Mar;16(2):161-82. doi: 10.1016/S0893-6080(02)00237-X.
3
SOVEREIGN: An autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal.
Neural Netw. 2008 Jun;21(5):699-758. doi: 10.1016/j.neunet.2007.09.016. Epub 2007 Oct 7.
4
Dimensional reduction for reward-based learning.
Network. 2006 Sep;17(3):235-52. doi: 10.1080/09548980600773215.
5
Learning of sequential movements by neural network model with dopamine-like reinforcement signal.
Exp Brain Res. 1998 Aug;121(3):350-4. doi: 10.1007/s002210050467.
6
Reward-dependent learning in neuronal networks for planning and decision making.
Prog Brain Res. 2000;126:217-29. doi: 10.1016/S0079-6123(00)26016-0.
7
A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.
PLoS Comput Biol. 2008 Oct;4(10):e1000180. doi: 10.1371/journal.pcbi.1000180. Epub 2008 Oct 10.
8
Solving the distal reward problem with rare correlations.
Neural Comput. 2013 Apr;25(4):940-78. doi: 10.1162/NECO_a_00419. Epub 2013 Jan 22.
9
A spiking neural network model of an actor-critic learning agent.
Neural Comput. 2009 Feb;21(2):301-39. doi: 10.1162/neco.2008.08-07-593.
10
Implications of different classes of sensorimotor disturbance for cerebellar-based motor learning models.
Biol Cybern. 2009 Jan;100(1):81-95. doi: 10.1007/s00422-008-0266-5. Epub 2008 Oct 22.

Cited by

1
Belief inference for hierarchical hidden states in spatial navigation.
Commun Biol. 2024 May 21;7(1):614. doi: 10.1038/s42003-024-06316-0.
2
Sensorimotor learning biases choice behavior: a learning neural field model for decision making.
PLoS Comput Biol. 2012;8(11):e1002774. doi: 10.1371/journal.pcbi.1002774. Epub 2012 Nov 15.
3
Distinct effects of perceptual quality on auditory word recognition, memory formation and recall in a neural model of sequential memory.
Front Syst Neurosci. 2010 Jun 3;4:14. doi: 10.3389/fnsys.2010.00014. eCollection 2010.