Neuromuscular control of the point to point and oscillatory movements of a sagittal arm with the actor-critic reinforcement learning method.

Authors

Golkhou Vahid, Parnianpour Mohamad, Lucas Caro

Affiliation

Biomechanics Laboratory, Department of Mechanical Engineering, Sharif University of Technology, Azadi Avenue, P.O. Box 11365-9567, Tehran, Iran.

Publication

Comput Methods Biomech Biomed Engin. 2005 Apr;8(2):103-13. doi: 10.1080/10255840500167952.

Abstract

In this study, we used a single-link system with a pair of muscles excited by alpha and gamma signals to achieve both point-to-point and oscillatory movements with variable amplitude and frequency. The system is highly nonlinear in all its physical and physiological attributes. Its major physiological characteristics are the simultaneous activation of a pair of nonlinear muscle-like actuators for control purposes, the presence of nonlinear spindle-like sensors and a Golgi tendon organ-like sensor, and the actions of gravity and external loading. Transmission delays are included in the afferent and efferent neural paths to represent the reflex loops more accurately. A reinforcement learning method with an actor-critic (AC) architecture, standing in for the middle and lower levels of the central nervous system (CNS), is used to track a desired trajectory. The actor in this structure is a two-layer feedforward neural network, and the critic is a model of the cerebellum. The critic is trained by the state-action-reward-state-action (SARSA) method and in turn trains the actor by supervised learning based on prior experience. Simulation studies of oscillatory movements based on the proposed algorithm demonstrate excellent tracking capability: after 280 epochs, the RMS errors of the position and velocity profiles were 0.02 rad and 0.04 rad/s, respectively.
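As a rough illustration of the two quantitative ideas named in the abstract, the following Python sketch shows a single SARSA update of a critic and an RMS tracking-error calculation. It is not the authors' implementation: the abstract does not specify the cerebellar critic model, the reward signal, or any hyperparameters, so the linear critic, the feature vectors, and the values of `alpha` and `gamma` below are all assumptions made for the example.

```python
import math


def rms_error(desired, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(desired) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = sum((d - a) ** 2 for d, a in zip(desired, actual))
    return math.sqrt(squared / len(desired))


def q_value(weights, features):
    """Linear critic (an assumption): Q(s, a) = weights . features(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def sarsa_update(weights, features, reward, next_features,
                 alpha=0.1, gamma=0.9):
    """One on-policy SARSA step for the linear critic.

    td_error = r + gamma * Q(s', a') - Q(s, a)
    w       <- w + alpha * td_error * features(s, a)

    alpha (learning rate) and gamma (discount factor) are hypothetical
    values, not taken from the paper.
    """
    td_error = (reward
                + gamma * q_value(weights, next_features)
                - q_value(weights, features))
    new_weights = [w + alpha * td_error * x
                   for w, x in zip(weights, features)]
    return new_weights, td_error


if __name__ == "__main__":
    # Toy transition with made-up feature vectors and reward.
    w, delta = sarsa_update([0.0, 0.0], [1.0, 0.0], 1.0, [0.0, 1.0])
    print(w, delta)
    # Toy tracking error between a desired and an achieved trajectory (rad).
    print(rms_error([0.0, 0.1, 0.2], [0.0, 0.12, 0.18]))
```

SARSA is on-policy: the target uses the value of the action actually taken in the next state, which is why `sarsa_update` takes the features of the next state-action pair instead of maximizing over actions.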
