Jiang Yu, Jiang Zhong-Ping
Control and Networks Laboratory, Department of Electrical and Computer Engineering, Polytechnic School of Engineering, New York University, 5 Metrotech Center, Brooklyn, NY 11201, USA.
Biol Cybern. 2014 Aug;108(4):459-73. doi: 10.1007/s00422-014-0613-7. Epub 2014 Jun 25.
Many characteristics of sensorimotor control can be explained by models based on optimization and optimal control theories. However, most previous models assume that the central nervous system has precise knowledge of the sensorimotor system and the environment it interacts with. This assumption is difficult to justify theoretically and has not been convincingly validated by experiments. To address this problem, this paper presents a new computational mechanism for sensorimotor control from the perspective of adaptive dynamic programming (ADP), which shares some features with reinforcement learning. The ADP-based model for sensorimotor control suggests that the command signal for human movement is derived directly from real-time sensory data, without the need to identify the system dynamics. An iterative learning scheme based on the proposed ADP theory is developed, along with rigorous convergence analysis. Interestingly, the computational model advocated here is able to reproduce the motor learning behavior observed in experiments where a divergent force field or a velocity-dependent force field was present. In addition, this modeling strategy provides a clear way to perform stability analysis of the overall system. Hence, we conjecture that human sensorimotor systems use an ADP-type mechanism to control movements and to adapt successfully to uncertainties present in the environment.