Toussaint Marc
School of Informatics, University of Edinburgh, Edinburgh, Scotland, UK.
Neural Comput. 2006 May;18(5):1132-55. doi: 10.1162/089976606776240995.
Experimental studies of reasoning and planned behavior have provided evidence that nervous systems use internal models to perform predictive motor control, imagery, inference, and planning. Classical (model-free) reinforcement learning approaches omit such a model; standard sensorimotor models account for forward and backward functions of sensorimotor dependencies but do not provide a proper neural representation on which to realize planning. We propose a sensorimotor map to represent such an internal model. The map learns a state representation similar to self-organizing maps but is inherently coupled to sensor and motor signals. Motor activations modulate the lateral connection strengths and thereby induce anticipatory shifts of the activity peak on the sensorimotor map. This mechanism encodes a model of the change of stimuli depending on the current motor activities. The activation dynamics on the map are derived from neural field models. An additional dynamic process on the sensorimotor map (derived from dynamic programming) realizes planning and emits corresponding goal-directed motor sequences, for instance, to navigate through a maze.
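The planning idea in the abstract can be illustrated with a toy sketch. It is not the paper's neural field dynamics: it swaps in plain discrete value iteration on a grid maze, and every name here (`MAZE`, `MOTORS`, `GAMMA`, `build_map`, `plan_values`, `motor_sequence`, `solve`) is a hypothetical choice made for illustration. Each free cell stands in for one map unit, and each motor command enables one lateral connection per unit, which plays the role of the motor-modulated shift of the activity peak.

```python
# Illustrative sketch only: the maze, names, and parameters are assumptions,
# and standard value iteration replaces the paper's neural field equations.
MAZE = [
    "S.#",
    ".##",
    "..G",
]
MOTORS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9  # discount factor (assumed value)


def build_map(maze):
    """Create one unit per free cell and the lateral links each motor enables."""
    units = [(r, c) for r, row in enumerate(maze)
             for c, ch in enumerate(row) if ch != "#"]
    index = {cell: i for i, cell in enumerate(units)}
    # lateral[motor][i] = j means: with this motor active, activity at
    # unit i is predicted to shift to unit j.
    lateral = {m: {} for m in MOTORS}
    for (r, c), i in index.items():
        for m, (dr, dc) in MOTORS.items():
            j = index.get((r + dr, c + dc))
            if j is not None:
                lateral[m][i] = j
    return units, index, lateral


def plan_values(n_units, lateral, goal, gamma=GAMMA, sweeps=50):
    """Spread a value field from the goal along the lateral connections."""
    values = [0.0] * n_units
    for _ in range(sweeps):
        new_values = []
        for i in range(n_units):
            if i == goal:
                new_values.append(1.0)
                continue
            successors = [lateral[m][i] for m in MOTORS if i in lateral[m]]
            best = max((values[j] for j in successors), default=0.0)
            new_values.append(gamma * best)
        values = new_values
    return values


def motor_sequence(start, goal, lateral, values, max_steps=20):
    """Greedily pick the motor whose predicted next unit has the highest value."""
    sequence, i = [], start
    while i != goal and len(sequence) < max_steps:
        options = [(values[lateral[m][i]], m) for m in MOTORS if i in lateral[m]]
        if not options:
            break
        _, motor = max(options)
        sequence.append(motor)
        i = lateral[motor][i]
    return sequence


def solve(maze):
    """Plan a motor sequence from the cell marked 'S' to the cell marked 'G'."""
    units, index, lateral = build_map(maze)
    start = next(index[(r, c)] for r, c in units if maze[r][c] == "S")
    goal = next(index[(r, c)] for r, c in units if maze[r][c] == "G")
    values = plan_values(len(units), lateral, goal)
    return motor_sequence(start, goal, lateral, values)


if __name__ == "__main__":
    print(solve(MAZE))
```

For the toy maze above, `solve(MAZE)` yields the sequence down, down, right, right. In this sketch the connections are hard-coded from the grid, whereas the abstract describes a map that learns its state representation and its motor-dependent lateral couplings from sensor and motor signals.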