Marsh Brandi T, Tarigoppula Venkata S Aditya, Chen Chen, Francis Joseph T
Joint Program in Biomedical Engineering, New York University-Polytechnic School of Engineering and State University of New York, Downstate Medical Center.
Department of Physiology and Pharmacology.
J Neurosci. 2015 May 13;35(19):7374-87. doi: 10.1523/JNEUROSCI.1802-14.2015.
For decades, neurophysiologists have worked on elucidating the function of the cortical sensorimotor control system from the standpoint of kinematics or dynamics. Recently, computational neuroscientists have developed models that can emulate changes seen in the primary motor cortex during learning. However, these simulations rely on the existence of a reward-like signal in the primary sensorimotor cortex. Reward modulation of the primary sensorimotor cortex has yet to be characterized at the level of neural units. Here we demonstrate that single units/multiunits and local field potentials in the primary motor (M1) cortex of nonhuman primates (Macaca radiata) are modulated by reward expectation during reaching movements and that this modulation is present even while subjects passively view cursor motions that are predictive of either reward or nonreward. After establishing this reward modulation, we set out to determine whether we could correctly classify rewarding versus nonrewarding trials, on a moment-to-moment basis. This reward information could then be used in collaboration with reinforcement learning principles toward an autonomous brain-machine interface. The autonomous brain-machine interface would use M1 for both decoding movement intention and extraction of reward expectation information as evaluative feedback, which would then update the decoding algorithm as necessary. In the work presented here, we show that this, in theory, is possible.
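The closed loop described in the abstract, in which M1 activity is decoded into a movement and a reward-expectation signal then serves as evaluative feedback to update the decoder, can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the authors' method: the synthetic tuning, the two-target task, the 90% feedback accuracy, the update rule, and all names (`simulate`, `tuning`, `feedback_accuracy`) are hypothetical and chosen only for illustration.

```python
import random


def simulate(n_units=8, n_train=2000, n_test=500, feedback_accuracy=0.9,
             learning_rate=0.05, epsilon=0.1, seed=0):
    """Toy reinforcement-style decoder update driven by noisy evaluative feedback.

    Everything here is synthetic and hypothetical; no recorded data are used.
    Returns the greedy decoding accuracy on fresh trials after training.
    """
    rng = random.Random(seed)
    # Assumed tuning: each unit's z-scored rate shifts by +/-0.8 with the target.
    tuning = [0.8 if i % 2 == 0 else -0.8 for i in range(n_units)]

    def sample_trial():
        # Two reach targets (0 or 1); rates are the tuning pattern plus unit noise.
        target = rng.randrange(2)
        sign = 1.0 if target == 1 else -1.0
        rates = [sign * t + rng.gauss(0.0, 1.0) for t in tuning]
        return target, rates

    # One linear readout per candidate action, initialised to zero.
    weights = [[0.0] * n_units for _ in range(2)]

    def decode(rates):
        scores = [sum(w * x for w, x in zip(weights[a], rates)) for a in range(2)]
        return 0 if scores[0] >= scores[1] else 1

    for _ in range(n_train):
        target, rates = sample_trial()
        # Epsilon-greedy exploration over the two actions.
        action = rng.randrange(2) if rng.random() < epsilon else decode(rates)
        # Stand-in for a reward-expectation classifier: +1 if the decoded
        # action matched the intended target, -1 otherwise, and flipped with
        # probability (1 - feedback_accuracy) to mimic classification errors.
        feedback = 1.0 if action == target else -1.0
        if rng.random() > feedback_accuracy:
            feedback = -feedback
        # Reinforce or punish only the readout of the action that was taken.
        weights[action] = [w + learning_rate * feedback * x
                           for w, x in zip(weights[action], rates)]

    correct = 0
    for _ in range(n_test):
        target, rates = sample_trial()
        correct += decode(rates) == target
    return correct / n_test


if __name__ == "__main__":
    print(f"decoding accuracy after training: {simulate():.2f}")
```

In this sketch the decoder never sees the true target. It learns only from the binary evaluative signal, which is the property that would let such an interface adapt without an external supervisor. Lowering `feedback_accuracy` toward 0.5 removes the information in the feedback, and learning should degrade accordingly.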