IEEE Trans Neural Syst Rehabil Eng. 2020 Dec;28(12):3089-3099. doi: 10.1109/TNSRE.2020.3039970. Epub 2021 Jan 28.
Autonomous brain-machine interfaces (BMIs) aim to enable paralyzed people to self-evaluate their movement intentions in order to control external devices. Previous reinforcement learning (RL)-based decoders rely on an external reward to learn the mapping between neural activity and movements in well-trained subjects, and have not investigated the task-learning process. The brain has a learning mechanism that identifies the correct actions leading to reward in a new task. This internal guidance can replace the external reference and advance BMIs toward autonomous systems. In this study, we propose an internally rewarded, RL-based BMI framework that uses multi-site recordings to demonstrate the autonomous learning ability of the BMI decoder on a new task. We test the model on neural data collected over multiple days while rats were learning a new lever-discrimination task. Spikes from the primary motor cortex (M1) and medial prefrontal cortex (mPFC) are decoded by the proposed RL framework into discrete lever-press actions. mPFC activity after the action is interpreted as internal reward information: a support vector machine (SVM) classifies reward vs. non-reward trials with an accuracy of 87.5% across subjects. This internal reward replaces the external water reward in updating the decoder, which adapts to the nonstationary neural activity as the subject learns. The multi-cortical recording lets the system take activity from more cortical areas as input and use an internal critic to guide decoder learning. Compared with the classic decoder that uses M1 activity as its only input together with external guidance, the proposed multi-cortical system achieves better decoding accuracy. More importantly, the internally rewarded decoder demonstrates autonomous learning on the new task: it tracks the time-varying neural patterns while subjects are learning, and its performance approaches the optimum asymptotically as the subjects' behavioral learning progresses. These results reveal the potential of endowing BMIs with autonomous task-learning ability within the RL framework.
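The abstract does not give the decoder's update rule or the SVM's input features, so the following is only a minimal pure-Python sketch of the closed loop it describes: an SVM critic reads post-action mPFC activity and emits the reward that updates an RL decoder of pre-action M1/mPFC activity. Everything concrete here is an assumption for illustration, not the authors' method: the synthetic templates `INTENT` and `OUTCOME`, the linear SVM without bias trained with the Pegasos subgradient method, the linear one-step (bandit-style) action-value decoder with epsilon-greedy selection, and all hyperparameters.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def noisy(template, sd, rng):
    # Hypothetical z-scored firing rates: a template plus Gaussian noise.
    return [m + rng.gauss(0.0, sd) for m in template]

# Hypothetical templates (assumptions, not values from the study).
# Pre-action input: dims 0-1 stand in for M1, dims 2-3 for mPFC.
INTENT = {0: [1.0, -1.0, 1.0, -1.0], 1: [-1.0, 1.0, -1.0, 1.0]}
# Post-action mPFC activity on rewarded (+1) vs. non-rewarded (-1) trials.
OUTCOME = {1: [1.5, -1.5, 1.0], -1: [-1.5, 1.5, -1.0]}

def train_linear_svm(X, y, lam=0.1, epochs=20, seed=0):
    """Linear SVM (hinge loss, no bias term) trained with Pegasos steps."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    idx = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            hinge_active = y[i] * dot(w, X[i]) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if hinge_active:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def internal_reward(critic_w, post_mpfc):
    # The critic's label replaces the external water reward: +1 or -1.
    return 1 if dot(critic_w, post_mpfc) >= 0.0 else -1

def run(n_calib=200, n_trials=1500, alpha=0.05, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    # 1) Calibrate the critic on labelled post-action mPFC trials.
    labels = [rng.choice([1, -1]) for _ in range(n_calib)]
    feats = [noisy(OUTCOME[lab], 0.7, rng) for lab in labels]
    critic_w = train_linear_svm(feats, labels)
    critic_acc = sum(internal_reward(critic_w, x) == lab
                     for x, lab in zip(feats, labels)) / n_calib

    # 2) Online decoder: action values Q(s, a) = w_a . s, epsilon-greedy.
    w = [[0.0] * 4 for _ in range(2)]
    correct = []
    for _ in range(n_trials):
        intent = rng.randrange(2)
        s = noisy(INTENT[intent], 0.5, rng)
        q = [dot(w[a], s) for a in range(2)]
        if rng.random() < epsilon:
            action = rng.randrange(2)       # explore
        else:
            action = q.index(max(q))        # exploit
        # The true outcome is used only to simulate the mPFC response.
        outcome = 1 if action == intent else -1
        post = noisy(OUTCOME[outcome], 0.7, rng)
        r = internal_reward(critic_w, post)  # no external reward is read
        td = r - q[action]                   # one-step prediction error
        w[action] = [wj + alpha * td * sj for wj, sj in zip(w[action], s)]
        correct.append(action == intent)
    late_acc = sum(correct[-300:]) / 300
    return critic_acc, late_acc

critic_acc, late_acc = run()
print(f"critic accuracy {critic_acc:.2f}, decoder accuracy (last 300) {late_acc:.2f}")
```

Calibrating the critic on labelled trials is a stand-in for however the study obtained its reward/non-reward labels; the point of the sketch is that the decoder's update loop never reads the external outcome, only the critic's classification of post-action activity. Because the synthetic data are stationary, it does not reproduce the nonstationary neural activity during learning that the abstract emphasizes.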