
Reinforcement Learning based Decoding Using Internal Reward for Time Delayed Task in Brain Machine Interfaces.

Author Information

Shen Xiang, Zhang Xiang, Huang Yifan, Chen Shuhang, Wang Yiwen

Publication Information

Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:3351-3354. doi: 10.1109/EMBC44109.2020.9175964.

Abstract

In brain-machine interfaces (BMIs), reinforcement learning (RL) algorithms interpret neural signals as movement intentions under the guidance of a reward signal. Current RL algorithms generally work for tasks with immediate reward delivery and are inefficient in delayed-reward tasks. The prefrontal cortex, including the medial prefrontal cortex (mPFC), has been shown to assign credit to intermediate steps, which reinforces preceding actions more efficiently. In this paper, we propose to model mPFC activity as an intermediate reward for training an RL-based decoder in a two-step movement task. A support vector machine (SVM) is used to determine from mPFC activity whether the subject expects a reward after accomplishing a subtask. This discrimination result then guides the training of the RL decoder for each step separately. We apply Sarsa-style attention-gated reinforcement learning (SAGREL) as the decoder to map primary motor cortex (M1) activity to action states. We test on in vivo M1 and mPFC data collected from rats, which had to first trigger the start cue and then press a lever for rewards using M1 signals. SAGREL trained with intermediate rewards derived from mPFC activity achieves a prediction accuracy of 66.8% ± 2.0% (mean ± std), significantly better than training with the reward delivered only at the end of the trial (45.9% ± 1.2%). This reveals the potential of modeling mPFC activity as intermediate rewards in delayed-reward tasks.
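
To make the two-stage pipeline concrete, the sketch below illustrates the idea under stated assumptions: an SVM classifies binned mPFC activity into "reward expected / not expected" after each sub-step, and that classification supplies the per-step reward for a Sarsa-style temporal-difference update of a decoder on M1 activity. All array shapes, variable names, and the use of a plain linear Q-function are illustrative assumptions; the abstract does not specify SAGREL's attention-gated network or the mPFC features, so this is a minimal stand-in rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative dimensions (not from the paper): number of mPFC units,
# M1 units, and discrete action states the decoder can output.
N_MPFC, N_M1, N_ACTIONS = 16, 32, 3

# --- Stage 1: SVM on mPFC activity -----------------------------------------
# Classify whether the animal expects a reward after a sub-step; labels would
# come from completed vs. failed sub-tasks. Random placeholders are used here.
mpfc_train = np.random.randn(200, N_MPFC)      # binned mPFC firing rates
reward_expect = np.random.randint(0, 2, 200)   # 1 = sub-task accomplished
svm = SVC(kernel="rbf").fit(mpfc_train, reward_expect)

def intermediate_reward(mpfc_bin):
    """Turn the SVM decision for one mPFC bin into a scalar reward."""
    return float(svm.predict(mpfc_bin[None, :])[0])

# --- Stage 2: Sarsa-style decoder on M1 activity ---------------------------
# A linear Q-function stands in for SAGREL's attention-gated network.
W = np.zeros((N_ACTIONS, N_M1))                # one weight row per action state
alpha, gamma, epsilon = 0.05, 0.9, 0.1

def q_values(m1_bin):
    return W @ m1_bin

def select_action(m1_bin):
    """Epsilon-greedy action selection from the current Q-values."""
    if np.random.rand() < epsilon:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(q_values(m1_bin)))

def sarsa_update(m1_t, a_t, r_t, m1_next, a_next):
    """On-policy TD update; r_t is the mPFC-derived intermediate reward."""
    td_error = r_t + gamma * q_values(m1_next)[a_next] - q_values(m1_t)[a_t]
    W[a_t] += alpha * td_error * m1_t
    return td_error

# Per-bin training step (schematic): decode actions from M1, read the
# mPFC-derived reward for the sub-step just completed, update on-policy.
m1_t, m1_next = np.random.randn(N_M1), np.random.randn(N_M1)
mpfc_t = np.random.randn(N_MPFC)
a_t, a_next = select_action(m1_t), select_action(m1_next)
sarsa_update(m1_t, a_t, intermediate_reward(mpfc_t), m1_next, a_next)
```

The end-of-trial baseline reported in the abstract corresponds to calling the same update with a reward of zero at every intermediate step and a single reward only when the whole trial succeeds, which is what makes credit assignment across the two sub-steps slower.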

