Estimating Reward Function from Medial Prefrontal Cortex Cortical Activity using Inverse Reinforcement Learning.

Publication Information

Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3346-3349. doi: 10.1109/EMBC48229.2022.9871194.

Abstract

Reinforcement learning (RL)-based brain-machine interfaces (BMIs) learn the mapping from neural signals to the subject's intention using a reward signal. In the existing RL-based BMI framework, external rewards (water or food) or internal rewards extracted from neural activity are used to update the decoder's parameters. However, for complex tasks, designing an external reward can be difficult, and such a reward may not fully reflect the subject's own internal evaluation. It is therefore important to obtain an internal reward model from neural activity, so as to access the subject's internal evaluation while the subject performs the task through trial and error. In this paper, we propose to use an inverse reinforcement learning (IRL) method to estimate the internal reward function interpreted from neural activity and to use it to assist the update of the decoder. Specifically, the inverse Q-learning (IQL) algorithm is applied to extract internal reward information from real data collected from the medial prefrontal cortex (mPFC) of a rat learning a two-lever-press discrimination task. The extracted internal reward information is validated by checking whether it can guide the training of an RL decoder to complete the movement task. Compared with an RL decoder trained with the external reward, our approach achieves similar decoding performance. This preliminary result validates the effectiveness of using IRL to obtain an internal reward model, and it reveals the potential of internal reward estimation to improve the design of autonomous learning BMIs.
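The abstract does not include implementation details, but the core computation in tabular inverse Q-learning is standard: assume the demonstrator acts Boltzmann-rationally with respect to an optimal Q-function, so that Q(s,a) - Q(s,b) = log π(a|s) - log π(b|s); combined with the Bellman equation, the reward r(s,a) becomes recoverable from the demonstrator's log-policy up to a per-state shaping constant. The sketch below illustrates this idea under stated assumptions: the function name, the (s, a, s') transition format, and the discretization of neural data into integer state indices (e.g., binned mPFC population states and the two lever-press actions) are all illustrative choices, not the authors' code.

```python
import numpy as np

def tabular_inverse_q_learning(transitions, n_states, n_actions,
                               gamma=0.9, n_iters=200):
    """Illustrative tabular inverse Q-learning sketch (not the paper's code).

    transitions: list of (s, a, s_next) tuples observed from the
    demonstrator, with integer indices (here assumed to be binned mPFC
    population states and lever-press actions).
    Returns estimated reward r[s, a] and Q[s, a]; the reward is only
    identifiable up to a per-state shaping constant.
    """
    # Estimate the demonstrator's policy from action counts (smoothed).
    counts = np.full((n_states, n_actions), 1e-3)
    for s, a, _ in transitions:
        counts[s, a] += 1.0
    log_pi = np.log(counts / counts.sum(axis=1, keepdims=True))

    # Estimate the transition model P(s' | s, a) from the same data.
    P = np.full((n_states, n_actions, n_states), 1e-6)
    for s, a, s_next in transitions:
        P[s, a, s_next] += 1.0
    P /= P.sum(axis=2, keepdims=True)

    r = np.zeros((n_states, n_actions))
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        V = Q.max(axis=1)                 # greedy state values
        # Boltzmann rationality implies r(s,a) - eta(s,a) is constant
        # across actions within a state, where
        # eta(s,a) = log pi(a|s) - gamma * E[V(s')].
        eta = log_pi - gamma * (P @ V)    # (S, A)
        resid = r - eta
        # Enforce per-state constancy by averaging the residual over
        # the other actions (with two lever actions, n_actions - 1 = 1).
        r = eta + (resid.sum(axis=1, keepdims=True) - resid) / (n_actions - 1)
        Q = r + gamma * (P @ V)           # one value-iteration backup
    return r, Q
```

In the validation step described in the abstract, a reward table estimated this way would stand in for the external water reward when training the RL decoder, i.e., it would supply the reward term in the decoder's update rule; the reported comparison is between decoders trained with these two reward sources.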

