Tan Jieyuan, Zhang Xiang, Wu Shenghui, Song Zhiwei, Wang Yiwen
IEEE Trans Neural Syst Rehabil Eng. 2024 Nov 21;PP. doi: 10.1109/TNSRE.2024.3503713.
Reinforcement learning (RL)-based brain-machine interfaces (BMIs) assist paralyzed people in controlling neural prostheses without the need for real limb movements as supervision signals. The design of the reward signal significantly impacts the learning efficiency of RL-based decoders. Existing reward designs in the RL-based BMI framework rely on external rewards or manually labeled internal rewards and cannot accurately extract subjects' internal evaluation. In this paper, we propose a hidden brain state-based kernel inverse reinforcement learning (HBS-KIRL) method to accurately infer the subject-specific internal evaluation from neural activity during the BMI task. A state-space model is applied to project the neural state into a low-dimensional hidden brain state space, which greatly reduces the exploration dimension. A kernel method is then applied to speed up the convergence of the policy, reward, and Q-value networks in reproducing kernel Hilbert space (RKHS). We tested the proposed algorithm on data collected from the medial prefrontal cortex (mPFC) of rats performing a two-lever discrimination task. We assessed the state-value estimation performance of the proposed method and compared it with naïve IRL and PCA-based IRL. To validate that the extracted internal evaluation contributes to decoder training, we compared the decoding performance of decoders trained with different reward models, including a manually designed reward, naïve IRL, PCA-IRL, and the proposed HBS-KIRL. The results show that the HBS-KIRL method gives a stable and accurate estimate of the state-value distribution with respect to behavior. Compared with the other methods, the decoder guided by HBS-KIRL achieves consistent and better decoding performance across days. This study demonstrates the potential of applying IRL methods to better extract subject-specific evaluation and improve BMI decoding performance.
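The abstract describes two computational ingredients: projecting high-dimensional neural activity onto a low-dimensional hidden brain state, and estimating values in an RKHS with kernel methods. The sketch below is a minimal, hypothetical illustration of those two steps, not the authors' HBS-KIRL implementation: it substitutes an SVD projection for the full state-space model and Gaussian-kernel ridge regression for the kernel value/Q networks, and all data, dimensions, and hyperparameters are assumed for illustration only.

```python
# Hypothetical sketch of the two ingredients named in the abstract
# (NOT the authors' HBS-KIRL algorithm): low-dimensional projection of
# neural activity, followed by kernel-based value estimation in RKHS.
import numpy as np

rng = np.random.default_rng(0)

# --- synthetic "neural activity": T time bins x N channels (assumed sizes) ---
T, N, D = 500, 60, 5                          # time bins, channels, latent dim
latent = rng.normal(size=(T, D))              # toy hidden brain state trajectory
loading = rng.normal(size=(D, N))             # channel loading matrix
neural = latent @ loading + 0.5 * rng.normal(size=(T, N))

# --- step 1: project neural state into a low-dimensional space ---
# (SVD projection used here as a stand-in for the paper's state-space model)
neural_centered = neural - neural.mean(axis=0)
U, S, Vt = np.linalg.svd(neural_centered, full_matrices=False)
hidden_state = neural_centered @ Vt[:D].T     # T x D hidden brain states

# --- step 2: kernel value estimation in RKHS (Gaussian kernel ridge regression) ---
def gaussian_kernel(X, Y, sigma=1.5):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# toy "internal evaluation" target to regress against (purely synthetic)
value_target = np.tanh(hidden_state[:, 0]) + 0.1 * rng.normal(size=T)

lam = 1e-2                                    # ridge regularization (assumed)
K = gaussian_kernel(hidden_state, hidden_state)
alpha = np.linalg.solve(K + lam * np.eye(T), value_target)

# predicted state values for the same hidden states
value_pred = K @ alpha
print("fit correlation:", np.corrcoef(value_pred, value_target)[0, 1])
```

In this toy setup the kernel regression recovers the synthetic value signal from the low-dimensional hidden states; in the paper, the corresponding RKHS representation is used to accelerate convergence of the policy, reward, and Q-value estimates learned from the rats' mPFC activity.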