Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3327-3333. doi: 10.1109/EMBC48229.2022.9871862.
The kernel temporal differences (KTD)(λ) algorithm integrated with Q-learning (Q-KTD) has demonstrated its applicability and feasibility for reinforcement learning brain-machine interfaces (RLBMIs). With its unique trial-and-error learning strategy, an RLBMI allows continuous learning and adaptation in BMIs. Q-KTD has shown good performance in both open- and closed-loop experiments in finding a proper mapping from neural intention to the control commands of an external device. However, previous studies have been limited to intracortical BMIs, in which firing rates recorded from a monkey's primary motor cortex were used as inputs to the neural decoder. This study provides the first attempt to investigate the Q-KTD algorithm's applicability in EEG-based RLBMIs. Two publicly available EEG data sets are considered, referred to here as Data set A and Data set B. EEG motor imagery tasks are integrated into a single-step center-out reaching task, and the open-loop RLBMI experiments reach 100% average success rates after sufficient learning experience. Data set A converges after approximately 20 epochs with raw features, and Data set B converges after approximately 40 epochs with both raw and Fourier-transform features. Although challenges remain for EEG-based RLBMIs using Q-KTD, including increasing the learning speed and optimizing the continuously growing number of kernel units, the results encourage further investigation of Q-KTD in closed-loop RLBMIs using EEG. Clinical Relevance - This study supports the feasibility of noninvasive EEG-based RLBMI implementations and addresses the benefits and challenges of RLBMIs using EEG.
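To make the decoding idea behind the abstract concrete, the following is a minimal sketch of a kernel-based Q-value approximator in the spirit of Q-KTD: Q(s, a) is represented as a weighted sum of Gaussian kernels centered on previously seen states, updated with a Q-learning temporal-difference error. The class name, parameter names, and the dictionary-growth rule are illustrative assumptions, not the authors' implementation (which additionally uses eligibility traces and sparsification of kernel units).

```python
import numpy as np

class KernelQ:
    """Sketch of a kernel-based Q-value approximator (Q-KTD flavor).
    Q(s, a) = sum_i k(s, c_i) * w_i[a], with Gaussian kernels k and
    one weight vector per stored center. Hypothetical, simplified."""

    def __init__(self, n_actions, sigma=1.0, lr=0.1, gamma=0.9):
        self.centers = []          # stored state vectors (kernel units)
        self.weights = []          # per-center weight vector over actions
        self.n_actions = n_actions
        self.sigma = sigma         # kernel bandwidth
        self.lr = lr               # step size
        self.gamma = gamma         # discount factor

    def _kernel(self, s):
        # Gaussian kernel between state s and every stored center
        return np.array([np.exp(-np.sum((s - c) ** 2) / (2 * self.sigma ** 2))
                         for c in self.centers])

    def q_values(self, s):
        if not self.centers:
            return np.zeros(self.n_actions)
        return self._kernel(s) @ np.array(self.weights)

    def update(self, s, a, r, s_next, done):
        # Q-learning TD target: max over next-state action values
        target = r if done else r + self.gamma * np.max(self.q_values(s_next))
        td_err = target - self.q_values(s)[a]
        # Grow the dictionary: every visited state adds a kernel unit.
        # (Real KTD variants sparsify/prune this set; omitted for brevity,
        # which is exactly the "growing number of kernel units" challenge
        # the abstract mentions.)
        self.centers.append(np.asarray(s, dtype=float))
        w = np.zeros(self.n_actions)
        w[a] = self.lr * td_err
        self.weights.append(w)
        return td_err
```

In a single-step center-out setting like the one described, each trial is one state-action-reward transition (`done=True`), so the update reduces to regressing Q(s, a) toward the immediate reward; repeated trials drive the predicted value of the correct action toward the reward signal.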