Bae Jihye, Sanchez Giraldo Luis G, Pohlmeyer Eric A, Francis Joseph T, Sanchez Justin C, Príncipe José C
Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA.
Department of Biomedical Engineering, University of Miami, Coral Gables, FL 33146, USA.
Comput Intell Neurosci. 2015;2015:481375. doi: 10.1155/2015/481375. Epub 2015 Mar 17.
We study the feasibility and capability of the kernel temporal difference (KTD)(λ) algorithm for neural decoding. KTD(λ) is an online, kernel-based learning algorithm that was introduced to estimate value functions in reinforcement learning. The algorithm combines kernel-based representations with the temporal difference approach to learning. One of our key observations is that, by using strictly positive definite kernels, the algorithm's convergence can be guaranteed for policy evaluation. The algorithm's nonlinear function approximation capabilities are shown in simulations of both policy evaluation and neural decoding problems (policy improvement). KTD can handle high-dimensional neural states containing spatiotemporal information at a reasonable computational cost, allowing real-time applications. When the algorithm seeks a proper mapping between a monkey's neural states and the desired positions of a computer cursor or a robot arm, in both open-loop and closed-loop experiments, it effectively learns the neural-state-to-action mapping. Finally, a visualization of the coadaptation process between the decoder and the subject demonstrates the algorithm's capabilities in reinforcement learning brain-machine interfaces.
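As a rough illustration of the kernel temporal difference idea described in the abstract, the sketch below implements a minimal kernel TD(λ) learner for policy evaluation in Python. It assumes a Gaussian kernel (strictly positive definite, the property the abstract ties to convergence guarantees); the value function is a growing kernel expansion over visited states, and each TD error updates the coefficients along eligibility traces. All class, method, and hyperparameter names are hypothetical, and practical details from the paper, such as controlling the growth of the kernel centers, are omitted.

```python
import numpy as np

class KTDLambda:
    """Minimal sketch of kernel temporal difference learning, KTD(lambda),
    for online policy evaluation. Hypothetical names and hyperparameters;
    not the authors' exact implementation."""

    def __init__(self, eta=0.1, gamma=0.9, lam=0.5, sigma=1.0):
        self.eta = eta        # learning rate
        self.gamma = gamma    # discount factor
        self.lam = lam        # eligibility-trace decay (the lambda in KTD(lambda))
        self.sigma = sigma    # Gaussian kernel bandwidth
        self.centers = []     # stored states (kernel centers)
        self.alphas = []      # kernel-expansion coefficients
        self.traces = []      # eligibility trace per center

    def kernel(self, x, y):
        # Gaussian kernel: strictly positive definite on R^d.
        return np.exp(-np.sum((x - y) ** 2) / (2.0 * self.sigma ** 2))

    def value(self, x):
        # V(x) = sum_i alpha_i * k(c_i, x)
        return sum(a * self.kernel(c, x)
                   for a, c in zip(self.alphas, self.centers))

    def update(self, x, reward, x_next, terminal=False):
        x = np.asarray(x, dtype=float)
        x_next = np.asarray(x_next, dtype=float)

        # TD error: delta = r + gamma * V(x') - V(x)
        v_next = 0.0 if terminal else self.value(x_next)
        delta = reward + self.gamma * v_next - self.value(x)

        # Decay existing traces, then add the current state as a new
        # kernel center with unit trace (accumulating traces).
        self.traces = [self.gamma * self.lam * e for e in self.traces]
        self.centers.append(x)
        self.alphas.append(0.0)
        self.traces.append(1.0)

        # Update all coefficients along the eligibility traces.
        self.alphas = [a + self.eta * delta * e
                       for a, e in zip(self.alphas, self.traces)]
        return delta
```

Because every visited state becomes a kernel center, the expansion grows with the data; the reasonable computational cost the abstract claims for real-time use would in practice rely on sparsifying or quantizing the centers.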