IEEE Trans Neural Syst Rehabil Eng. 2017 Oct;25(10):1892-1905. doi: 10.1109/TNSRE.2017.2700395. Epub 2017 May 2.
Functional electrical stimulation (FES) employs neuroprostheses to apply electrical current to the nerves and muscles of individuals paralyzed by spinal cord injury in order to restore voluntary movement. Neuroprosthesis controllers calculate stimulation patterns to produce desired actions. To date, no existing controller can efficiently adapt its control strategy to the wide range of possible physiological arm characteristics, reaching movements, and user preferences that vary over time. Reinforcement learning (RL) is a control strategy that can take human reward signals as inputs, allowing users to shape controller behavior. In this paper, ten neurologically intact human participants assigned subjective numerical rewards to train RL controllers, evaluating animations of goal-oriented reaching tasks performed by a planar musculoskeletal simulation of the human arm. RL controller learning achieved with human trainers was compared with learning achieved using human-like rewards generated by an algorithm; metrics included success in reaching the specified target, time required to reach the target, and target overshoot. Both sets of controllers learned efficiently and with minimal differences between them, significantly outperforming standard controllers. Reward positivity and consistency were found to be unrelated to learning success. These results suggest that human rewards can be used effectively to train RL-based FES controllers.
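The training setup described above can be sketched in miniature. This is not the authors' controller or arm simulation: it is a minimal, hypothetical tabular Q-learning loop on a toy one-dimensional "reaching" task, where the per-step reward comes from an external rating function standing in for the subjective numerical rewards the human participants assigned.

```python
import random

N_STATES = 11          # toy arm positions 0..10 (assumption, not the paper's model)
TARGET = 8             # hypothetical reach target
ACTIONS = (-1, +1)     # move left / move right

def human_like_reward(state, target=TARGET):
    """Stand-in for a human trainer's rating: higher when closer to the target."""
    return -abs(state - target)

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning driven entirely by the externally supplied reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value per (state, action)
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(30):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda i: q[s][i])
            s2 = min(N_STATES - 1, max(0, s + ACTIONS[a]))
            r = human_like_reward(s2)  # reward signal shapes the controller
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if s == TARGET:
                break
    return q

q = train()
# Greedy policy: 1 = move right, 0 = move left.
policy = [max((0, 1), key=lambda i: q[s][i]) for s in range(N_STATES)]
```

In the study, the rating function is replaced by a human (or by the algorithmic human-like reward generator used for comparison); the point of the sketch is only that the learner needs no model of the "correct" reward, just the numeric signal itself.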