IEEE Trans Neural Syst Rehabil Eng. 2021;29:607-618. doi: 10.1109/TNSRE.2021.3063015. Epub 2021 Mar 9.
This paper proposes using deep reinforcement learning to simulate physics-based musculoskeletal models of both healthy subjects and transfemoral prosthesis users during normal level-ground walking. The deep reinforcement learning algorithm is based on proximal policy optimization combined with imitation learning, to ensure a natural walking gait while reducing the computational time of training. First, the optimization algorithm is implemented for the OpenSim model of a healthy subject and validated with experimental data from a public dataset. Next, it is implemented for the OpenSim model of a generic transfemoral prosthesis user, obtained by reducing the number of muscles around the knee and ankle joints and, specifically, keeping only the uniarticular ones. The model of the transfemoral prosthesis user shows a stable gait, with forward dynamics comparable to the healthy subject's, but with higher muscle forces. Although the pattern of the computed muscle forces means they could not be used directly as control inputs for muscle-like linear actuators, this study paves the way for using deep reinforcement learning to design the control architecture of transfemoral prostheses.
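The combination of proximal policy optimization and imitation learning described in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulation or hyperparameters, so everything specific here is an assumption made for illustration: the exponentiated tracking-error form of the imitation reward, the function names, the weights `w_imitation` and `w_goal`, the target velocity, and the clipping value `clip_eps`. Only the clipped surrogate objective itself follows the standard definition of proximal policy optimization.

```python
import math


def imitation_reward(sim_angles, ref_angles, scale=5.0):
    """Score how closely simulated joint angles track reference gait data.

    Returns a value in (0, 1]; 1.0 means a perfect match.
    The exponential form and `scale` are illustrative assumptions.
    """
    sq_err = sum((s - r) ** 2 for s, r in zip(sim_angles, ref_angles))
    return math.exp(-scale * sq_err)


def total_reward(sim_angles, ref_angles, forward_velocity,
                 target_velocity=1.25, w_imitation=0.7, w_goal=0.3):
    """Blend the imitation term with a task (walking speed) term.

    Weights and target velocity are hypothetical, not taken from the paper.
    """
    goal = math.exp(-(forward_velocity - target_velocity) ** 2)
    return w_imitation * imitation_reward(sim_angles, ref_angles) + w_goal * goal


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate: mean of min(r*A, clip(r)*A).

    `ratios` are new-policy / old-policy action probabilities and
    `advantages` are the corresponding advantage estimates.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

In a setup like this, the imitation term rewards poses close to recorded gait data, which is one plausible way to steer the policy toward a natural gait early in training, while the clipping in the surrogate objective limits how far each policy update can move from the previous policy.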