Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

Affiliations

Neural Computation Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan; Neural Computation Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan.

Publication Information

Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.

Abstract

The linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow the task to be solved, albeit at a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamic models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
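
The eigenvalue formulation mentioned in the abstract can be made concrete with a small sketch. The following is a minimal, hypothetical Python illustration (not the authors' implementation) of the discrete-state, average-cost LMDP solution: exponentiating the value function, z(x) = exp(-v(x)), turns the Bellman equation into the linear relation exp(-c̄) z = Q P z, where P is the passive dynamics, Q = diag(exp(-q(x))) encodes the state costs, and c̄ is the average cost per step; z is then the principal eigenvector of QP, recoverable by power iteration. The function name, transition matrix, and cost values below are toy assumptions.

```python
import numpy as np

def solve_lmdp(P, q, n_iter=1000, tol=1e-10):
    """Solve a discrete-state, average-cost LMDP by power iteration.

    P : (n, n) passive transition matrix, rows sum to 1
    q : (n,) state cost vector
    Returns the optimal value function v, the optimal controlled
    transition matrix u, and the average cost per step.
    """
    QP = np.diag(np.exp(-q)) @ P          # linear Bellman operator
    z = np.ones(P.shape[0])               # desirability z = exp(-v)
    lam = 1.0
    for _ in range(n_iter):
        z_new = QP @ z
        lam = np.linalg.norm(z_new)       # converges to the principal eigenvalue
        z_new /= lam
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    v = -np.log(z)                        # optimal value function
    u = P * z[None, :]                    # u*(x'|x) proportional to p(x'|x) z(x')
    u /= u.sum(axis=1, keepdims=True)     # renormalize rows
    return v, u, -np.log(lam)

# Toy 3-state chain with higher cost on the left
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
q = np.array([1.0, 0.5, 0.0])
v, u, avg_cost = solve_lmdp(P, q)
print("value:", v, "average cost:", avg_cost)
```

In the continuous-state setting used for the robot experiments, the same exponential transformation yields an eigenfunction problem instead, with the learned linear or bilinear dynamics model playing the role of the passive dynamics P.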

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/a01a770c5511/fnbot-07-00007-g0001.jpg
