
Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

Affiliations

Neural Computation Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan; Neural Computation Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan.

Publication

Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.

DOI: 10.3389/fnbot.2013.00007
PMID: 23576983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3617398/
Abstract

Linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow the task to be solved, albeit at a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamics models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
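For reference, the exponential transformation the abstract refers to can be written out compactly. The following is a standard sketch of the discrete-state LMDP, following Todorov (2009b); the notation (state cost q, passive dynamics p, desirability function z) is ours, not quoted from the paper:

v(s) = q(s) + \min_{u}\Big[\mathrm{KL}\big(u(\cdot\mid s)\,\big\|\,p(\cdot\mid s)\big) + \mathbb{E}_{s'\sim u(\cdot\mid s)}\big[v(s')\big]\Big]

z(s) = e^{-v(s)} \;\Longrightarrow\; \lambda\,z = G\,P\,z, \qquad G = \operatorname{diag}\big(e^{-q(s)}\big)

u^{*}(s'\mid s) = \frac{p(s'\mid s)\,z(s')}{\sum_{\sigma} p(\sigma\mid s)\,z(\sigma)}

Minimizing over the controlled transition distribution u eliminates the min from the Bellman equation, so the transformed equation is linear in z: with λ = 1 it is a linear system over non-terminal states (the first-exit case), while the infinite-horizon average-cost case is the principal-eigenvalue problem the abstract mentions.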

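A minimal numerical sketch of that eigenvalue computation, via power iteration on G P. The five-state toy problem below (uniform passive dynamics, one cheap "goal" state) is an illustrative invention, not the paper's Spring Dog setup:

import numpy as np

def solve_lmdp(P, q, iters=5000, tol=1e-12):
    # P: (N, N) passive transition matrix, rows summing to 1; q: (N,) state costs.
    # Power iteration converges to the principal eigenvector of G @ P
    # (Perron-Frobenius), i.e. the desirability z = exp(-v) up to scale.
    G = np.diag(np.exp(-q))
    z = np.ones(len(q))
    for _ in range(iters):
        z_new = G @ P @ z
        z_new /= np.linalg.norm(z_new)  # z is only defined up to positive scale
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    u = P * z[None, :]                  # optimal dynamics: u*(s'|s) ∝ p(s'|s) z(s')
    u /= u.sum(axis=1, keepdims=True)
    return z, u

# Toy example: 5 states, uniform passive dynamics, state 4 is cheap to occupy.
N = 5
P = np.full((N, N), 1.0 / N)
q = np.array([1.0, 1.0, 1.0, 1.0, 0.1])
z, u = solve_lmdp(P, q)
print("desirability z:", np.round(z, 3))  # largest at the cheap state
print("u*(.|s=0):", np.round(u[0], 3))    # shifted toward state 4

Replacing the hand-specified P above with a transition model fitted from data is roughly analogous to how the paper plugs learned linear or bilinear dynamics into the LMDP controller, though the paper works in a continuous state space.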

Figures 1-13 (PMC image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/a01a770c5511/fnbot-07-00007-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/a40e77af34d1/fnbot-07-00007-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/2b0a091eb157/fnbot-07-00007-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/86ffee014176/fnbot-07-00007-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/9456a199e0cd/fnbot-07-00007-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/b67754d71136/fnbot-07-00007-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/cf0c741478f9/fnbot-07-00007-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/b0545762b670/fnbot-07-00007-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/37cbfa20fa13/fnbot-07-00007-g0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/bf6ae54268fe/fnbot-07-00007-g0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/3bbbfbb6ddb7/fnbot-07-00007-g0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/f9923532e1a1/fnbot-07-00007-g0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/27ee9784d3cf/fnbot-07-00007-g0013.jpg

Similar articles

1. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.
Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.
2. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.
IEEE Trans Cybern. 2016 Mar;46(3):655-67. doi: 10.1109/TCYB.2015.2412554. Epub 2015 Mar 24.
3. Configuration-Dependent Optimal Impedance Control of an Upper Extremity Stroke Rehabilitation Manipulandum.
Front Robot AI. 2018 Nov 1;5:124. doi: 10.3389/frobt.2018.00124. eCollection 2018.
4. Image-based robot navigation with task achievability.
Front Robot AI. 2023 May 31;10:944375. doi: 10.3389/frobt.2023.944375. eCollection 2023.
5. Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning.
Sensors (Basel). 2020 Aug 10;20(16):4468. doi: 10.3390/s20164468.
6. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1523-1536. doi: 10.1109/TNNLS.2018.2870075. Epub 2018 Oct 8.
7. A Spring Compensation Method for a Low-Cost Biped Robot Based on Whole Body Control.
Biomimetics (Basel). 2023 Mar 21;8(1):126. doi: 10.3390/biomimetics8010126.
8. Modular deep reinforcement learning from reward and punishment for robot navigation.
Neural Netw. 2021 Mar;135:115-126. doi: 10.1016/j.neunet.2020.12.001. Epub 2020 Dec 8.
9. Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback.
IEEE Trans Cybern. 2019 Jan 3. doi: 10.1109/TCYB.2018.2886735.
10. Reinforcement learning in continuous time and space.
Neural Comput. 2000 Jan;12(1):219-45. doi: 10.1162/089976600300015961.

Cited by

1. F-18 FDG PET/CT based Preoperative Machine Learning Prediction Models for Evaluating Regional Lymph Node Metastasis Status of Patients with Colon Cancer.
Asian Pac J Cancer Prev. 2025 Jan 1;26(1):85-90. doi: 10.31557/APJCP.2025.26.1.85.
2. Generative models for sequential dynamics in active inference.
Cogn Neurodyn. 2024 Dec;18(6):3259-3272. doi: 10.1007/s11571-023-09963-x. Epub 2023 Apr 26.
3. Inferring What to Do (And What Not to).
Entropy (Basel). 2020 May 11;22(5):536. doi: 10.3390/e22050536.
4. Value and reward based learning in neurorobots.
Front Neurorobot. 2013 Sep 13;7:13. doi: 10.3389/fnbot.2013.00013. eCollection 2013.

References

1. The ubiquity of model-based reinforcement learning.
Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.
2. Model learning for robot control: a survey.
Cogn Process. 2011 Nov;12(4):319-40. doi: 10.1007/s10339-011-0404-1. Epub 2011 Apr 13.
3. Model-based influences on humans' choices and striatal prediction errors.
Neuron. 2011 Mar 24;69(6):1204-15. doi: 10.1016/j.neuron.2011.02.027.
4. How can we learn efficiently to act optimally and flexibly?
Proc Natl Acad Sci U S A. 2009 Jul 14;106(28):11429-30. doi: 10.1073/pnas.0905423106. Epub 2009 Jul 7.
5. Efficient computation of optimal actions.
Proc Natl Acad Sci U S A. 2009 Jul 14;106(28):11478-83. doi: 10.1073/pnas.0710743106. Epub 2009 Jul 2.
6. Linear theory for control of nonlinear stochastic systems.
Phys Rev Lett. 2005 Nov 11;95(20):200201. doi: 10.1103/PhysRevLett.95.200201. Epub 2005 Nov 7.