
Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.

Affiliations

Neural Computation Laboratory, Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara, Japan; Neural Computation Unit, Okinawa Institute of Science and Technology, Onna-son, Okinawa, Japan.

Publication

Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.

DOI: 10.3389/fnbot.2013.00007
PMID: 23576983
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3617398/
Abstract

Linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009b). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics can still allow the task to be solved, albeit at a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamics models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
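For reference, the exponential transformation the abstract refers to can be written out compactly. The following is a standard sketch of the discrete-state LMDP, following Todorov (2009b); the notation (state cost q, passive dynamics p, desirability function z) is ours, not quoted from the paper:

v(s) = q(s) + \min_{u}\Big[\mathrm{KL}\big(u(\cdot\mid s)\,\big\|\,p(\cdot\mid s)\big) + \mathbb{E}_{s'\sim u(\cdot\mid s)}\big[v(s')\big]\Big]

z(s) = e^{-v(s)} \;\Longrightarrow\; \lambda\,z = G\,P\,z, \qquad G = \operatorname{diag}\big(e^{-q(s)}\big)

u^{*}(s'\mid s) = \frac{p(s'\mid s)\,z(s')}{\sum_{\sigma} p(\sigma\mid s)\,z(\sigma)}

Minimizing over the controlled transition distribution u eliminates the min from the Bellman equation, so the transformed equation is linear in z: with λ = 1 it is a linear system over non-terminal states (the first-exit case), while the infinite-horizon average-cost case is the principal-eigenvalue problem the abstract mentions.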

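A minimal numerical sketch of that eigenvalue computation, via power iteration on G P. The five-state toy problem below (uniform passive dynamics, one cheap "goal" state) is an illustrative invention, not the paper's Spring Dog setup:

import numpy as np

def solve_lmdp(P, q, iters=5000, tol=1e-12):
    # P: (N, N) passive transition matrix, rows summing to 1; q: (N,) state costs.
    # Power iteration converges to the principal eigenvector of G @ P
    # (Perron-Frobenius), i.e. the desirability z = exp(-v) up to scale.
    G = np.diag(np.exp(-q))
    z = np.ones(len(q))
    for _ in range(iters):
        z_new = G @ P @ z
        z_new /= np.linalg.norm(z_new)  # z is only defined up to positive scale
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    u = P * z[None, :]                  # optimal dynamics: u*(s'|s) ∝ p(s'|s) z(s')
    u /= u.sum(axis=1, keepdims=True)
    return z, u

# Toy example: 5 states, uniform passive dynamics, state 4 is cheap to occupy.
N = 5
P = np.full((N, N), 1.0 / N)
q = np.array([1.0, 1.0, 1.0, 1.0, 0.1])
z, u = solve_lmdp(P, q)
print("desirability z:", np.round(z, 3))  # largest at the cheap state
print("u*(.|s=0):", np.round(u[0], 3))    # shifted toward state 4

Replacing the hand-specified P above with a transition model fitted from data is roughly analogous to how the paper plugs learned linear or bilinear dynamics into the LMDP controller, though the paper works in a continuous state space.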

Figures 1-13 (PMC image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/a01a770c5511/fnbot-07-00007-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/a40e77af34d1/fnbot-07-00007-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/2b0a091eb157/fnbot-07-00007-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/86ffee014176/fnbot-07-00007-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/9456a199e0cd/fnbot-07-00007-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/b67754d71136/fnbot-07-00007-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/cf0c741478f9/fnbot-07-00007-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/b0545762b670/fnbot-07-00007-g0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/37cbfa20fa13/fnbot-07-00007-g0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/bf6ae54268fe/fnbot-07-00007-g0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/3bbbfbb6ddb7/fnbot-07-00007-g0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/f9923532e1a1/fnbot-07-00007-g0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a4d3/3617398/27ee9784d3cf/fnbot-07-00007-g0013.jpg

Similar articles

1. Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task.
Front Neurorobot. 2013 Apr 5;7:7. doi: 10.3389/fnbot.2013.00007. eCollection 2013.
2. Optimized Assistive Human-Robot Interaction Using Reinforcement Learning.
IEEE Trans Cybern. 2016 Mar;46(3):655-67. doi: 10.1109/TCYB.2015.2412554. Epub 2015 Mar 24.
3. Configuration-Dependent Optimal Impedance Control of an Upper Extremity Stroke Rehabilitation Manipulandum.
Front Robot AI. 2018 Nov 1;5:124. doi: 10.3389/frobt.2018.00124. eCollection 2018.
4. Image-based robot navigation with task achievability.
Front Robot AI. 2023 May 31;10:944375. doi: 10.3389/frobt.2023.944375. eCollection 2023.
5. Stability Control of a Biped Robot on a Dynamic Platform Based on Hybrid Reinforcement Learning.
Sensors (Basel). 2020 Aug 10;20(16):4468. doi: 10.3390/s20164468.
6. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1523-1536. doi: 10.1109/TNNLS.2018.2870075. Epub 2018 Oct 8.
7. A Spring Compensation Method for a Low-Cost Biped Robot Based on Whole Body Control.
Biomimetics (Basel). 2023 Mar 21;8(1):126. doi: 10.3390/biomimetics8010126.
8. Modular deep reinforcement learning from reward and punishment for robot navigation.
Neural Netw. 2021 Mar;135:115-126. doi: 10.1016/j.neunet.2020.12.001. Epub 2020 Dec 8.
9. Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback.
IEEE Trans Cybern. 2019 Jan 3. doi: 10.1109/TCYB.2018.2886735.
10. Reinforcement learning in continuous time and space.
Neural Comput. 2000 Jan;12(1):219-45. doi: 10.1162/089976600300015961.

Cited by

1. F-18 FDG PET/CT based Preoperative Machine Learning Prediction Models for Evaluating Regional Lymph Node Metastasis Status of Patients with Colon Cancer.
Asian Pac J Cancer Prev. 2025 Jan 1;26(1):85-90. doi: 10.31557/APJCP.2025.26.1.85.
2. Generative models for sequential dynamics in active inference.
Cogn Neurodyn. 2024 Dec;18(6):3259-3272. doi: 10.1007/s11571-023-09963-x. Epub 2023 Apr 26.
3. Inferring What to Do (And What Not to).
Entropy (Basel). 2020 May 11;22(5):536. doi: 10.3390/e22050536.
4. Value and reward based learning in neurorobots.
Front Neurorobot. 2013 Sep 13;7:13. doi: 10.3389/fnbot.2013.00013. eCollection 2013.

References

1. The ubiquity of model-based reinforcement learning.
Curr Opin Neurobiol. 2012 Dec;22(6):1075-81. doi: 10.1016/j.conb.2012.08.003. Epub 2012 Sep 6.
2. Model learning for robot control: a survey.
Cogn Process. 2011 Nov;12(4):319-40. doi: 10.1007/s10339-011-0404-1. Epub 2011 Apr 13.
3. Model-based influences on humans' choices and striatal prediction errors.
Neuron. 2011 Mar 24;69(6):1204-15. doi: 10.1016/j.neuron.2011.02.027.
4. How can we learn efficiently to act optimally and flexibly?
Proc Natl Acad Sci U S A. 2009 Jul 14;106(28):11429-30. doi: 10.1073/pnas.0905423106. Epub 2009 Jul 7.
5. Efficient computation of optimal actions.
Proc Natl Acad Sci U S A. 2009 Jul 14;106(28):11478-83. doi: 10.1073/pnas.0710743106. Epub 2009 Jul 2.
6. Linear theory for control of nonlinear stochastic systems.
Phys Rev Lett. 2005 Nov 11;95(20):200201. doi: 10.1103/PhysRevLett.95.200201. Epub 2005 Nov 7.