


Energy-Based Continuous Inverse Optimal Control.

Authors

Xu Yifei, Xie Jianwen, Zhao Tianyang, Baker Chris, Zhao Yibiao, Wu Ying Nian

Publication

IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10563-10577. doi: 10.1109/TNNLS.2022.3168795. Epub 2023 Nov 30.

DOI: 10.1109/TNNLS.2022.3168795
PMID: 35511835
Abstract

The problem of continuous inverse optimal control (over finite time horizon) is to learn the unknown cost function over the sequence of continuous control variables from expert demonstrations. In this article, we study this fundamental problem in the framework of energy-based model (EBM), where the observed expert trajectories are assumed to be random samples from a probability density function defined as the exponential of the negative cost function up to a normalizing constant. The parameters of the cost function are learned by maximum likelihood via an "analysis by synthesis" scheme, which iterates: 1) synthesis step: sample the synthesized trajectories from the current probability density using the Langevin dynamics via backpropagation through time and 2) analysis step: update the model parameters based on the statistical difference between the synthesized trajectories and the observed trajectories. Given the fact that an efficient optimization algorithm is usually available for an optimal control problem, we also consider a convenient approximation of the above learning method, where we replace the sampling in the synthesis step by optimization. Moreover, to make the sampling or optimization more efficient, we propose to train the EBM simultaneously with a top-down trajectory generator via cooperative learning, where the trajectory generator is used to fast initialize the synthesis step of the EBM. We demonstrate the proposed methods on autonomous driving tasks and show that they can learn suitable cost functions for optimal control.
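The "analysis by synthesis" loop described above can be sketched in a toy setting. Everything below is an illustrative assumption rather than the paper's actual setup: scalar integrator dynamics (x_{t+1} = x_t + u_t), a linear cost over two hand-picked quadratic features (control effort, terminal distance to a goal), synthetic "expert" demonstrations, and arbitrary hyperparameters. With a linear cost c_θ(u) = θ·φ(u), the maximum-likelihood gradient reduces to the difference between synthesized and observed feature statistics, which is what the analysis step uses:

```python
import numpy as np

rng = np.random.default_rng(0)
T, goal = 10, 5.0  # horizon and goal state (illustrative)

def features(u):
    # phi_1: control effort; phi_2: terminal distance to goal
    # under the assumed dynamics x_{t+1} = x_t + u_t, x_0 = 0
    return np.array([np.sum(u**2), (np.sum(u) - goal)**2])

def cost_grad_u(u, theta):
    # gradient of c_theta(u) = theta . phi(u) w.r.t. the control sequence u
    return 2 * theta[0] * u + 2 * theta[1] * (np.sum(u) - goal)

# synthetic "expert" demonstrations: noisy optima of a hidden true cost
true_theta = np.array([1.0, 4.0])
u_star = np.full(T, goal * true_theta[1] / (true_theta[0] + T * true_theta[1]))
demos = u_star + 0.05 * rng.standard_normal((20, T))

theta = np.array([0.5, 0.5])  # initial cost parameters
eps, lr = 0.05, 1e-3
for it in range(200):
    # synthesis step: Langevin dynamics draws trajectories from
    # the density proportional to exp(-c_theta(u))
    u = rng.standard_normal((20, T))
    for _ in range(100):
        grad = np.array([cost_grad_u(ui, theta) for ui in u])
        u = u - 0.5 * eps**2 * grad + eps * rng.standard_normal(u.shape)
    # analysis step: ascend the log-likelihood via the
    # synthesized-minus-observed feature-statistics difference
    phi_syn = np.mean([features(ui) for ui in u], axis=0)
    phi_obs = np.mean([features(d) for d in demos], axis=0)
    theta += lr * (phi_syn - phi_obs)
    theta = np.maximum(theta, 1e-3)  # keep cost weights positive
```

In the paper the cost is over continuous control variables of a dynamical system and the Langevin gradient is obtained by backpropagation through time; here the gradient is analytic because the toy cost is quadratic. Replacing the inner Langevin loop with an optimizer of c_θ gives the optimization-based approximation the abstract mentions.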


Similar Articles

1. Energy-Based Continuous Inverse Optimal Control.
   IEEE Trans Neural Netw Learn Syst. 2023 Dec;34(12):10563-10577. doi: 10.1109/TNNLS.2022.3168795. Epub 2023 Nov 30.
2. Cooperative Training of Descriptor and Generator Networks.
   IEEE Trans Pattern Anal Mach Intell. 2020 Jan;42(1):27-45. doi: 10.1109/TPAMI.2018.2879081. Epub 2018 Nov 1.
3. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
   Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
4. Online Inverse Optimal Control for Time-Varying Cost Weights.
   Biomimetics (Basel). 2024 Jan 31;9(2):84. doi: 10.3390/biomimetics9020084.
5. Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems.
   IEEE Trans Neural Netw Learn Syst. 2023 May;34(5):2386-2399. doi: 10.1109/TNNLS.2021.3106635. Epub 2023 May 2.
6. Particle swarm optimization for discrete-time inverse optimal control of a doubly fed induction generator.
   IEEE Trans Cybern. 2013 Dec;43(6):1698-709. doi: 10.1109/TSMCB.2012.2228188.
7. Hierarchical Adversarial Inverse Reinforcement Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):17549-17558. doi: 10.1109/TNNLS.2023.3305983. Epub 2024 Dec 2.
8. Optimal Trajectory Planning for Wheeled Mobile Robots under Localization Uncertainty and Energy Efficiency Constraints.
   Sensors (Basel). 2021 Jan 6;21(2):335. doi: 10.3390/s21020335.
9. Finite-Horizon $H_{\infty }$ Tracking Control for Unknown Nonlinear Systems With Saturating Actuators.
   IEEE Trans Neural Netw Learn Syst. 2018 Apr;29(4):1200-1212. doi: 10.1109/TNNLS.2017.2669099. Epub 2017 Feb 28.
10. Inverse Reinforcement Learning for Adversarial Apprentice Games.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4596-4609. doi: 10.1109/TNNLS.2021.3114612. Epub 2023 Aug 4.