


Robust Inverse Q-Learning for Continuous-Time Linear Systems in Adversarial Environments.

Publication information

IEEE Trans Cybern. 2022 Dec;52(12):13083-13095. doi: 10.1109/TCYB.2021.3100749. Epub 2022 Nov 18.

DOI: 10.1109/TCYB.2021.3100749
PMID: 34403352
Abstract

This article proposes robust inverse Q-learning algorithms for a learner to mimic an expert's states and control inputs in the imitation learning problem. These two agents have different adversarial disturbances. To do the imitation, the learner must reconstruct the unknown expert cost function. The learner only observes the expert's control inputs and uses inverse Q-learning algorithms to reconstruct the unknown expert cost function. The inverse Q-learning algorithms are robust in that they are independent of the system model and allow for the different cost function parameters and disturbances between two agents. We first propose an offline inverse Q-learning algorithm which consists of two iterative learning loops: 1) an inner Q-learning iteration loop and 2) an outer iteration loop based on inverse optimal control. Then, based on this offline algorithm, we further develop an online inverse Q-learning algorithm such that the learner mimics the expert behaviors online with the real-time observation of the expert control inputs. This online computational method has four functional approximators: a critic approximator, two actor approximators, and a state-reward neural network (NN). It simultaneously approximates the parameters of the Q-function and the learner state reward online. Convergence and stability proofs are rigorously studied to guarantee the algorithm performance.
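The two-loop offline structure described in the abstract can be illustrated in a toy setting. The sketch below is not the paper's algorithm: a scalar discrete-time linear-quadratic problem stands in for the continuous-time system, the inner loop solves the Riccati equation by value iteration (playing the role of the inner Q-learning loop), and the outer loop adjusts the learner's cost estimate until its optimal gain matches the expert's observed feedback gain (the inverse-optimal-control step). All parameter values and the update rule are hypothetical.

```python
# Toy two-loop inverse optimal control sketch (hypothetical parameters;
# a scalar discrete-time LQR stands in for the paper's continuous-time
# linear system).

a, b = 0.9, 0.5      # system x_{k+1} = a*x_k + b*u_k
r = 1.0              # control weight, assumed known to the learner
q_expert = 2.0       # expert's state weight (unknown to the learner)

def lqr_gain(q, iters=200):
    """Inner loop: value iteration on the scalar Riccati equation,
    playing the role of the inner Q-learning loop."""
    P = q
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    return a * b * P / (r + b * b * P)   # optimal feedback gain, u = -K*x

K_expert = lqr_gain(q_expert)   # stands in for the observed expert inputs

# Outer loop: correct the learner's cost estimate until its optimal gain
# reproduces the expert's behavior.
q_hat, step = 0.5, 2.0
for _ in range(500):
    q_hat += step * (K_expert - lqr_gain(q_hat))

print(f"recovered state weight: {q_hat:.3f}")
```

With `r` fixed, the scalar state weight is identifiable here; in general an expert cost is only recoverable up to an equivalence class, which is one reason the paper allows the two agents' cost parameters to differ.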


Similar articles

1. Robust Inverse Q-Learning for Continuous-Time Linear Systems in Adversarial Environments.
   IEEE Trans Cybern. 2022 Dec;52(12):13083-13095. doi: 10.1109/TCYB.2021.3100749. Epub 2022 Nov 18.
2. Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems.
   IEEE Trans Neural Netw Learn Syst. 2023 May;34(5):2386-2399. doi: 10.1109/TNNLS.2021.3106635. Epub 2023 May 2.
3. Inverse Reinforcement Learning for Adversarial Apprentice Games.
   IEEE Trans Neural Netw Learn Syst. 2023 Aug;34(8):4596-4609. doi: 10.1109/TNNLS.2021.3114612. Epub 2023 Aug 4.
4. Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games.
   IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):2028-2041. doi: 10.1109/TNNLS.2022.3186229. Epub 2024 Feb 5.
5. Inverse Value Iteration and Q-Learning: Algorithms, Stability, and Robustness.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6970-6980. doi: 10.1109/TNNLS.2024.3409182. Epub 2025 Apr 4.
6. Inverse Reinforcement Learning for Trajectory Imitation Using Static Output Feedback Control.
   IEEE Trans Cybern. 2024 Mar;54(3):1695-1707. doi: 10.1109/TCYB.2023.3241015. Epub 2024 Feb 9.
7. Quantum Imitation Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14190-14204. doi: 10.1109/TNNLS.2023.3275075. Epub 2024 Oct 7.
8. Neural Q-learning for discrete-time nonlinear zero-sum games with adjustable convergence rate.
   Neural Netw. 2024 Jul;175:106274. doi: 10.1016/j.neunet.2024.106274. Epub 2024 Mar 27.
9. Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks.
   IEEE Trans Neural Netw Learn Syst. 2013 Oct;24(10):1513-25. doi: 10.1109/TNNLS.2013.2276571.
10. Optimal Tracking Control of a Nonlinear Multiagent System Using Q-Learning via Event-Triggered Reinforcement Learning.
   Entropy (Basel). 2023 Feb 5;25(2):299. doi: 10.3390/e25020299.