
Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback.

Authors

Duan Jingliang, Li Jie, Chen Xuyang, Zhao Kai, Li Shengbo Eben, Zhao Lin

Publication

IEEE Trans Cybern. 2024 Jun;54(6):3588-3601. doi: 10.1109/TCYB.2023.3323316. Epub 2024 May 30.

DOI: 10.1109/TCYB.2023.3323316
PMID: 37883283
Abstract

In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This article analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence toward local minima when initialized near such minima. This article concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.
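The setting in the abstract can be made concrete with a small numerical sketch: vanilla gradient descent on the SOF-LQR cost J(K) = tr(P_K Σ0), where P_K solves the closed-loop discrete Lyapunov equation for u = -Ky. This is only an illustration of the problem class, not the paper's method: the system matrices, the initial gain, the step size, and the use of a finite-difference gradient (instead of an analytic policy gradient) are all assumptions made here for brevity.

```python
import numpy as np

def sof_cost(K, A, B, C, Q, R, Sigma0, horizon=1000):
    """J(K) = tr(P_K @ Sigma0), where P_K solves the closed-loop discrete
    Lyapunov equation P = Q_K + A_K' P A_K; approximated by fixed-point
    iteration, which converges when rho(A - B K C) < 1."""
    Ak = A - B @ K @ C                 # closed-loop dynamics under u = -K y
    Qk = Q + C.T @ K.T @ R @ K @ C     # effective stage cost
    P = np.zeros_like(Q)
    for _ in range(horizon):
        P = Qk + Ak.T @ P @ Ak
    return np.trace(P @ Sigma0)

def grad_fd(K, args, eps=1e-6):
    """Central finite-difference estimate of the policy gradient dJ/dK."""
    G = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        E = np.zeros_like(K)
        E[idx] = eps
        G[idx] = (sof_cost(K + E, *args) - sof_cost(K - E, *args)) / (2 * eps)
    return G

# Toy 2-state system where only the first state is measured (C is 1x2),
# so K is a 1x1 static output-feedback gain.
A = np.array([[1.0, 0.2], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
args = (A, B, C, Q, R, Sigma0)

K = np.array([[0.3]])                  # stabilizing initial gain
eta = 1e-4                             # small constant step size
costs = [sof_cost(K, *args)]
for _ in range(150):                   # vanilla policy gradient descent
    K = K - eta * grad_fd(K, args)
    costs.append(sof_cost(K, *args))
```

On this example the cost decreases monotonically and the iterates stay in the stabilizing set, consistent with the coercivity and smoothness properties the abstract describes; the nonconvexity of J(K) means only convergence to a stationary point is guaranteed in general.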


Similar Articles

1
Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback.
IEEE Trans Cybern. 2024 Jun;54(6):3588-3601. doi: 10.1109/TCYB.2023.3323316. Epub 2024 May 30.
2
Optimal Learning Output Tracking Control: A Model-Free Policy Optimization Method With Convergence Analysis.
IEEE Trans Neural Netw Learn Syst. 2025 Mar;36(3):5574-5585. doi: 10.1109/TNNLS.2024.3379207. Epub 2025 Feb 28.
3
Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback.
IEEE Trans Cybern. 2019 Jan 3. doi: 10.1109/TCYB.2018.2886735.
4
Geometry and convergence of natural policy gradient methods.
Inf Geom. 2023;7(Suppl 1):485-523. doi: 10.1007/s41884-023-00106-z. Epub 2023 Jun 2.
5
Constrained Output-Feedback Control for Discrete-Time Fuzzy Systems With Local Nonlinear Models Subject to State and Input Constraints.
IEEE Trans Cybern. 2021 Sep;51(9):4673-4684. doi: 10.1109/TCYB.2020.3009128. Epub 2021 Sep 15.
6
Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval.
Math Program. 2019 Jul;176(1-2):5-37. doi: 10.1007/s10107-019-01363-6. Epub 2019 Feb 4.
7
Optimal Output Regulation of Linear Discrete-Time Systems With Unknown Dynamics Using Reinforcement Learning.
IEEE Trans Cybern. 2020 Jul;50(7):3147-3156. doi: 10.1109/TCYB.2018.2890046. Epub 2019 Jan 25.
8
Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure.
IEEE Trans Cybern. 2022 Jun;52(6):4797-4810. doi: 10.1109/TCYB.2021.3049357. Epub 2022 Jun 16.
9
Data-Driven H∞ Optimal Output Feedback Control for Linear Discrete-Time Systems Based on Off-Policy Q-Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jul;34(7):3553-3567. doi: 10.1109/TNNLS.2021.3112457. Epub 2023 Jul 6.
10
RNN-K: A Reinforced Newton Method for Consensus-Based Distributed Optimization and Control Over Multiagent Systems.
IEEE Trans Cybern. 2022 May;52(5):4012-4026. doi: 10.1109/TCYB.2020.3011819. Epub 2022 May 19.