Suppr 超能文献


Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure.

Publication Information

IEEE Trans Cybern. 2022 Jun;52(6):4797-4810. doi: 10.1109/TCYB.2021.3049357. Epub 2022 Jun 16.

DOI: 10.1109/TCYB.2021.3049357
PMID: 33502987
Abstract

In this article, we propose a learning approach to analyze dynamic systems with an asymmetric information structure. Instead of adopting a game-theoretic setting, we investigate an online quadratic optimization problem driven by system noises with unknown statistics. Due to the information asymmetry, it is infeasible to use the classic Kalman filter or optimal control strategies for such systems. It is therefore necessary and beneficial to develop an admissible approach that learns the probability statistics as time goes forward. Motivated by online convex optimization (OCO) theory, we introduce the notion of regret, defined as the cumulative performance loss between the optimal cost under offline-known statistics and the cost incurred under online-unknown statistics. By utilizing dynamic programming and the linear minimum mean-square unbiased estimate (LMMSUE), we propose a new type of online state-feedback control policy and characterize the behavior of the regret in a finite-time regime. The regret is shown to be sublinear and bounded by O(ln T). Moreover, we address an online optimization problem with an output-feedback control policy and propose a heuristic online control policy.
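The flavor of the O(ln T) regret bound can be seen in a toy scalar instance (an illustrative assumption, not the paper's actual system or policy): the learner must pick a control u_t before observing Gaussian noise w_t with unknown mean, estimates that mean by the running sample average, and is compared against the offline policy that knows the mean exactly. The estimator's error variance decays like 1/t, so the cumulative excess cost grows roughly like the harmonic sum, i.e., logarithmically in T.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 1.0          # true noise statistics; mu is unknown to the learner
T = 100_000
w = rng.normal(mu, sigma, T)  # system noise sequence

# Online policy: u_t = -(sample mean of noises observed so far), with u_1 = 0.
mu_hat = np.concatenate(([0.0], np.cumsum(w)[:-1] / np.arange(1, T)))
online_cost = (w - mu_hat) ** 2      # per-step cost (u_t + w_t)^2 with u_t = -mu_hat

# Offline policy with known statistics: u_t = -mu for every t.
offline_cost = (w - mu) ** 2

# Cumulative regret: online cost minus offline-optimal cost.
regret = np.cumsum(online_cost - offline_cost)
print(regret[-1], np.log(T))  # expected regret grows roughly like sigma^2 * ln T
```

Since each step's excess expected cost is the estimator variance sigma^2/(t-1), the expected regret after T steps is about sigma^2 * ln T, which is sublinear: regret[-1] / T tends to zero.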


Similar Articles

1. Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure.
IEEE Trans Cybern. 2022 Jun;52(6):4797-4810. doi: 10.1109/TCYB.2021.3049357. Epub 2022 Jun 16.
2. Push-Sum Distributed Online Optimization With Bandit Feedback.
IEEE Trans Cybern. 2022 Apr;52(4):2263-2273. doi: 10.1109/TCYB.2020.2999309. Epub 2022 Apr 5.
3. Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback.
IEEE Trans Cybern. 2022 Feb;52(2):1009-1020. doi: 10.1109/TCYB.2020.2990796. Epub 2022 Feb 16.
4. Distributed Online Constrained Optimization With Feedback Delays.
IEEE Trans Neural Netw Learn Syst. 2024 Feb;35(2):1708-1720. doi: 10.1109/TNNLS.2022.3184957. Epub 2024 Feb 5.
5. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem.
IEEE Trans Neural Netw Learn Syst. 2019 May;30(5):1523-1536. doi: 10.1109/TNNLS.2018.2870075. Epub 2018 Oct 8.
6. Optimal Robust Output Containment of Unknown Heterogeneous Multiagent System Using Off-Policy Reinforcement Learning.
IEEE Trans Cybern. 2018 Nov;48(11):3197-3207. doi: 10.1109/TCYB.2017.2761878. Epub 2017 Oct 30.
7. Proximal Online Gradient Is Optimum for Dynamic Regret: A General Lower Bound.
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7755-7764. doi: 10.1109/TNNLS.2021.3087579. Epub 2022 Nov 30.
8. Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning.
Neural Netw. 2014 Jul;55:30-41. doi: 10.1016/j.neunet.2014.03.008. Epub 2014 Mar 28.
9. Optimization Landscape of Policy Gradient Methods for Discrete-Time Static Output Feedback.
IEEE Trans Cybern. 2024 Jun;54(6):3588-3601. doi: 10.1109/TCYB.2023.3323316. Epub 2024 May 30.
10. Distributed Online Stochastic-Constrained Convex Optimization With Bandit Feedback.
IEEE Trans Cybern. 2024 Jan;54(1):63-75. doi: 10.1109/TCYB.2022.3177644. Epub 2023 Dec 20.