Learning-Based Control Policy and Regret Analysis for Online Quadratic Optimization With Asymmetric Information Structure.

Publication information

IEEE Trans Cybern. 2022 Jun;52(6):4797-4810. doi: 10.1109/TCYB.2021.3049357. Epub 2022 Jun 16.

Abstract

In this article, we propose a learning approach to analyze dynamic systems with an asymmetric information structure. Instead of adopting a game-theoretic setting, we investigate an online quadratic optimization problem driven by system noises with unknown statistics. Due to information asymmetry, it is infeasible to use either the classic Kalman filter or optimal control strategies for such systems. It is therefore necessary and beneficial to develop an admissible approach that learns the probability statistics as time goes forward. Motivated by online convex optimization (OCO) theory, we introduce the notion of regret, defined as the cumulative performance loss difference between the optimal cost when the statistics are known offline and the optimal cost when the statistics are unknown and learned online. By utilizing dynamic programming and the linear minimum mean square unbiased estimate (LMMSUE), we propose a new type of online state-feedback control policy and characterize the behavior of regret in a finite-time regime. The regret is shown to be sublinear and bounded by O(ln T). Moreover, we address an online optimization problem with an output-feedback control policy and propose a heuristic online control policy.
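
To give some intuition for why a regret of order ln T can arise when noise statistics are learned online, here is a minimal toy sketch. It is not the policy from the article. It assumes a hypothetical scalar system x[t+1] = a*x[t] + u[t] + w[t] with a one-step cost x[t+1]^2, full state feedback, and noise w[t] with unknown mean mu and variance sigma2. The function names and parameters are illustrative assumptions. The online controller replaces the unknown mean with the sample mean of past noise values, which it recovers from the observed states.

```python
import math
import random


def expected_regret(T, mu, sigma2):
    """Closed-form expected regret of the toy policy over T steps.

    Step 1 has no data, so the mean estimate is 0 and the excess cost is mu^2.
    Step t >= 2 uses the sample mean of t-1 noise values, whose mean squared
    error is sigma2 / (t - 1). Summing gives a harmonic series, i.e. ~ ln T.
    """
    return mu ** 2 + sigma2 * sum(1.0 / k for k in range(1, T))


def simulate_regret(T, a, mu, sigma2, seed=0):
    """Simulate one run of the toy online policy and return its realized regret."""
    rng = random.Random(seed)
    x, noise_sum, regret = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        # Sample-mean estimate of the unknown noise mean (0 before any data).
        mu_hat = noise_sum / (t - 1) if t > 1 else 0.0
        u = -a * x - mu_hat                  # certainty-equivalent control
        w = rng.gauss(mu, math.sqrt(sigma2))
        x_next = a * x + u + w
        # With the mean known offline, the next state would be w - mu.
        regret += x_next ** 2 - (w - mu) ** 2
        noise_sum += x_next - a * x - u      # noise recovered from the states
        x = x_next
    return regret


if __name__ == "__main__":
    for T in (10, 100, 1000):
        bound = 0.25 + 1.0 * (1.0 + math.log(T))
        print(T, expected_regret(T, mu=0.5, sigma2=1.0), bound)
```

In this toy setting the expected regret is mu^2 plus sigma2 times a harmonic sum, so it grows logarithmically in T and the average regret per step vanishes. The article's setting (asymmetric information, dynamic programming over a finite horizon) is considerably more general than this sketch.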
