Action Mapping: A Reinforcement Learning Method for Constrained-Input Systems.

Author information

Yuan Xin, Wang Yuanda, Liu Jian, Sun Changyin

Publication information

IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7145-7157. doi: 10.1109/TNNLS.2021.3138924. Epub 2023 Oct 5.

DOI:10.1109/TNNLS.2021.3138924

Abstract

Existing approaches to constrained-input optimal control problems mainly focus on systems with input saturation, whereas other constraints, such as combined inequality constraints and state-dependent constraints, are seldom discussed. In this article, a reinforcement learning (RL)-based algorithm is developed for constrained-input optimal control of discrete-time (DT) systems. The deterministic policy gradient (DPG) is introduced to iteratively search for the optimal solution to the Hamilton-Jacobi-Bellman (HJB) equation. To deal with input constraints, an action mapping (AM) mechanism is proposed. The objective of this mechanism is to transform the exploration space from the subspace generated by the given inequality constraints to the standard Cartesian product space, which can be searched effectively by existing algorithms. With the proposed architecture, the learned policy outputs control signals that satisfy the given constraints, and the original reward function is kept unchanged. A convergence analysis is given, showing that the iterative algorithm converges to the optimal solution of the HJB equation. In addition, the continuity of the iteratively estimated Q-function is investigated. Two numerical examples are provided to demonstrate the effectiveness of the approach.
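
To make the action mapping idea concrete, the following is a minimal sketch and not the mapping used in the article, which the abstract does not specify. It assumes a hypothetical two-input system with the combined, state-dependent constraint u1 >= 0, u2 >= 0, u1 + u2 <= limit(x), where `limit`, `map_action`, and the `budget` parameter are all illustrative names. The policy explores the standard box [-1, 1]^2, and the mapping sends every point of that box to a feasible control.

```python
def limit(x, budget=1.0):
    # Hypothetical state-dependent bound: the available input budget
    # shrinks as |x| grows. This form is an assumption for illustration.
    return budget / (1.0 + abs(x))


def map_action(a, x, budget=1.0):
    """Map a raw action a in [-1, 1]^2 to a control (u1, u2) with
    u1 >= 0, u2 >= 0 and u1 + u2 <= limit(x)."""
    # Clip to the box, then rescale each coordinate to [0, 1].
    s1 = (min(max(a[0], -1.0), 1.0) + 1.0) / 2.0
    s2 = (min(max(a[1], -1.0), 1.0) + 1.0) / 2.0
    cap = limit(x, budget)
    # Allocate the budget sequentially: u1 takes a fraction of the cap,
    # u2 takes a fraction of whatever remains.
    u1 = s1 * cap
    u2 = s2 * (cap - u1)
    return (u1, u2)
```

Because feasibility is enforced by construction, a sketch like this needs no penalty term in the reward, which is consistent with the abstract's remark that the original reward function stays unchanged.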

