IEEE Trans Neural Netw Learn Syst. 2016 Apr;27(4):771-82. doi: 10.1109/TNNLS.2015.2424233. Epub 2015 May 1.
A least squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version, kernel-based LS-TDC (KLS-TDC), are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm. Because TDC is obtained by minimizing the mean-square projected Bellman error, LS-TDC inherits its favorable convergence behavior. The least squares technique is used to eliminate the step-size tuning required by the original TDC and to enhance robustness. For KLS-TDC, the kernel method allows feature vectors to be selected automatically, and approximate linear dependence (ALD) analysis is performed to sparsify the kernel dictionary. In addition, a policy iteration strategy built on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivities of both LS-TDC and KLS-TDC are tested through on-policy learning, off-policy learning, and control learning problems. Experimental results, compared with a series of corresponding RL algorithms, demonstrate that both LS-TDC and KLS-TDC achieve better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and lower sensitivity to parameters.
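To make the relationship between TDC and its least-squares variant concrete, the following is a minimal sketch in the standard linear, sample-based setting: the first function is the well-known incremental TDC update (with step sizes alpha and beta), and the second solves for the minimizer of the mean-square projected Bellman error in closed form from a batch of transitions, which is how a least-squares formulation avoids step-size tuning. The function names, the regularization constant reg, and the batch solve are illustrative assumptions and may differ from the exact LS-TDC recursion derived in the paper.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, r, gamma, alpha, beta):
    """One incremental TDC step (the algorithm LS-TDC is derived from)."""
    delta = r + gamma * (phi_next @ theta) - phi @ theta          # TD error
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)  # corrected TD step
    w = w + beta * (delta - w @ phi) * phi                        # auxiliary weight vector
    return theta, w

def ls_mspbe_solution(phis, next_phis, rewards, gamma, reg=1e-6):
    """Batch least-squares minimizer of the MSPBE from sampled transitions.
    No step sizes are needed (illustrative sketch only)."""
    Phi = np.asarray(phis)        # (T, d) features at s_t
    PhiN = np.asarray(next_phis)  # (T, d) features at s_{t+1}
    r = np.asarray(rewards)       # (T,)
    T, d = Phi.shape
    A = Phi.T @ (Phi - gamma * PhiN) / T          # estimate of E[phi (phi - gamma phi')^T]
    b = Phi.T @ r / T                             # estimate of E[r phi]
    C = Phi.T @ Phi / T + reg * np.eye(d)         # estimate of E[phi phi^T], regularized
    Cinv_A = np.linalg.solve(C, A)
    Cinv_b = np.linalg.solve(C, b)
    # theta = (A^T C^{-1} A)^{-1} A^T C^{-1} b minimizes the MSPBE
    return np.linalg.solve(A.T @ Cinv_A + reg * np.eye(d), A.T @ Cinv_b)
```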
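The kernel sparsification mentioned for KLS-TDC can likewise be illustrated. Below is a minimal sketch of the approximate linear dependence (ALD) test: a sample is admitted to the dictionary only if it cannot be represented, in the kernel-induced feature space, by the current dictionary elements within a tolerance nu. The function name ald_sparsify, the naive rebuilding of the inverse kernel matrix, and the RBF kernel in the usage line are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def ald_sparsify(samples, kernel, nu=1e-3):
    """Build a sparse dictionary via the ALD test."""
    dictionary = [samples[0]]
    K_inv = np.array([[1.0 / kernel(samples[0], samples[0])]])
    for x in samples[1:]:
        k_vec = np.array([kernel(x, d) for d in dictionary])  # kernel values vs dictionary
        a = K_inv @ k_vec                                      # ALD coefficients
        err = kernel(x, x) - k_vec @ a                         # residual of the ALD test
        if err > nu:
            dictionary.append(x)
            # naive rebuild of the inverse kernel matrix for the enlarged dictionary
            K = np.array([[kernel(u, v) for v in dictionary] for u in dictionary])
            K_inv = np.linalg.inv(K + 1e-10 * np.eye(len(dictionary)))
    return dictionary

# Example usage with a Gaussian (RBF) kernel on random 2-D samples:
rbf = lambda x, y: np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / 2.0)
dic = ald_sparsify(np.random.randn(200, 2), rbf, nu=0.1)
```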