Kernel-Based Least Squares Temporal Difference With Gradient Correction.

Publication

IEEE Trans Neural Netw Learn Syst. 2016 Apr;27(4):771-82. doi: 10.1109/TNNLS.2015.2424233. Epub 2015 May 1.

DOI: 10.1109/TNNLS.2015.2424233
PMID: 25955853
Abstract

A least squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version (KLS-TDC) are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm. Because TDC is obtained by minimizing the mean-square projected Bellman error, LS-TDC inherits good convergence performance. The least squares technique removes the step-size tuning required by the original TDC and enhances robustness. For KLS-TDC, the kernel method allows feature vectors to be selected automatically, and approximate linear dependence analysis is performed to achieve kernel sparsification. In addition, a policy iteration strategy based on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivity of both LS-TDC and KLS-TDC are tested on on-policy learning, off-policy learning, and control learning problems. Experimental results, compared with a series of corresponding RL algorithms, demonstrate that both LS-TDC and KLS-TDC achieve better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and lower sensitivity to parameters.
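The abstract's starting point is the TDC update (Sutton et al., 2009), a stochastic-gradient method on the mean-square projected Bellman error that LS-TDC turns into a least-squares solve. A minimal sketch of that underlying update is below; the chain MDP, one-hot features, and step sizes are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch of the TDC update that LS-TDC builds on (Sutton et al., 2009).
# Environment, features, and step sizes are assumptions for illustration.

rng = np.random.default_rng(0)

n_states, gamma = 5, 0.9
alpha, beta = 0.05, 0.25      # step sizes; the least-squares formulation
                              # in the paper removes exactly this tuning

def phi(s):
    """One-hot feature vector for state s."""
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

theta = np.zeros(n_states)    # value-function weights
w = np.zeros(n_states)        # gradient-correction weights

s = 0
for _ in range(20000):
    # Random walk: step left or right; reaching the right end pays +1
    # and terminates the episode.
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    terminal = s_next == n_states - 1
    r = 1.0 if terminal else 0.0

    f = phi(s)
    f_next = np.zeros(n_states) if terminal else phi(s_next)
    delta = r + gamma * theta @ f_next - theta @ f    # TD error

    # TDC update: gradient descent on the mean-square projected Bellman
    # error (MSPBE), with w tracking the expected TD error per state.
    theta += alpha * (delta * f - gamma * f_next * (w @ f))
    w += beta * (delta - w @ f) * f

    s = 0 if terminal else s_next

print(np.round(theta, 2))
```

With one-hot features theta converges toward the true state values of the chain, so states closer to the rewarding right end should end up with larger weights.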

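The kernel sparsification step the abstract mentions relies on the approximate linear dependence (ALD) test (Engel et al.): a new sample is added to the kernel dictionary only if it cannot be approximated, in feature space, by the samples already kept. A small sketch follows; the RBF kernel, threshold, and synthetic data are assumptions for illustration.

```python
import numpy as np

# Sketch of ALD-based dictionary sparsification (Engel et al.).
# Kernel choice, threshold nu, and data are illustrative assumptions.

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def build_dictionary(samples, nu=0.3):
    """Keep a sample only if it is not approximately linearly
    dependent on the current dictionary in feature space."""
    dictionary = [samples[0]]
    for x in samples[1:]:
        K = np.array([[rbf(a, b) for b in dictionary] for a in dictionary])
        k = np.array([rbf(a, x) for a in dictionary])
        # ALD residual: k(x, x) - k^T K^{-1} k; small residual means x is
        # nearly a linear combination of dictionary elements.
        c = np.linalg.solve(K, k)
        delta = rbf(x, x) - k @ c
        if delta > nu:
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 2))
D = build_dictionary(samples)
print(len(D))   # typically far smaller than len(samples)
```

The threshold nu trades accuracy for sparsity: a larger nu admits fewer samples, giving a smaller dictionary and cheaper kernel evaluations.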

Similar Articles

1. Kernel-Based Least Squares Temporal Difference With Gradient Correction.
   IEEE Trans Neural Netw Learn Syst. 2016 Apr;27(4):771-82. doi: 10.1109/TNNLS.2015.2424233. Epub 2015 May 1.
2. Kernel-based least squares policy iteration for reinforcement learning.
   IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
3. Online selective kernel-based temporal difference learning.
   IEEE Trans Neural Netw Learn Syst. 2013 Dec;24(12):1944-56. doi: 10.1109/TNNLS.2013.2270561.
4. Recursive Least-Squares Temporal Difference With Gradient Correction.
   IEEE Trans Cybern. 2021 Aug;51(8):4251-4264. doi: 10.1109/TCYB.2019.2902342. Epub 2021 Aug 4.
5. Hierarchical approximate policy iteration with binary-tree state space decomposition.
   IEEE Trans Neural Netw. 2011 Dec;22(12):1863-77. doi: 10.1109/TNN.2011.2168422. Epub 2011 Oct 10.
6. Manifold-Based Reinforcement Learning via Locally Linear Reconstruction.
   IEEE Trans Neural Netw Learn Syst. 2017 Apr;28(4):934-947. doi: 10.1109/TNNLS.2015.2505084. Epub 2016 Jan 27.
7. Actor-Critic Learning Control Based on -Regularized Temporal-Difference Prediction With Gradient Correction.
   IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):5899-5909. doi: 10.1109/TNNLS.2018.2808203. Epub 2018 Apr 5.
8. A linear recurrent kernel online learning algorithm with sparse updates.
   Neural Netw. 2014 Feb;50:142-53. doi: 10.1016/j.neunet.2013.11.011. Epub 2013 Nov 20.
9. Design of a multiple kernel learning algorithm for LS-SVM by convex programming.
   Neural Netw. 2011 Jun;24(5):476-83. doi: 10.1016/j.neunet.2011.03.009. Epub 2011 Mar 12.
10. A fast algorithm for AR parameter estimation using a novel noise-constrained least-squares method.
    Neural Netw. 2010 Apr;23(3):396-405. doi: 10.1016/j.neunet.2009.11.004. Epub 2009 Dec 11.