A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.

Publication information

IEEE Trans Cybern. 2014 Dec;44(12):2613-25. doi: 10.1109/TCYB.2014.2311578. Epub 2014 Apr 25.

DOI:10.1109/TCYB.2014.2311578
PMID:24802018
Abstract

To deal with sequential decision problems with large or continuous state spaces, feature representation and function approximation have been a major research topic in reinforcement learning (RL). In this paper, a clustering-based graph Laplacian framework is presented for feature representation and value function approximation (VFA) in RL. Using clustering techniques, namely K-means clustering or fuzzy C-means clustering, a graph Laplacian is constructed by subsampling in Markov decision processes (MDPs) with continuous state spaces. The basis functions for VFA can be generated automatically from a spectral analysis of the graph Laplacian. The clustering-based graph Laplacian is integrated with a class of approximate policy iteration algorithms called representation policy iteration (RPI) for RL in MDPs with continuous state spaces. Simulation and experimental results show that, compared with previous RPI methods, the proposed approach needs fewer sample points to compute an efficient set of basis functions, and that learning control performance can be improved for a variety of parameter settings.
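The basis-construction step summarized above (cluster the sampled states, build a graph over the cluster centres, take the smoothest Laplacian eigenvectors as features) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: plain Lloyd's K-means only (no fuzzy C-means), a symmetrized k-nearest-neighbour graph with Gaussian weights, the normalized Laplacian, and a Nyström-style extension to evaluate the features at unseen states. The function names `kmeans`, `laplacian_basis`, and `extend` and the parameters `k`, `n_neighbors`, `sigma`, and `n_basis` are hypothetical illustrative choices.

```python
import numpy as np


def kmeans(states, k, n_iter=50, seed=0):
    """Lloyd's K-means: returns k cluster centres that act as graph vertices."""
    rng = np.random.default_rng(seed)
    centres = states[rng.choice(len(states), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every sampled state to its nearest centre.
        d2 = ((states[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = states[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return centres


def laplacian_basis(centres, n_basis, n_neighbors=5, sigma=0.5):
    """Symmetric kNN graph over the centres (hypothetical construction);
    returns the n_basis smoothest eigenpairs of the normalized Laplacian
    together with the vertex degrees."""
    d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    n = len(centres)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]  # index 0 is the vertex itself
        W[i, nbrs] = np.exp(-d2[i, nbrs] / sigma**2)
    W = np.maximum(W, W.T)  # make the graph undirected
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:n_basis], eigvecs[:, :n_basis], deg


def extend(s, centres, deg, eigvals, eigvecs, n_neighbors=5, sigma=0.5):
    """Nystrom-style extension of the eigenvectors to an arbitrary state s:
    phi_i(s) = 1/(1 - lambda_i) * sum_j w(s, j) / sqrt(d(s) d_j) * phi_i(j)."""
    d2 = ((centres - s) ** 2).sum(axis=1)
    nbrs = np.argsort(d2)[:n_neighbors]
    w = np.exp(-d2[nbrs] / sigma**2)
    coeff = w / np.sqrt(w.sum() * deg[nbrs])
    return (coeff @ eigvecs[nbrs]) / (1.0 - eigvals)


# Usage on synthetic 2-D states (a stand-in for states sampled from an MDP).
rng = np.random.default_rng(1)
states = rng.uniform(0.0, 1.0, size=(500, 2))
centres = kmeans(states, k=30)
eigvals, eigvecs, deg = laplacian_basis(centres, n_basis=6)
phi = extend(np.array([0.3, 0.7]), centres, deg, eigvals, eigvecs)
print(phi.shape)  # (6,)
```

Clustering first means the eigendecomposition runs on a 30-vertex graph rather than on all 500 samples, which is the kind of saving the abstract points to. Each eigenvector is defined only up to sign. In an RPI-style learner, these state features would still have to be combined with the actions (for example, one copy per discrete action) and passed to a least-squares policy-iteration step, which this sketch omits.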


Similar articles

1
A clustering-based graph Laplacian framework for value function approximation in reinforcement learning.
IEEE Trans Cybern. 2014 Dec;44(12):2613-25. doi: 10.1109/TCYB.2014.2311578. Epub 2014 Apr 25.
2
Hierarchical approximate policy iteration with binary-tree state space decomposition.
IEEE Trans Neural Netw. 2011 Dec;22(12):1863-77. doi: 10.1109/TNN.2011.2168422. Epub 2011 Oct 10.
3
Kernel-based least squares policy iteration for reinforcement learning.
IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
4
Learning locality preserving graph from data.
IEEE Trans Cybern. 2014 Nov;44(11):2088-98. doi: 10.1109/TCYB.2014.2300489. Epub 2014 Jul 8.
5
Toward the optimization of normalized graph Laplacian.
IEEE Trans Neural Netw. 2011 Apr;22(4):660-6. doi: 10.1109/TNN.2011.2107919. Epub 2011 Feb 28.
6
Unsupervised active learning based on hierarchical graph-theoretic clustering.
IEEE Trans Syst Man Cybern B Cybern. 2009 Oct;39(5):1147-61. doi: 10.1109/TSMCB.2009.2013197. Epub 2009 Mar 24.
7
Manifold-Based Reinforcement Learning via Locally Linear Reconstruction.
IEEE Trans Neural Netw Learn Syst. 2017 Apr;28(4):934-947. doi: 10.1109/TNNLS.2015.2505084. Epub 2016 Jan 27.
8
Initialization independent clustering with actively self-training method.
IEEE Trans Syst Man Cybern B Cybern. 2012 Feb;42(1):17-27. doi: 10.1109/TSMCB.2011.2161607. Epub 2011 Nov 11.
9
On the relation of slow feature analysis and Laplacian eigenmaps.
Neural Comput. 2011 Dec;23(12):3287-302. doi: 10.1162/NECO_a_00214. Epub 2011 Sep 15.
10
Pattern vectors from algebraic graph theory.
IEEE Trans Pattern Anal Mach Intell. 2005 Jul;27(7):1112-24. doi: 10.1109/TPAMI.2005.145.

Cited by

1
Model Learning and Knowledge Sharing for Cooperative Multiagent Systems in Stochastic Environment.
IEEE Trans Cybern. 2021 Dec;51(12):5717-5727. doi: 10.1109/TCYB.2019.2958912. Epub 2021 Dec 22.
2
A biomarker basing on radiomics for the prediction of overall survival in non-small cell lung cancer patients.
Respir Res. 2018 Oct 10;19(1):199. doi: 10.1186/s12931-018-0887-8.
3
Cluster Prototypes and Fuzzy Memberships Jointly Leveraged Cross-Domain Maximum Entropy Clustering.
IEEE Trans Cybern. 2016 Jan;46(1):181-93. doi: 10.1109/TCYB.2015.2399351.