

Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse.

Authors

Queeney James, Paschalidis Ioannis Ch, Cassandras Christos G

Affiliations

Mitsubishi Electric Research Laboratories, Cambridge, MA 02139 USA. He performed the majority of this work while with the Division of Systems Engineering, Boston University, Boston, MA 02215 USA.

Department of Electrical and Computer Engineering and Division of Systems Engineering, Boston University, Boston, MA 02215 USA.

Publication information

IEEE Trans Automat Contr. 2025 Feb;70(2):1236-1243. doi: 10.1109/tac.2024.3454011. Epub 2024 Sep 3.

DOI:10.1109/tac.2024.3454011
PMID:40832367
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12360665/
Abstract

We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse, addressing a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a broad range of simulated control tasks.
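The abstract does not give the update rule itself. As a rough illustration of how sample reuse and a policy improvement guarantee can coexist, the sketch below assumes a PPO-style clipped surrogate evaluated on samples collected by several earlier policies. The probability ratio is measured against the policy that generated each sample, and the clipping window is centered at the current policy's ratio so the update stays close to the current policy. The `Sample` container, the `generalized_clipped_surrogate` function, and the `eps` default are hypothetical names chosen for illustration. This is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One reused transition (hypothetical container for illustration)."""
    logp_new: float       # log-prob of the action under the candidate policy
    logp_current: float   # log-prob under the current policy pi_k
    logp_behavior: float  # log-prob under the (possibly older) policy that collected it
    advantage: float      # advantage estimate for (s, a)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def generalized_clipped_surrogate(samples: List[Sample], eps: float = 0.2) -> float:
    """Average clipped surrogate over samples reused from several prior policies.

    Assumption: the ratio is taken w.r.t. the behavior policy, and clipping is
    centered at pi_k / pi_behavior instead of 1, which keeps the candidate policy
    within a trust region around the current policy even for stale samples.
    """
    if not samples:
        raise ValueError("need at least one sample")
    total = 0.0
    for s in samples:
        ratio = math.exp(s.logp_new - s.logp_behavior)       # pi_new / pi_behavior
        center = math.exp(s.logp_current - s.logp_behavior)  # pi_k / pi_behavior
        clipped = clip(ratio, center - eps, center + eps)
        # Pessimistic (lower) bound, as in the PPO clipped objective.
        total += min(ratio * s.advantage, clipped * s.advantage)
    return total / len(samples)
```

When every sample is on-policy (the behavior policy equals the current policy), `center` is 1 and the objective reduces to the standard PPO clipped surrogate. Centering the window at the current policy's ratio, rather than at 1, lets older data contribute without letting the update drift far from the current policy.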


Similar articles

1. Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse.
   IEEE Trans Automat Contr. 2025 Feb;70(2):1236-1243. doi: 10.1109/tac.2024.3454011. Epub 2024 Sep 3.
2. Prescription of Controlled Substances: Benefits and Risks.
3. Exploring Trade-Offs for Online Mental Health Matching: Agent-Based Modeling Study.
   JMIR Form Res. 2024 Oct 1;8:e58241. doi: 10.2196/58241.
4. A deep learning approach to direct immunofluorescence pattern recognition in autoimmune bullous diseases.
   Br J Dermatol. 2024 Jul 16;191(2):261-266. doi: 10.1093/bjd/ljae142.
5. Public preferences for health and non-health outcomes of Universal Basic Income and alternative income-based policies: A mixed-method feasibility study.
   Public Health Res (Southampt). 2025 Jul 30:1-26. doi: 10.3310/ALDS8846.
6. Shapley value-driven multi-modal deep reinforcement learning for complex decision-making.
   Neural Netw. 2025 Nov;191:107650. doi: 10.1016/j.neunet.2025.107650. Epub 2025 Jun 21.
7. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights.
   Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
8. Therapist-supported Internet cognitive behavioural therapy for anxiety disorders in adults.
   Cochrane Database Syst Rev. 2016 Mar 12;3(3):CD011565. doi: 10.1002/14651858.CD011565.pub2.
9. Precise and dexterous robotic manipulation via human-in-the-loop reinforcement learning.
   Sci Robot. 2025 Aug 20;10(105):eads5033. doi: 10.1126/scirobotics.ads5033.
10. Actor critic with experience replay-based automatic treatment planning for prostate cancer intensity modulated radiotherapy.
    Med Phys. 2025 Jul;52(7):e17915. doi: 10.1002/mp.17915. Epub 2025 May 31.
