Ensemble algorithms in reinforcement learning.

Authors

Wiering Marco A, van Hasselt Hado

Affiliation

Department of Artificial Intelligence, University of Groningen, 9400 AK Groningen, The Netherlands.

Publication

IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):930-6. doi: 10.1109/TSMCB.2008.920231.

DOI: 10.1109/TSMCB.2008.920231
PMID: 18632380
Abstract

This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms.
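
The four ensemble rules named above combine the members' policies rather than their value functions. As a rough illustration, here is a minimal Python sketch of two of them, majority voting (MV) and Boltzmann multiplication (BM). It assumes each of the five member algorithms already exposes its policy in the current state as a probability vector over actions; the array shapes, temperature, and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's code) of two ensemble rules:
# majority voting (MV) and Boltzmann multiplication (BM). Assumes each
# member RL algorithm (e.g. Q-learning, Sarsa, AC, QV-learning, ACLA)
# exposes its policy in the current state as a probability vector over
# actions; shapes and the temperature are illustrative assumptions.

def majority_voting(greedy_actions, n_actions):
    """MV: each algorithm casts one vote for its greedy action; the
    ensemble picks the most-voted action, breaking ties uniformly."""
    votes = np.bincount(greedy_actions, minlength=n_actions)
    best = np.flatnonzero(votes == votes.max())
    return np.random.choice(best)

def boltzmann_multiplication(policies, temperature=1.0):
    """BM: multiply the members' action probabilities per action, then
    renormalize (here via a simple 1/temperature power; the paper's
    exact normalization may differ)."""
    preference = np.prod(policies, axis=0) ** (1.0 / temperature)
    return preference / preference.sum()

# Example: 5 member algorithms, 4 actions in the current state.
rng = np.random.default_rng(0)
policies = rng.dirichlet(np.ones(4), size=5)  # row i: algorithm i's policy
greedy = policies.argmax(axis=1)              # each member's greedy action
print(majority_voting(greedy, n_actions=4))   # MV ensemble action
print(boltzmann_multiplication(policies))     # BM ensemble policy
```

Note the qualitative difference between the two rules: MV keeps only each member's greedy vote and discards preference magnitudes, while BM lets a single member's near-zero probability effectively veto an action, so the two ensembles can behave quite differently on the same member policies.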

Similar Articles

1. Ensemble algorithms in reinforcement learning.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):930-6. doi: 10.1109/TSMCB.2008.920231.
2. Reinforcement learning in continuous time and space: interference and not ill conditioning is the main problem when using distributed function approximators.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):950-6. doi: 10.1109/TSMCB.2008.921000.
3. Improved Adaptive-Reinforcement Learning Control for morphing unmanned air vehicles.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):1014-20. doi: 10.1109/TSMCB.2008.922018.
4. An evolutionary approach toward dynamic self-generated fuzzy inference systems.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):963-9. doi: 10.1109/TSMCB.2008.922053.
5. Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):994-1001. doi: 10.1109/TSMCB.2008.926607.
6. Incoherent control of quantum systems with wavefunction-controllable subspaces via quantum reinforcement learning.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):957-62. doi: 10.1109/TSMCB.2008.926603.
7. A spiking neural network model of an actor-critic learning agent.
Neural Comput. 2009 Feb;21(2):301-39. doi: 10.1162/neco.2008.08-07-593.
8. Adaptive feedback control by constrained approximate dynamic programming.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):982-7. doi: 10.1109/TSMCB.2008.924140.
9. Issues on stability of ADP feedback controllers for dynamical systems.
IEEE Trans Syst Man Cybern B Cybern. 2008 Aug;38(4):913-7. doi: 10.1109/TSMCB.2008.926599.
10. A parameter control method in reinforcement learning to rapidly follow unexpected environmental changes.
Biosystems. 2004 Nov;77(1-3):109-17. doi: 10.1016/j.biosystems.2004.05.001.

Cited By

1. Constructing ancestral recombination graphs through reinforcement learning.
Front Genet. 2025 Apr 29;16:1569358. doi: 10.3389/fgene.2025.1569358. eCollection 2025.
2. Machine learning empowered coherent Raman imaging and analysis for biomedical applications.
Commun Eng. 2025 Jan 25;4(1):8. doi: 10.1038/s44172-025-00345-1.
3. Mixing memory and desire: How memory reactivation supports deliberative decision-making.
Wiley Interdiscip Rev Cogn Sci. 2022 Mar;13(2):e1581. doi: 10.1002/wcs.1581. Epub 2021 Oct 19.
4. Reinforcement Learning With Human Advice: A Survey.
Front Robot AI. 2021 Jun 1;8:584075. doi: 10.3389/frobt.2021.584075. eCollection 2021.
5. Modeling choice and reaction time during arbitrary visuomotor learning through the coordination of adaptive working memory and reinforcement learning.
Front Behav Neurosci. 2015 Aug 26;9:225. doi: 10.3389/fnbeh.2015.00225. eCollection 2015.