

A reward optimization method based on action subrewards in hierarchical reinforcement learning.

Authors

Fu Yuchen, Liu Quan, Ling Xionghong, Cui Zhiming

Affiliations

Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, Jiangsu 215123, China; School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China.

School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China.

Publication

ScientificWorldJournal. 2014 Jan 28;2014:120760. doi: 10.1155/2014/120760. eCollection 2014.

DOI: 10.1155/2014/120760
PMID: 24600318
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3926376/
Abstract

Reinforcement learning (RL) is a class of interactive learning methods whose main characteristics are "trial and error" and "related reward." A hierarchical reinforcement learning method based on action subrewards is proposed to address the "curse of dimensionality," in which the state space grows exponentially with the number of features and convergence is slow. The method greatly reduces the state space and selects actions purposefully and efficiently, thereby optimizing the reward function and speeding up convergence. Applied to online learning in the game Tetris, the experimental results show that combining the hierarchical reinforcement learning algorithm with action subrewards markedly improves convergence speed, and the hierarchical decomposition also mitigates the "curse of dimensionality" to a certain extent. Performance under different parameter settings is compared and analyzed as well.

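As a rough illustration of the subreward idea described in the abstract, the sketch below augments a sparse environment reward with a small per-action subreward (a reward-shaping term) inside a flat tabular Q-learning loop on a toy chain task. The toy environment, the subreward values (±0.1), and all hyperparameters are illustrative assumptions, not the paper's formulation, and the hierarchical decomposition itself is omitted for brevity.

```python
import random
from collections import defaultdict

# Toy 1-D chain environment: states 0..N-1, start at 0, goal at N-1.
# Actions: 0 = move left, 1 = move right. The environment reward is
# sparse: +1 only on reaching the goal state.
N = 10

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N - 1, state + 1)
    reward = 1.0 if next_state == N - 1 else 0.0
    done = next_state == N - 1
    return next_state, reward, done

# Hypothetical action subreward: a small bonus for actions that move
# toward the goal and a small penalty otherwise. This is a generic
# reward-shaping stand-in, not the paper's exact subreward definition.
def action_subreward(state, action, next_state):
    return 0.1 if next_state > state else -0.1

def q_learning(episodes=200, alpha=0.5, gamma=0.95, epsilon=0.1,
               use_subreward=True):
    q = defaultdict(float)          # Q-table keyed by (state, action)
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 200:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                action = random.choice([0, 1])
            else:
                action = max([0, 1], key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            if use_subreward:
                reward += action_subreward(state, action, next_state)
            # Standard Q-learning update on the shaped reward
            best_next = max(q[(next_state, a)] for a in [0, 1])
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
            steps += 1
        steps_per_episode.append(steps)
    return steps_per_episode

random.seed(0)
shaped = q_learning(use_subreward=True)
# The average episode length over the last 10 episodes should be close
# to the optimal N-1 = 9 steps once the shaped agent has converged.
print(sum(shaped[-10:]) / 10)
```

The shaping term gives the agent a dense learning signal on every step, so it can rank actions long before the sparse goal reward has propagated back through the Q-table, which is the intuition behind the convergence-speed gains the abstract reports.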

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80dd/3926376/ca31186a4ee0/TSWJ2014-120760.alg.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80dd/3926376/c458ce7f9a27/TSWJ2014-120760.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80dd/3926376/d4f0531f261c/TSWJ2014-120760.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80dd/3926376/9eee6372f3dd/TSWJ2014-120760.003.jpg

Similar Articles

1. A reward optimization method based on action subrewards in hierarchical reinforcement learning.
   ScientificWorldJournal. 2014 Jan 28;2014:120760. doi: 10.1155/2014/120760. eCollection 2014.
2. Optimal control in microgrid using multi-agent reinforcement learning.
   ISA Trans. 2012 Nov;51(6):743-51. doi: 10.1016/j.isatra.2012.06.010. Epub 2012 Jul 21.
3. Hierarchical Reinforcement Learning Framework in Geographic Coordination for Air Combat Tactical Pursuit.
   Entropy (Basel). 2023 Oct 1;25(10):1409. doi: 10.3390/e25101409.
4. Human locomotion with reinforcement learning using bioinspired reward reshaping strategies.
   Med Biol Eng Comput. 2021 Jan;59(1):243-256. doi: 10.1007/s11517-020-02309-3. Epub 2021 Jan 8.
5. Kernel-based least squares policy iteration for reinforcement learning.
   IEEE Trans Neural Netw. 2007 Jul;18(4):973-92. doi: 10.1109/TNN.2007.899161.
6. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.
   Comput Intell Neurosci. 2016;2016:4824072. doi: 10.1155/2016/4824072. Epub 2016 Oct 3.
7. Human-level control through deep reinforcement learning.
   Nature. 2015 Feb 26;518(7540):529-33. doi: 10.1038/nature14236.
8. Boosting Reinforcement Learning via Hierarchical Game Playing With State Relay.
   IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):7077-7089. doi: 10.1109/TNNLS.2024.3386717. Epub 2025 Apr 4.
9. Online learning of shaping rewards in reinforcement learning.
   Neural Netw. 2010 May;23(4):541-50. doi: 10.1016/j.neunet.2010.01.001. Epub 2010 Jan 11.
10. A Reinforcement Learning-Based Vehicle Platoon Control Strategy for Reducing Energy Consumption in Traffic Oscillations.
    IEEE Trans Neural Netw Learn Syst. 2021 Dec;32(12):5309-5322. doi: 10.1109/TNNLS.2021.3071959. Epub 2021 Nov 30.
