Int-HRL: towards intention-based hierarchical reinforcement learning.

Authors

Penzkofer Anna, Schaefer Simon, Strohm Florian, Bâce Mihai, Leutenegger Stefan, Bulling Andreas

Affiliations

Institute for Visualisation and Interactive Systems, University of Stuttgart, Pfaffenwaldring 5A, 70569 Stuttgart, Germany.

Machine Learning for Robotics, Technical University of Munich, Boltzmannstrasse 3, 85748 Munich, Germany.

Publication information

Neural Comput Appl. 2025;37(23):18823-18834. doi: 10.1007/s00521-024-10596-2. Epub 2024 Dec 11.

DOI:10.1007/s00521-024-10596-2
PMID:40756565
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12313806/

Abstract

While deep reinforcement learning (RL) agents outperform humans on an increasing number of tasks, training them requires data equivalent to decades of human gameplay. Recent hierarchical RL (HRL) methods have increased sample efficiency by incorporating information inherent to the structure of the decision problem, but at the cost of having to discover or use human-annotated sub-goals that guide the learning process. We show that the intentions of human players, i.e. the precursors of goal-oriented decisions, can be robustly predicted from eye gaze even for the long-horizon, sparse-reward task of Montezuma's Revenge, one of the most challenging RL tasks in the Atari2600 game suite. We propose Int-HRL: hierarchical RL with intention-based sub-goals that are inferred from human eye gaze. Our novel sub-goal extraction pipeline is fully automatic and replaces the need for manual sub-goal annotation by human experts. Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions leads to an HRL agent that is significantly more sample efficient than previous methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9bd/12313806/e257852212bc/521_2024_10596_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9bd/12313806/2072c19e8e8f/521_2024_10596_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9bd/12313806/32793706bf63/521_2024_10596_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a9bd/12313806/e1058b8f10ae/521_2024_10596_Fig4_HTML.jpg
