Penzkofer Anna, Schaefer Simon, Strohm Florian, Bâce Mihai, Leutenegger Stefan, Bulling Andreas
Institute for Visualisation and Interactive Systems, University of Stuttgart, Pfaffenwaldring 5A, 70569 Stuttgart, Germany.
Machine Learning for Robotics, Technical University of Munich, Boltzmannstrasse 3, 85748 Munich, Germany.
Neural Comput Appl. 2025;37(23):18823-18834. doi: 10.1007/s00521-024-10596-2. Epub 2024 Dec 11.
While deep reinforcement learning (RL) agents outperform humans on an increasing number of tasks, training them requires data equivalent to decades of human gameplay. Recent hierarchical RL methods have increased sample efficiency by incorporating information inherent to the structure of the decision problem, but at the cost of having to discover or use human-annotated sub-goals that guide the learning process. We show that the intentions of human players, i.e., the precursors of goal-oriented decisions, can be robustly predicted from eye gaze, even for the long-horizon, sparse-reward task of Montezuma's Revenge, one of the most challenging RL tasks in the Atari 2600 game suite. We propose hierarchical RL with intention-based sub-goals that are inferred from human eye gaze. Our novel sub-goal extraction pipeline is fully automatic and replaces the need for manual sub-goal annotation by human experts. Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions yields an HRL agent that is significantly more sample efficient than previous methods.
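The hierarchical control loop the abstract describes (a high-level policy proposing sub-goals, a low-level policy pursuing them for intrinsic reward) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy corridor environment, the greedy controller, and the fixed sub-goal list standing in for gaze-derived intentions are all assumptions.

```python
class ToyEnv:
    """Toy 1-D corridor; positions are states. Stand-in for an Atari game."""
    def __init__(self, size=10):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); clamp to corridor bounds
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.pos


def low_level_policy(state, sub_goal):
    """Greedy low-level controller: move toward the current sub-goal."""
    return 1 if sub_goal > state else -1


def run_episode(env, sub_goals, max_steps=50):
    """Pursue sub-goals in order, collecting +1 intrinsic reward per goal.

    In the paper's setting the sub-goal sequence would come from the
    automatic, gaze-based intention-extraction pipeline; here it is a
    hypothetical fixed list of target states.
    """
    state = env.reset()
    intrinsic_reward, goal_idx, steps = 0, 0, 0
    while goal_idx < len(sub_goals) and steps < max_steps:
        action = low_level_policy(state, sub_goals[goal_idx])
        state = env.step(action)
        steps += 1
        if state == sub_goals[goal_idx]:  # sub-goal reached
            intrinsic_reward += 1
            goal_idx += 1                 # advance to the next sub-goal
    return intrinsic_reward


if __name__ == "__main__":
    reward = run_episode(ToyEnv(), sub_goals=[3, 7, 9])
    print(reward)  # 3: all three sub-goals reached within the step budget
```

The design point this illustrates is the one the abstract makes: dense intrinsic rewards for reaching intermediate sub-goals give the low-level learner a learning signal even when the environment's own reward (as in Montezuma's Revenge) is long-horizon and sparse.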