Merckling Astrid, Perrin-Gilbert Nicolas, Coninx Alex, Doncieux Stéphane
Sorbonne Université, CNRS, Institut des Systèmes Intelligents et de Robotique, ISIR, Paris, France.
Front Robot AI. 2022 Feb 14;9:762051. doi: 10.3389/frobt.2022.762051. eCollection 2022.
Not having access to compact and meaningful representations is known to significantly increase the complexity of reinforcement learning (RL). For this reason, it can be useful to perform state representation learning (SRL) before tackling RL tasks. However, obtaining a good state representation can only be done if a large diversity of transitions is observed, which can require difficult exploration, especially if the environment is initially reward-free. To solve the problems of exploration and SRL in parallel, we propose a new approach called XSRL (eXploratory State Representation Learning). On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations. On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a k-step learning progress bonus to form the maximization objective of a discovery policy. This results in a policy that seeks complex transitions from which the trained models can effectively learn. Our experimental results show that the approach leads to efficient exploration in challenging environments with image observations, and to state representations that significantly accelerate learning in RL tasks.
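The exploration objective described in the abstract (inverse-model prediction error plus a k-step learning progress bonus) can be illustrated with a minimal sketch. This is not the authors' code: the module, the reward function, and all names (InverseModel, intrinsic_reward, the deque-based progress estimate) are assumptions made for illustration, and random tensors stand in for the learned state representations.

```python
# Minimal sketch (assumed names, not the authors' implementation) of an
# XSRL-style intrinsic reward: the prediction error of a continuously
# trained inverse model plus a k-step learning-progress bonus.
import torch
import torch.nn as nn
from collections import deque


class InverseModel(nn.Module):
    """Predicts the action taken between two consecutive state representations."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s_t, s_next):
        return self.net(torch.cat([s_t, s_next], dim=-1))


def intrinsic_reward(inverse_model, s_t, s_next, a_t, error_history):
    """Prediction error + learning progress (drop in error over the last k steps).

    The horizon k is set by error_history.maxlen; the progress term is a
    simple proxy for the k-step learning progress bonus named in the abstract.
    """
    with torch.no_grad():
        pred_error = nn.functional.mse_loss(inverse_model(s_t, s_next), a_t).item()
    error_history.append(pred_error)
    # Learning progress: how much the error decreased relative to k steps ago.
    if len(error_history) == error_history.maxlen:
        progress = max(0.0, error_history[0] - pred_error)
    else:
        progress = 0.0
    return pred_error + progress


# Usage with random tensors standing in for learned state representations.
state_dim, action_dim, k = 32, 4, 5
model = InverseModel(state_dim, action_dim)
history = deque(maxlen=k)
s_t, s_next = torch.randn(1, state_dim), torch.randn(1, state_dim)
a_t = torch.randn(1, action_dim)
r_int = intrinsic_reward(model, s_t, s_next, a_t, history)
print(f"intrinsic reward: {r_int:.4f}")
```

In such a scheme, the discovery policy would be trained to maximize this intrinsic reward, pushing it toward transitions that the inverse model has not yet mastered but is still improving on.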