Huber Tobias, Limmer Benedikt, André Elisabeth
Chair for Human-Centered Artificial Intelligence, University of Augsburg, Augsburg, Germany.
Front Artif Intell. 2022 Jul 13;5:903875. doi: 10.3389/frai.2022.903875. eCollection 2022.
One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel contributed to the agents' decision. However, there is no work that computationally evaluates and compares the fidelity of different perturbation-based saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy, which includes long-term decision making. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state and the expected future reward after taking each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2600 games. The approaches are compared using two computational metrics: dependence on the learned parameters of the underlying deep Q-network of the agents (sanity checks) and fidelity to the agents' reasoning (input degradation). During the sanity checks, we found that a popular noise-based saliency map approach for DRL agents shows little dependence on the parameters of the output layer. We demonstrate that this can be fixed by tweaking the algorithm such that it focuses on specific actions instead of the general entropy within the output values. For fidelity, we identify two main factors that influence which saliency map approach should be chosen in which situation. Particular to value-based DRL agents, we show that analyzing the agents' choice of action requires different saliency map approaches than analyzing the agents' state value estimation.
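To illustrate the distinction the abstract draws between action-focused and state-value-oriented saliency, the following is a minimal sketch of a generic occlusion-style perturbation saliency map for a Q-network. It is not one of the five approaches compared in the paper; the function names (perturbation_saliency, dummy_q_function) and the mean-value patch perturbation are assumptions chosen for illustration only.

```python
import numpy as np

def perturbation_saliency(q_function, frame, action=None, patch=4):
    """Generic occlusion-style perturbation saliency (illustrative sketch only).

    q_function: maps a 2-D frame (H, W) to a vector of Q-values.
    frame:      greyscale observation.
    action:     if given, saliency measures the change in Q(s, action)
                (action-focused variant); otherwise it measures the overall
                change of the Q-vector, which also reflects the state value.
    """
    base_q = q_function(frame)
    h, w = frame.shape
    saliency = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            perturbed = frame.copy()
            # Simple perturbation: replace the patch by its mean value.
            perturbed[i:i + patch, j:j + patch] = frame[i:i + patch, j:j + patch].mean()
            q = q_function(perturbed)
            if action is None:
                # State-oriented saliency: change across all Q-values.
                saliency[i // patch, j // patch] = np.linalg.norm(base_q - q)
            else:
                # Action-focused saliency: change in the chosen action's Q-value.
                saliency[i // patch, j // patch] = abs(base_q[action] - q[action])
    return saliency

# Hypothetical stand-in for a trained deep Q-network, used only to make the sketch runnable.
def dummy_q_function(frame):
    return np.array([frame[:42].sum(), frame[42:].sum()]) / frame.size

frame = np.random.rand(84, 84).astype(np.float32)
action_map = perturbation_saliency(dummy_q_function, frame, action=0)
state_map = perturbation_saliency(dummy_q_function, frame, action=None)
print(action_map.shape, state_map.shape)  # (21, 21) (21, 21)
```

The two calls differ only in whether the Q-vector change is reduced to the chosen action or kept aggregate; this is the kind of ambiguity, particular to value-based agents, that the paper argues must be taken into account when evaluating saliency maps.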