Kim Jenia, Maathuis Henry, Sent Danielle
HU University of Applied Sciences Utrecht, Research Group Artificial Intelligence, Utrecht, Netherlands.
Jheronimus Academy of Data Science, Tilburg University, Eindhoven University of Technology, 's-Hertogenbosch, Netherlands.
Front Artif Intell. 2024 Oct 17;7:1456486. doi: 10.3389/frai.2024.1456486. eCollection 2024.
Explainable Artificial Intelligence (XAI) aims to provide insights into the inner workings and the outputs of AI systems. Recently, there has been growing recognition that explainability is inherently human-centric, tied to how people perceive explanations. Despite this, there is no consensus in the research community on whether user evaluation is crucial in XAI, and if so, what exactly needs to be evaluated and how. This systematic literature review addresses this gap by providing a detailed overview of the current state of affairs in human-centered XAI evaluation. We reviewed 73 papers across various domains where XAI was evaluated with users. These studies assessed what makes an explanation "good" from a user's perspective, i.e., what makes an explanation meaningful to a user of an AI system. We identified 30 components of meaningful explanations that were evaluated in the reviewed papers and categorized them into a taxonomy of human-centered XAI evaluation, based on: (a) the contextualized quality of the explanation, (b) the contribution of the explanation to human-AI interaction, and (c) the contribution of the explanation to human-AI performance. Our analysis also revealed a lack of standardization in the methodologies applied in XAI user studies, with only 19 of the 73 papers applying an evaluation framework used by at least one other study in the sample. These inconsistencies hinder cross-study comparisons and broader insights. Our findings contribute to understanding what makes explanations meaningful to users and how to measure this, guiding the XAI community toward a more unified approach in human-centered explainability.