Manz Robin, Bäcker Jonas, Cramer Samantha, Meyer Philip, Müller Dominik, Muzalyova Anna, Rentschler Lukas, Wengenmayr Christoph, Hinske Ludwig Christian, Huss Ralf, Raffler Johannes, Soto-Rey Iñaki
Digital Medicine, University Hospital of Augsburg, Augsburg, Germany.
IT-Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany.
J Pathol Clin Res. 2025 Mar;11(2):e70023. doi: 10.1002/2056-4538.70023.
This work aimed to evaluate both the usefulness and user acceptance of five gradient-based explainable artificial intelligence (XAI) methods in the use case of a clinical decision support system for prostate carcinoma. In addition, we aimed to determine whether XAI helps to increase the acceptance of artificial intelligence (AI) and to recommend a particular method for this use case. The evaluation was conducted on an in-house tool offering different visualization approaches for the AI-generated Gleason grade, with the corresponding XAI explanations overlaid on the original slide. The study was a heuristic evaluation of five XAI methods. The participants were 15 pathologists from the University Hospital of Augsburg with a wide range of experience in Gleason grading and AI. The evaluation consisted of a user information form, short questionnaires on each XAI method, a ranking of the methods, and a general questionnaire to evaluate the performance and usefulness of the AI. There were significant differences between the ratings of the methods, with Grad-CAM++ performing best. Both the AI decision support and the XAI explanations were seen as helpful by the majority of participants. In conclusion, our pilot study suggests that the evaluated XAI methods can indeed improve the usefulness and acceptance of AI. The results obtained are a promising indicator, but further studies involving larger sample sizes are warranted to draw more definitive conclusions.
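For readers unfamiliar with the best-rated method, the following is a minimal NumPy sketch of the Grad-CAM++ weighting scheme applied to one convolutional layer. It is not the authors' implementation; the function name and toy inputs are illustrative, and the feature maps and class gradients are assumed to have been extracted from the model beforehand (e.g. via backward hooks in a deep learning framework).

```python
import numpy as np

def grad_cam_pp_heatmap(feature_maps, gradients):
    """Grad-CAM++ heatmap for one conv layer (illustrative sketch).

    feature_maps: array (K, H, W), activations A^k of K channels
    gradients:    array (K, H, W), dY_c/dA^k for the target class c
    """
    g2, g3 = gradients ** 2, gradients ** 3
    # Pixel-wise weights: alpha^k_ij = g^2 / (2 g^2 + (sum_ij A^k) * g^3),
    # guarded against division by zero.
    denom = 2.0 * g2 + feature_maps.sum(axis=(1, 2), keepdims=True) * g3
    alpha = np.where(denom != 0, g2 / np.where(denom != 0, denom, 1.0), 0.0)
    # Channel weights: alpha-weighted sum of positive gradients.
    weights = (alpha * np.maximum(gradients, 0.0)).sum(axis=(1, 2))
    # Weighted combination of feature maps, then ReLU and normalization,
    # yielding a saliency map that can be overlaid on the slide image.
    heatmap = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    if heatmap.max() > 0:
        heatmap /= heatmap.max()
    return heatmap

# Toy usage: random activations/gradients standing in for a real model.
rng = np.random.default_rng(0)
cam = grad_cam_pp_heatmap(rng.standard_normal((8, 16, 16)),
                          rng.standard_normal((8, 16, 16)))
```

In a real pipeline the resulting map would be upsampled to the slide-patch resolution and rendered as a transparent overlay, which is the kind of visualization the tool described above presents to pathologists.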