Erasmus Adrian, Brunet Tyler D P, Fisher Eyal
Institute for the Future of Knowledge, University of Johannesburg, Johannesburg, South Africa.
Department of History and Philosophy of Science, University of Cambridge, Free School Ln., Cambridge, CB2 3RH UK.
Philos Technol. 2021;34(4):833-862. doi: 10.1007/s13347-020-00435-2. Epub 2020 Nov 12.
We argue that artificial networks are explainable and offer a novel theory of interpretability. Two sets of conceptual questions are prominent in theoretical engagements with artificial neural networks, especially in the context of medical artificial intelligence: (1) Are networks explainable, and if so, what does it mean to explain the output of a network? And (2) what does it mean for a network to be interpretable? We argue that accounts of "explanation" tailored specifically to neural networks have ineffectively reinvented the wheel. In response to (1), we show how four familiar accounts of explanation apply to neural networks as they would to any scientific phenomenon. We diagnose the confusion about explaining neural networks within the machine learning literature as an equivocation on "explainability," "understandability," and "interpretability." To remedy this, we distinguish between these notions, and answer (2) by offering a theory and typology of interpretation in machine learning. Interpretation is something one does to an explanation with the aim of producing another, more understandable, explanation. As with explanation, there are various concepts and methods involved in interpretation: total or partial, global or local, and approximative or isomorphic. Our account of "interpretability" is consistent with uses in the machine learning literature, in keeping with the philosophy of explanation and understanding, and pays special attention to medical artificial intelligence systems.