Green Michael, Ekelund Ulf, Edenbrandt Lars, Björk Jonas, Forberg Jakob Lundager, Ohlsson Mattias
Computational Biology and Biological Physics Group, Department of Theoretical Physics, Lund University, and Department of Emergency Medicine, Lund University Hospital, Sölvegatan 14A, SE-223 62 Lund, Sweden.
Neural Netw. 2009 Jan;22(1):75-81. doi: 10.1016/j.neunet.2008.09.014. Epub 2008 Oct 17.
Artificial neural network (ANN) ensembles have long suffered from a lack of interpretability. This has severely limited the practical usability of ANNs in settings where an erroneous decision can be disastrous. Several attempts have been made to alleviate this problem, many of them based on decomposing the decision boundary of the ANN into a set of rules. We explore and compare a set of new methods for this explanation process on two artificial data sets (Monks 1 and 3) and one acute coronary syndrome data set consisting of 861 electrocardiograms (ECGs) collected retrospectively at the emergency department at Lund University Hospital. The algorithms extracted good explanations in more than 84% of the cases. More to the point, the best method provided 99% and 91% good explanations on Monks data 1 and 3, respectively. There was also significant overlap between the algorithms. Furthermore, when explaining a given ECG, the overlap between the best method and one of the physicians was the same as that between the two physicians in this study. Still, the physicians were significantly more similar to each other than to any of the methods (p-value < 0.001). The algorithms have the potential to be used as an explanatory aid when using ANN ensembles in clinical decision support systems.
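To make the two ideas in the abstract concrete, the sketch below shows one plausible way to explain a single case for an ANN ensemble and to score the agreement (overlap) between two explanations. It is not the authors' algorithm: the data set, ensemble construction, the perturbation-based sensitivity ranking, and the Jaccard overlap measure are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's method): train a small bagged
# ensemble of MLPs, explain one case by ranking the inputs whose perturbation
# most changes the ensemble output, and quantify agreement between two
# explanations as set overlap.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Bagged ensemble of small MLPs (size and architecture are illustrative).
ensemble = []
for seed in range(5):
    idx = rng.randint(0, len(X), len(X))          # bootstrap sample
    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed)
    clf.fit(X[idx], y[idx])
    ensemble.append(clf)

def ensemble_prob(x):
    """Average positive-class probability over the ensemble for one case."""
    return np.mean([m.predict_proba(x.reshape(1, -1))[0, 1] for m in ensemble])

def explain(x, top_k=3, delta=0.5):
    """Explanation = the top_k inputs whose perturbation changes the output most."""
    base = ensemble_prob(x)
    sensitivity = []
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] += delta
        sensitivity.append(abs(ensemble_prob(x_pert) - base))
    return set(np.argsort(sensitivity)[::-1][:top_k])

def overlap(expl_a, expl_b):
    """Agreement between two explanations as a Jaccard index of shared inputs."""
    return len(expl_a & expl_b) / len(expl_a | expl_b)

case = X[0]
e1 = explain(case, top_k=3)                # one explanation method
e2 = explain(case, top_k=3, delta=1.0)     # a second, slightly different method
print("explanation A:", e1, "explanation B:", e2, "overlap:", overlap(e1, e2))
```

The same overlap score could in principle be computed between an algorithmic explanation and a set of inputs marked as important by a physician, which is the kind of comparison reported in the abstract.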