Wang Junpeng, Gou Liang, Zhang Wei, Yang Hao, Shen Han-Wei
IEEE Trans Vis Comput Graph. 2019 Jun;25(6):2168-2180. doi: 10.1109/TVCG.2019.2903943. Epub 2019 Mar 15.
Deep Neural Networks (DNNs) have been extensively used across disciplines due to their superior performance. In most cases, however, DNNs are treated as black boxes, and interpreting their internal working mechanisms is challenging. Given that trust in a model is often built on an understanding of how it works, interpreting DNNs becomes especially important in safety-critical applications (e.g., medical diagnosis, autonomous driving). In this paper, we propose DeepVID, a Deep learning approach to Visually Interpret and Diagnose DNN models, especially image classifiers. Specifically, we train a small, locally faithful model to mimic the behavior of the original cumbersome DNN around a particular data instance of interest; the local model is simple enough to be visually interpreted (e.g., a linear model). Knowledge distillation transfers knowledge from the cumbersome DNN to the small model, and a deep generative model (i.e., a variational auto-encoder) generates neighbors around the instance of interest. These neighbors, which exhibit small feature variations and carry semantic meaning, effectively probe the DNN's behavior around the instance of interest and help the small model learn that behavior. Through comprehensive evaluations, as well as case studies conducted with deep learning experts, we validate the effectiveness of DeepVID.
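The local-surrogate pipeline the abstract describes (sample neighbors around an instance, query the large DNN for soft labels, distill them into an interpretable linear student) can be sketched as follows. This is a toy illustration only: the random-projection `teacher_softmax` stands in for a trained DNN, and `vae_neighbors` replaces the paper's VAE sampler with simple Gaussian perturbations; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 3                                   # feature dim, number of classes
W_teacher = rng.standard_normal((d, k))       # toy stand-in for a trained DNN

def teacher_softmax(X):
    """Toy 'cumbersome DNN': random linear logits followed by softmax."""
    logits = X @ W_teacher
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def vae_neighbors(x, n=200, sigma=0.1):
    """Stand-in for VAE sampling: small perturbations around the instance."""
    return x + sigma * rng.standard_normal((n, x.size))

x0 = rng.standard_normal(d)                   # instance of interest

X = vae_neighbors(x0)                         # neighbors around the instance
Y = teacher_softmax(X)                        # soft labels from the teacher

# Distill into a locally faithful linear student via least squares;
# the student's weight rows act as per-feature saliency for each class.
Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
W_student, *_ = np.linalg.lstsq(Xb, Y, rcond=None)

pred = np.hstack([x0, 1.0]) @ W_student       # student's local prediction
```

Because the targets are softmax outputs (rows summing to one) and the design matrix contains a bias column, the fitted student's class scores also sum to one at any query point; only the weights, not the teacher's architecture, need inspecting.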