Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139.
Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30071-30078. doi: 10.1073/pnas.1907375117. Epub 2020 Sep 1.
Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.
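To make the unit-labeling step concrete, the sketch below scores one convolutional unit against one visual concept by computing the IoU between the unit's thresholded, upsampled activation map and a binary segmentation mask for that concept. It is a minimal PyTorch illustration under assumed inputs (acts, masks, and the quantile threshold are placeholders), not the authors' reference implementation:

# Minimal sketch (assumed names and data) of scoring one CNN unit against one concept.
# acts  -- (N, H, W) activations of a single convolutional unit over N images
# masks -- (N, H', W') binary maps marking where the concept appears in each image
import torch
import torch.nn.functional as F

def score_unit(acts: torch.Tensor, masks: torch.Tensor, quantile: float = 0.99) -> float:
    """IoU between the unit's thresholded activation map and the concept mask."""
    # Upsample the low-resolution activations to the mask resolution.
    up = F.interpolate(acts.unsqueeze(1), size=masks.shape[-2:],
                       mode="bilinear", align_corners=False).squeeze(1)
    # Threshold at a high quantile of the unit's own activation distribution.
    t = torch.quantile(up.flatten(), quantile)
    fired = up > t
    concept = masks.bool()
    intersection = (fired & concept).sum().float()
    union = (fired | concept).sum().float()
    return (intersection / union.clamp(min=1.0)).item()

# A unit is then labeled with whichever concept yields the highest IoU over the dataset.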
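The GAN analysis can be pictured the same way: intervene on a small set of units at one generator layer and compare the generated scenes before and after. The sketch below uses a standard PyTorch forward hook to clamp selected feature-map channels; the generator, layer, and unit indices are hypothetical placeholders rather than the model studied in the paper:

import torch

def intervene_on_units(layer: torch.nn.Module, unit_ids, value: float = 0.0):
    """Clamp selected channels of `layer`'s output to `value` (0 = ablate)."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, unit_ids] = value   # deactivate (or force on) the chosen units
        return output                 # returning a tensor replaces the layer's output
    return layer.register_forward_hook(hook)

# Hypothetical usage: zeroing units that match "tree" removes trees from generated
# scenes, while a large positive value tends to insert the object where context allows.
# handle = intervene_on_units(G.layer4, unit_ids=[12, 57, 301], value=0.0)
# imgs = G(z)        # generate with the intervention active
# handle.remove()    # restore normal behavior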