Jiang Wei, Wen Xiangyu, Zhan Jinyu, Wang Xupeng, Song Ziwei, Bian Chen
IEEE Trans Neural Netw Learn Syst. 2024 Mar;35(3):4032-4046. doi: 10.1109/TNNLS.2022.3201586. Epub 2024 Feb 29.
Backdoor attacks on deep neural networks (DNNs) are among the predominant threats to artificial intelligence. Existing methods for detecting backdoor attacks focus on distributions within DNNs, but they generalize poorly across DNN models. In this article, a critical-path-based backdoor detector (CPBD) is proposed, which detects backdoor attacks through the interpretability of DNNs. CPBD is designed to efficiently discover the characteristics of backdoors, which make the critical paths in attacked DNNs distinguishable. To deal with the intractably large number of neurons, we propose to simplify the neurons and integrate the preserved key nodes into a set of critical paths. A DNN model can thus be formulated as a combination of several critical paths. Backdoors are then detected by analyzing the critical paths corresponding to different classes. These steps are combined into the CPBD algorithm, which presents the results in a standard and systematic manner. In addition, CPBD is able to locate the neurons associated with malicious triggers; the combination of these neurons is named the trigger propagation path. Extensive experiments are conducted, which demonstrate the efficiency of the proposed method on multiple DNNs and different trigger sizes.
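The abstract does not state how key nodes are selected or which criterion separates clean from backdoored classes, so the following is only a minimal sketch of the general idea under stated assumptions, not the CPBD algorithm itself. It assumes that key nodes are the `k` neurons with the largest mean activation in each layer, that a critical path is the set of `(layer, neuron)` pairs kept this way, and that classes are compared by Jaccard overlap. All function names (`key_nodes`, `critical_path`, `jaccard`, `mean_overlap`) are hypothetical.

```python
import numpy as np


def key_nodes(layer_acts, k):
    """Indices of the k neurons with the largest mean activation in one layer.

    layer_acts has shape (num_samples, num_neurons).
    ASSUMPTION: top-k mean activation stands in for the paper's simplification rule.
    """
    mean_act = np.asarray(layer_acts).mean(axis=0)
    return set(np.argsort(mean_act)[-k:].tolist())


def critical_path(acts_per_layer, k):
    """Set of (layer, neuron) pairs preserved after simplifying every layer."""
    return {
        (layer, neuron)
        for layer, acts in enumerate(acts_per_layer)
        for neuron in key_nodes(acts, k)
    }


def jaccard(a, b):
    """Jaccard overlap of two critical paths (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_overlap(paths):
    """For each class, the mean overlap of its path with every other class's path.

    ASSUMPTION: a class whose score deviates strongly from the rest would be
    inspected further; the actual CPBD criterion is not given in the abstract.
    """
    scores = {}
    for cls, path in paths.items():
        others = [jaccard(path, q) for other, q in paths.items() if other != cls]
        scores[cls] = sum(others) / len(others) if others else 0.0
    return scores


# Toy activations for one class: two layers, two samples each.
toy_acts = [
    np.array([[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]),  # layer 0, three neurons
    np.array([[1.0, 0.0], [1.0, 0.0]]),            # layer 1, two neurons
]
toy_path = critical_path(toy_acts, k=1)
```

In use, one path would be computed per class from the activations of that class's samples, and `mean_overlap` would be applied to the resulting dictionary. The top-k rule and the overlap score are placeholders that could be swapped for whatever node-selection and comparison rules the full paper specifies.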