

Clustering Approach for Detecting Multiple Types of Adversarial Examples.

Affiliations

School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea.

Korea Apparel Testing & Research Institute, Seoul 02579, Korea.

Publication

Sensors (Basel). 2022 May 18;22(10):3826. doi: 10.3390/s22103826.

Abstract

By intentionally perturbing the features of an input, an adversary can generate an adversarial example that deceives a deep learning model. As adversarial examples are now considered one of the most severe problems facing deep learning technology, defenses against them have been actively studied. Effective defense methods fall into one of three architectures: (1) model retraining; (2) input transformation; and (3) adversarial example detection. Detection-based defenses in particular have received much attention, because unlike the other two architectures they do not cause wrong decisions on legitimate input data. In this paper, we note that current detection-based defenses can only classify an input as either legitimate or adversarial. That is, they can only detect adversarial examples; they cannot classify inputs into multiple classes, i.e., legitimate input data and the various types of adversarial examples. To classify inputs into multiple classes while increasing the accuracy of the clustering model, we propose an advanced detection-based defense that extracts key features from the input data and feeds the extracted features into a clustering model. Experimental results on various application datasets show that the proposed method detects adversarial examples while also classifying their types, and that its accuracy outperforms that of recent detection-based defenses.
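To make the two-stage idea in the abstract concrete, here is a minimal sketch of detection by clustering: extract key features from each input, then cluster the feature vectors so that legitimate inputs and each attack type land in separate groups. Everything in it (the synthetic feature vectors, the two hypothetical attack types, and PCA standing in for the paper's feature-extraction stage) is an illustrative assumption, not the authors' exact pipeline.

```python
# Sketch: detect and type adversarial examples by clustering extracted features.
# Synthetic data stands in for real inputs; PCA stands in for the paper's
# feature-extraction stage; cluster counts and shifts are assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# 300 legitimate inputs plus two hypothetical adversarial types, each
# shifted differently in feature space (e.g., FGSM-like vs. CW-like).
legit = rng.normal(0.0, 1.0, size=(300, 64))
attack_a = rng.normal(3.0, 1.0, size=(150, 64))   # hypothetical attack type A
attack_b = rng.normal(-3.0, 1.0, size=(150, 64))  # hypothetical attack type B
X = np.vstack([legit, attack_a, attack_b])
y_true = np.array([0] * 300 + [1] * 150 + [2] * 150)

# Step 1: extract key features from the input data.
features = PCA(n_components=8, random_state=0).fit_transform(X)

# Step 2: cluster into one legitimate class plus one cluster per attack type.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

# Agreement between the discovered clusters and the true input types.
print(f"Adjusted Rand index: {adjusted_rand_score(y_true, labels):.3f}")
```

Because clustering is unsupervised, the discovered cluster IDs are arbitrary; a permutation-invariant score such as the adjusted Rand index is used here to compare them against the true input types.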


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/078a/9146128/d996c0d7f1f2/sensors-22-03826-g001.jpg
