IEEE Trans Neural Netw Learn Syst. 2015 Dec;26(12):3187-200. doi: 10.1109/TNNLS.2015.2418332. Epub 2015 Apr 15.
One-class classification is a well-known problem in pattern recognition, also referred to as outlier detection or novelty/anomaly detection. Its core consists of modeling and recognizing patterns that belong to a single, so-called target class; all other patterns are termed nontarget and should be recognized as such. In this paper, we propose a novel one-class classification system based on an interplay of different techniques. Primarily, we follow a dissimilarity representation-based approach: we embed the input data into the dissimilarity space (DS) by means of an appropriate parametric dissimilarity measure. This step allows us to process virtually any type of data. The dissimilarity vectors are then represented by weighted Euclidean graphs, which we use both to determine the entropy of the data distribution in the DS and to derive effective decision regions, modeled as clusters of vertices. Since the dissimilarity measure for the input data is parametric, we optimize its parameters by means of a global optimization scheme that considers both mesoscopic and structural characteristics of the data represented through the graphs. The proposed one-class classifier is designed to provide both hard (Boolean) and soft decisions about the recognition of test patterns, allowing an accurate description of the classification process. We evaluate the performance of the system on different benchmarking data sets, containing either feature-based or structured patterns. Experimental results demonstrate the effectiveness of the proposed technique.
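To make the dissimilarity-space idea concrete, the following is a minimal sketch and not the system proposed in the paper. It assumes feature-vector inputs, a weighted Euclidean distance as the parametric dissimilarity measure, and all training targets used as prototypes. The graph construction, entropy estimation, vertex clustering, and global parameter optimization described in the abstract are omitted; a simple nearest-neighbor threshold in the DS stands in for the decision regions. All names (`weighted_euclidean`, `embed`, `DissimilaritySpaceOneClass`) are hypothetical.

```python
import math


def weighted_euclidean(x, y, weights):
    """Parametric dissimilarity (assumed here): per-feature weighted Euclidean distance."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def embed(pattern, prototypes, weights):
    """Map a pattern to its dissimilarity vector: one distance per prototype."""
    return [weighted_euclidean(pattern, p, weights) for p in prototypes]


def euclidean(u, v):
    """Plain Euclidean distance between two dissimilarity vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class DissimilaritySpaceOneClass:
    """Illustrative stand-in: nearest-neighbor rule in the dissimilarity space."""

    def __init__(self, weights):
        self.weights = weights  # parameters of the dissimilarity measure

    def fit(self, targets):
        if len(targets) < 2:
            raise ValueError("need at least two target patterns")
        # Assumption: every training target also serves as a prototype.
        self.prototypes = list(targets)
        self.embedded = [embed(t, self.prototypes, self.weights) for t in targets]
        # Leave-one-out nearest-neighbor distance of each target in the DS.
        nn = [
            min(euclidean(e, o) for j, o in enumerate(self.embedded) if j != i)
            for i, e in enumerate(self.embedded)
        ]
        # Decision radius: the largest such distance seen among the targets.
        self.threshold = max(nn)
        if self.threshold == 0.0:
            raise ValueError("all target patterns coincide in the DS")
        return self

    def _distance(self, pattern):
        v = embed(pattern, self.prototypes, self.weights)
        return min(euclidean(v, e) for e in self.embedded)

    def soft_score(self, pattern):
        """Soft decision in (0, 1]; larger means closer to the target class."""
        return math.exp(-self._distance(pattern) / self.threshold)

    def predict(self, pattern):
        """Hard (Boolean) decision: True if the pattern is accepted as target."""
        return self._distance(pattern) <= self.threshold


targets = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
model = DissimilaritySpaceOneClass(weights=(1.0, 1.0)).fit(targets)
print(model.predict((0.5, 0.5)), model.soft_score((0.5, 0.5)))
print(model.predict((10.0, 10.0)), model.soft_score((10.0, 10.0)))
```

Because the classifier only ever sees dissimilarity vectors, replacing `weighted_euclidean` with a measure defined on strings or graphs would leave the rest of the sketch unchanged, which is the sense in which the embedding step can handle structured patterns.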