IEEE Trans Neural Netw Learn Syst. 2017 Dec;28(12):2846-2858. doi: 10.1109/TNNLS.2016.2608983. Epub 2016 Sep 28.
One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also allows nonnumeric data to be processed by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of its vertices defines the classifier. The final partition is derived by a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation in terms of the α-Jensen difference. Once training is completed, a graph-based fuzzy model is constructed in order to associate a confidence level with the classifier decision. The fuzzification process is based only on topological information about the vertices of the entropic spanning graph. As such, the proposed one-class classifier is also suitable for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations of the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches.
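The abstract mentions computing an α-Jensen difference from an entropic spanning graph. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes Euclidean feature vectors of dimension d >= 2, uses a minimum spanning tree as the spanning graph, and assumes the standard MST-based Rényi entropy estimate ln(L_γ / n^α) / (1 − α) with α = (d − γ)/d, which is defined only up to an additive constant. That constant cancels in the α-Jensen difference because the mixture weights sum to one. The function names `mst_length`, `renyi_entropy_up_to_constant`, and `alpha_jensen_difference` are hypothetical, and the paper's partitioning criterion and fuzzy model are not reproduced here.

```python
import math


def mst_length(points, gamma=1.0):
    """Sum of Euclidean MST edge lengths, each raised to the power gamma.

    Uses Prim's algorithm on the complete graph, O(n^2) time.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0          # start from vertex 0; it contributes no edge
    total = 0.0
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u] ** gamma
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
    return total


def renyi_entropy_up_to_constant(points, gamma=1.0):
    """MST-based Renyi alpha-entropy estimate, omitting the additive constant.

    Assumption: alpha = (d - gamma) / d with 0 < gamma < d, and at least two
    distinct points so that the MST length is positive.
    """
    n = len(points)
    d = len(points[0])
    alpha = (d - gamma) / d
    return math.log(mst_length(points, gamma) / n ** alpha) / (1.0 - alpha)


def alpha_jensen_difference(sample_a, sample_b, gamma=1.0):
    """Entropy of the pooled sample minus the weighted entropies of the parts.

    The omitted additive constant cancels because the weights sum to one.
    """
    p = len(sample_a) / (len(sample_a) + len(sample_b))
    h = renyi_entropy_up_to_constant
    pooled = list(sample_a) + list(sample_b)
    return h(pooled, gamma) - (p * h(sample_a, gamma) + (1 - p) * h(sample_b, gamma))
```

Under these assumptions, two samples drawn from well-separated regions should give a larger difference than two samples drawn from the same region, since the pooled spanning tree needs a long edge to bridge the gap between them.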