IEEE Trans Neural Netw Learn Syst. 2017 Jul;28(7):1722-1729. doi: 10.1109/TNNLS.2016.2547220. Epub 2016 Apr 12.
Recognizing the samples that belong to a single class in a heterogeneous data set is an interesting but difficult machine learning task. Some samples in the data set may be actual outliers or members of other classes for which training examples are lacking. In contrast to other kernel approaches in the literature, in this work the problem is addressed by defining a one-class kernel machine that delivers the probability that a sample belongs to the support of the distribution and that can be trained efficiently by a hybrid sequential minimal optimization-expectation maximization algorithm. Because of its analogy to the import vector machine and to the one-class approach, we name the method import vector domain description (IVDD). IVDD was first tested on a toy 2-D data set in order to characterize its behavior, then evaluated on a set of widely used UCI benchmark data sets, and, lastly, challenged on a real-world outlier detection data set. All results were compared against closely related state-of-the-art methods, namely, the one-class SVM and support vector domain description, showing that the algorithm is equally accurate, with the additional advantage of delivering a probability estimate for each sample. Finally, a few variants aimed at saving memory and/or speeding up computation in the light of big data analysis are briefly sketched.
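To illustrate the kind of output IVDD delivers, the following is a minimal sketch of how a one-class kernel machine in the style of the import vector machine can assign a membership probability: a logistic sigmoid applied to a sparse Gaussian-kernel expansion over a small set of import vectors. The import vectors, the coefficients `alphas`, the bias `b`, and the kernel width `gamma` are all hypothetical placeholders here; in the actual method they would be produced by the hybrid SMO-EM training procedure, which is not reproduced in this sketch.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian RBF kernel between two feature vectors
    return np.exp(-gamma * np.sum((x - y) ** 2))

def membership_probability(x, import_vectors, alphas, b=-1.0, gamma=0.5):
    # Probability that x belongs to the support of the target class,
    # modeled as a logistic sigmoid over a sparse kernel expansion.
    # import_vectors, alphas, b, and gamma are illustrative stand-ins for
    # quantities that the SMO-EM training algorithm would estimate.
    s = sum(a * rbf_kernel(x, v, gamma)
            for a, v in zip(alphas, import_vectors)) + b
    return 1.0 / (1.0 + np.exp(-s))

# Toy illustration: two hypothetical import vectors describing the target class
ivs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
alphas = [1.5, 1.0]
p_in = membership_probability(np.array([0.1, 0.1]), ivs, alphas)   # near the class
p_out = membership_probability(np.array([5.0, 5.0]), ivs, alphas)  # far away
```

A sample close to the import vectors receives a high membership probability, while a distant one falls toward the rejection region; this graded output is what distinguishes the approach from hard-decision methods such as the one-class SVM.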