On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles.

Author information

Marques Henrique O, Swersky Lorne, Sander Jörg, Campello Ricardo J G B, Zimek Arthur

Affiliations

University of Southern Denmark, Odense, Denmark.

University of Alberta, Edmonton, Canada.

Publication information

Data Min Knowl Discov. 2023;37(4):1473-1517. doi: 10.1007/s10618-023-00931-x. Epub 2023 May 16.

Abstract

It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56-64, 2009; Janssens et al., in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. doi: 10.1109/ICMLA.2009.16). In this paper, we compare one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics and using different performance measures. In previous comparison studies, the models (algorithms, parameters) were selected using examples from both classes (outlier and inlier). Here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD (support vector data description) and GMM (Gaussian mixture model) are the top performers, regardless of whether the ground truth is used for parameter selection. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles yielded better accuracy than individual methods, as long as the ensemble members were properly selected.
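To make the idea of adapting an unsupervised outlier detector to one-class classification concrete, the following is a minimal sketch and not the authors' experimental setup. It assumes a k-nearest-neighbor distance score as the outlier detector (a stand-in chosen for brevity, not one of the methods named in the abstract), fits it on inliers only, and chooses the decision threshold from the inlier scores alone, so no labeled outliers are needed. The function names, the value of `k`, and the `reject_frac` parameter are all hypothetical.

```python
import math
import random


def knn_score(x, train, k, skip_self=False):
    """Distance from x to its k-th nearest training inlier (larger = more outlying)."""
    dists = sorted(math.dist(x, t) for t in train)
    # If x is itself a training point, its zero self-distance sits at index 0,
    # so the k-th true neighbor is one position further along.
    return dists[k] if skip_self else dists[k - 1]


def fit_threshold(train, k, reject_frac=0.05):
    """Pick a score cutoff that rejects roughly reject_frac of the training inliers."""
    scores = sorted(knn_score(x, train, k, skip_self=True) for x in train)
    idx = min(len(scores) - 1, int((1.0 - reject_frac) * len(scores)))
    return scores[idx]


def predict_inlier(x, train, k, threshold):
    """One-class decision: accept x as an inlier if its score is within the cutoff."""
    return knn_score(x, train, k) <= threshold


# Synthetic inlier-only training set (assumption: 2-D standard Gaussian data).
random.seed(0)
train = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
k = 5
threshold = fit_threshold(train, k)

near = (0.1, -0.2)   # close to the bulk of the inliers
far = (10.0, 10.0)   # far from every training point
print(predict_inlier(near, train, k, threshold))
print(predict_inlier(far, train, k, threshold))
```

In this sketch the only tuning knob set without outlier labels is `reject_frac`, the fraction of training inliers one is willing to reject; the model-selection approaches compared in the paper address the harder problem of choosing the algorithm and its parameters under the same constraint.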

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
