A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.

Authors

Goldstein Markus, Uchida Seiichi

Affiliations

Center for Co-Evolutional Social System Innovation, Kyushu University, Fukuoka, Japan.

Department of Advanced Information Technology, Kyushu University, Fukuoka, Japan.

Publication information

PLoS One. 2016 Apr 19;11(4):e0152173. doi: 10.1371/journal.pone.0152173. eCollection 2016.

Abstract

Anomaly detection is the process of identifying unexpected items or events in datasets that differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied to unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection, fraud detection, and the life science and medical domains. Dozens of algorithms have been proposed in this area, but the research community still lacks a universal comparative evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to be a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides the anomaly detection performance, the computational effort, the impact of parameter settings, and the global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks.
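To make the idea of scoring unlabeled data from its internal structure concrete, the following is a minimal sketch of one simple distance-based approach: each point is scored by its mean distance to its k nearest neighbors, so isolated points receive high scores. This is an illustration only and is not taken from the paper's published source code. The function name `knn_anomaly_scores`, the choice of k = 5, and the synthetic data are all assumptions made for this example.

```python
import math
import random


def knn_anomaly_scores(points, k=5):
    """Return one score per point: the mean distance to its k nearest neighbors.

    Hypothetical helper for illustration; no labels are used at any step.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Synthetic data (an assumption): one dense cluster plus one far-away point.
random.seed(0)
normal = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
points = normal + [(8.0, 8.0)]

scores = knn_anomaly_scores(points, k=5)
# Index of the point with the highest anomaly score.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

A score like this compares every point against the whole dataset, so it mainly highlights global anomalies. Detecting local anomalies, which the abstract lists as a separate behavior, would typically require relating each score to the density of the point's own neighborhood.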
