Vafaei Sadr Alireza, Bassett Bruce A, Kunz M
Département de Physique Théorique and Center for Astroparticle Physics, University of Geneva, Geneva, Switzerland.
Institute for Research in Fundamental Sciences (IPM), P. O. Box 19395-5531, Tehran, Iran.
Neural Comput Appl. 2023;35(2):1157-1167. doi: 10.1007/s00521-021-05839-5. Epub 2021 Mar 11.
Anomaly detection is challenging, especially for large, high-dimensional datasets. Here we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. We release DRAMA, a Python package that implements this framework with a wide range of built-in options. The approach identifies the primary prototypes in the data and flags as anomalies the points that lie far from those prototypes, either in the latent space or in the original high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the framework allows significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. In addition, DRAMA naturally provides a clustering of the outliers for subsequent analysis.
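The idea of scoring points by their distance to prototypes in a latent space can be illustrated with a minimal sketch. This is not the DRAMA package API: the function names (`pca_reduce`, `kmeans_prototypes`, `anomaly_scores`), the choice of PCA and plain k-means, the first-k-rows initialization, and the toy data are all assumptions made for illustration, whereas the package itself is described as offering a range of built-in options.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def pairwise_dist(Z, centers):
    """Euclidean distance from every row of Z to every center."""
    return np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)

def kmeans_prototypes(Z, k, n_iter=50):
    """Plain k-means; the final centers play the role of prototypes.

    Assumption: centers start at the first k rows, a deterministic
    choice made here for reproducibility (k-means++ would be more robust).
    """
    centers = Z[:k].copy()
    for _ in range(n_iter):
        labels = pairwise_dist(Z, centers).argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def anomaly_scores(X, n_components=2, k=2):
    """Return (score, nearest prototype index) for every row of X.

    The score is the latent-space distance to the nearest prototype.
    """
    Z = pca_reduce(X, n_components)
    d = pairwise_dist(Z, kmeans_prototypes(Z, k))
    return d.min(axis=1), d.argmin(axis=1)

# Toy data (hypothetical): two inlier clusters in 20 dimensions,
# interleaved row by row, plus five injected outliers in the last rows.
rng = np.random.default_rng(0)
dim, n_in, n_out = 20, 400, 5
means = np.zeros((2, dim))
means[0, 0], means[1, 0] = -6.0, 6.0
inliers = means[np.arange(n_in) % 2] + rng.normal(size=(n_in, dim))
outliers = rng.normal(size=(n_out, dim))
outliers[:, 1] += 20.0  # displaced along a direction the inliers do not use
X = np.vstack([inliers, outliers])

scores, nearest = anomaly_scores(X, n_components=2, k=2)
top = np.argsort(scores)[-n_out:]
print("highest-scoring rows:", sorted(top.tolist()))
```

In this toy setup the injected rows sit far from both prototypes, so they are expected to receive the largest scores. The `nearest` array groups points by prototype, which is one simple way to organize flagged outliers for later inspection. Scoring in the original space instead would amount to measuring distances from `X` to the cluster means computed in the original coordinates.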